This project fine-tunes an RF model to predict job rejections and employs evaluation metrics to assess accuracy.

rnx2024/RF-Model-for-Job-Rejection

Predicting Job Rejections using Random Forest Model

This project fine-tunes a Random Forest (RF) model to predict job rejections using a dataset of more than 25K jobs. The data is preprocessed and fed into the Random Forest model, which predicts the probability of rejection from various features.

Note that no applicant data is used to train the model. Because applicant-specific details such as qualifications, skills, and experience are not considered, the predictions are broad generalizations that do not account for individual merit. The project is still useful as an illustration of how a machine learning model learns from data: even without applicant data, the model can learn from the available features and make predictions with high accuracy.

Main Features

1. Data Preprocessing:

  • Converts date columns to datetime objects and extracts the month.

  • Applies target encoding to categorical features.

  • Handles missing values and ensures all features are numeric.
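The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the column names (`posted_date`, `industry`, `rejected`) and the toy rows are hypothetical, and the target encoding here is a plain per-category mean computed with pandas rather than any specific encoder library.

```python
import pandas as pd

# Hypothetical toy data; the real dataset's columns are not shown in this README.
df = pd.DataFrame({
    "posted_date": ["2024-01-15", "2024-02-03", "2024-02-20", None],
    "industry": ["tech", "retail", "tech", "retail"],
    "rejected": [1, 0, 1, 1],
})

# 1. Convert the date column to datetime and extract the month.
df["posted_date"] = pd.to_datetime(df["posted_date"], errors="coerce")
df["posted_month"] = df["posted_date"].dt.month

# 2. Target-encode the categorical feature: replace each category with the
#    mean of the target for that category. In practice the means should be
#    computed on the training split only, to avoid target leakage.
category_means = df.groupby("industry")["rejected"].mean()
df["industry_encoded"] = df["industry"].map(category_means)

# 3. Fill the missing month with the median and keep only numeric features.
features = df[["posted_month", "industry_encoded"]].copy()
features["posted_month"] = features["posted_month"].fillna(
    features["posted_month"].median()
)
features = features.astype(float)
print(features)
```

Filling missing months with the median is only one possible choice; the README does not say which imputation strategy the project uses.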

2. Implemented regularization for higher accuracy

from sklearn.ensemble import RandomForestClassifier

# Define and configure the RandomForestClassifier with regularization
rf_model = RandomForestClassifier(
    n_estimators=100,  # Keep the number of trees at 100
    max_depth=10,  # Limit the depth of each tree to prevent overfitting
    min_samples_split=5,  # Require at least 5 samples to split a node
    min_samples_leaf=2,  # Ensure at least 2 samples exist in each leaf node
    max_features='sqrt',  # Limit the number of features considered at each split
    random_state=42,  # Fix the seed for reproducible results
    oob_score=True,  # Compute the out-of-bag score during training
)
  • With max_depth=10, the decision trees cannot grow deep enough to memorize the training data, which limits overfitting.
  • With min_samples_split=5, a node needs at least 5 samples before it can be split, which avoids splits on very small groups of samples and improves generalization.
  • With min_samples_leaf=2, every leaf holds at least 2 samples, so the trees don't become too specific to the training data.
  • With max_features='sqrt', each split considers only a random subset of the features, which introduces randomness and reduces overfitting by ensuring trees don't rely too heavily on any particular feature.
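To see these settings in action, here is a small sketch that fits the same configuration and reads back the out-of-bag score. The data comes from scikit-learn's `make_classification`, which is an assumption standing in for the job dataset, so the printed numbers will not match the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; the job dataset is not included here).
X, y = make_classification(n_samples=1000, n_features=12, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    min_samples_split=5,
    min_samples_leaf=2,
    max_features="sqrt",
    oob_score=True,
    random_state=42,
)
rf_model.fit(X_train, y_train)

# oob_score_ is the accuracy measured on the samples that were left out of
# each tree's bootstrap sample, so it needs no separate validation set.
print("OOB score:", rf_model.oob_score_)
print("Test accuracy:", rf_model.score(X_test, y_test))

# The depth cap applies to every fitted tree in the forest.
max_fitted_depth = max(tree.get_depth() for tree in rf_model.estimators_)
print("Deepest tree:", max_fitted_depth)
```

Comparing the OOB score with the held-out test accuracy is a quick way to check that the regularized forest generalizes rather than memorizes.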

3. Evaluation Metrics

  • OOB Score: 0.9635058515086417 (This estimates how well the RF model predicts unseen data, using the out-of-bag samples that each tree did not see during training)

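  • Confusion Matrix (rows are actual classes, columns are predicted classes):

    [[3095   39]   # 3095 true negatives, 39 false positives (predicted rejections that were actually non-rejections)
     [  11 1911]]  # 11 false negatives (predicted non-rejections that were actually rejections), 1911 true positives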

  • Recall: 0.9942767950052029 (The share of actual rejections that the model identifies)

  • Precision: 0.98 (The share of predicted rejections that are actual rejections, i.e. how well the model avoids false positives)

  • F1-score: 0.9870867768595041 (The harmonic mean of precision and recall, which balances false negatives against false positives)

  • ROC-AUC score: 0.9994095672517259 (Gauges how well the model separates rejected from non-rejected jobs)

The evaluation results show that the model makes reliable predictions with high accuracy and few errors: 50 misclassifications out of the 5,056 samples in the confusion matrix.
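The reported precision, recall, and F1-score can be re-derived from the confusion matrix counts. The sketch below assumes the matrix follows scikit-learn's layout, `[[TN, FP], [FN, TP]]`, which is consistent with the reported values.

```python
# Confusion matrix counts as reported above, read in scikit-learn's layout:
# rows are actual classes, columns are predicted classes.
tn, fp = 3095, 39
fn, tp = 11, 1911

precision = tp / (tp + fp)  # predicted rejections that were correct
recall = tp / (tp + fn)     # actual rejections that were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.4f}")
print(f"recall={recall:.4f}")
print(f"f1={f1:.4f}")
print(f"accuracy={accuracy:.4f}")
```

The ROC-AUC and OOB scores cannot be recovered this way, since they depend on predicted probabilities and out-of-bag samples rather than on the four counts.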

4. Plotted Feature Importance

[Image: feature importance plot]
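As a rough sketch of where such a plot comes from, a fitted `RandomForestClassifier` exposes impurity-based importances through its `feature_importances_` attribute. The data and feature names below are synthetic placeholders, not the project's features.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with hypothetical feature names (not the project's features).
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances: one value per feature, summing to 1.
importances = pd.Series(model.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```

With matplotlib installed, `importances.plot.barh()` would draw the ranking as a horizontal bar chart.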

Usage Instructions

  1. Clone the repository

git clone https://github.com/rnx2024/RF-Model-for-Job-Rejection

  2. Install dependencies using requirements.txt

pip install -r requirements.txt

  3. Run the Python file
