This project demonstrates a machine learning pipeline for predicting the grade of a loan (which can be an indicator of loan default risk) using various classification algorithms. It includes:
- Data preprocessing and feature engineering
- Model training using different classifiers (Random Forest, Gradient Boosting, Logistic Regression, SVC, XGBoost, KNN)
- Hyperparameter tuning for Random Forest and XGBoost
- Evaluation using performance metrics such as accuracy score and classification report
- Visualization of the confusion matrix for the best model
- Source: The code expects a CSV file named `train.csv`.
- Description: The dataset contains information about loans (e.g., annual income, debt-to-income ratio, interest rates, loan amount, home ownership) as well as a target label (`grade`) indicating the credit grade of the loan.
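As a starting point, the raw data can be loaded and inspected with pandas. This is only a quick sanity-check sketch; the file name comes from above, while the exact column set depends on your copy of the dataset:

```python
import pandas as pd

# Load the raw loan data (file name as described above).
df = pd.read_csv("train.csv")

# Quick sanity checks: shape, column types, and the distribution of the target grades.
print(df.shape)
print(df.dtypes)
print(df["grade"].value_counts())
```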
In this phase, we prepare the data, transform it, train various classifiers, tune hyperparameters, and evaluate performance metrics. Below is a concise overview of the steps—combined into a single section—that you can copy into your notebook.
Data Preprocessing:
- Drop redundant or unnecessary columns (e.g., `application_type`, `emp_title`, `issue_date_year`, `issue_date_hour`).
- Convert string columns with numerical values to floats (e.g., `emp_length`, `term`).
- Convert date columns to `datetime` objects and extract relevant features (e.g., `month`, `day`, `weekday`).
- Handle missing values using `SimpleImputer` (mean or median) or `KNNImputer`.
- Scale numerical features with `RobustScaler`.
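A minimal sketch of these preprocessing steps follows. It assumes the `df` loaded above; the date column name (`issue_date`) and the string formats of `emp_length` and `term` are assumptions and may need adjusting to your data:

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import RobustScaler

# Drop columns that add little predictive value.
df = df.drop(columns=["application_type", "emp_title",
                      "issue_date_year", "issue_date_hour"], errors="ignore")

# Extract the numeric part of string columns such as "10+ years" or "36 months".
df["emp_length"] = df["emp_length"].str.extract(r"(\d+)", expand=False).astype(float)
df["term"] = df["term"].str.extract(r"(\d+)", expand=False).astype(float)

# Parse the issue date (column name assumed) and derive simple calendar features.
df["issue_date"] = pd.to_datetime(df["issue_date"], errors="coerce")
df["month"] = df["issue_date"].dt.month
df["day"] = df["issue_date"].dt.day
df["weekday"] = df["issue_date"].dt.weekday
df = df.drop(columns=["issue_date"])

# Impute missing numeric values with the median, then scale with RobustScaler.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])
df[num_cols] = RobustScaler().fit_transform(df[num_cols])
```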
Encoding and Transformation:
- Transform categorical features using `OneHotEncoder` (with `drop='first'` to avoid the dummy variable trap).
- Label-encode the target variable (`grade`) using `LabelEncoder`.
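A sketch of the encoding step, continuing from the preprocessed `df`. The list of categorical columns here is illustrative; replace it with the categorical columns actually present in the dataset:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

# Categorical feature columns (illustrative; adjust to the real dataset).
cat_cols = ["home_ownership", "purpose"]

# One-hot encode, dropping the first level of each feature to avoid the dummy variable trap.
ohe = OneHotEncoder(drop="first")
encoded = pd.DataFrame(ohe.fit_transform(df[cat_cols]).toarray(),
                       columns=ohe.get_feature_names_out(cat_cols),
                       index=df.index)

# Label-encode the target grades (e.g., A-G) into integers.
le = LabelEncoder()
y = le.fit_transform(df["grade"])

# Final feature matrix: numeric columns plus the encoded categoricals.
X = pd.concat([df.drop(columns=cat_cols + ["grade"]), encoded], axis=1)
```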
Training Classifiers:
We train multiple models on the processed dataset:
- Random Forest
- Gradient Boosting
- Support Vector Classifier (SVC)
- Logistic Regression
- XGBoost
- K-Nearest Neighbors (KNN)
Each classifier is fit on the training set, and predictions are made on the test set.
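The sketch below fits each of these classifiers on a train/test split of the encoded features (`X`, `y` from the previous step) and prints the test-set accuracy. Default hyperparameters are used here; tuning follows in the next step:

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "SVC": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "XGBoost": XGBClassifier(eval_metric="mlogloss"),
    "KNN": KNeighborsClassifier(),
}

# Fit each model on the training set and evaluate on the held-out test set.
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"{name}: {accuracy_score(y_test, y_pred):.3f}")
```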
Hyperparameter Tuning:
- Random Forest: Use `RandomizedSearchCV` with a grid including parameters such as `n_estimators`, `max_features`, `max_depth`, and `max_samples`.
- XGBoost: Use `RandomizedSearchCV` with parameters such as `learning_rate`, `max_depth`, `min_child_weight`, `gamma`, and `colsample_bytree`.
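A sketch of the two randomized searches is shown below. The parameter ranges and number of iterations are illustrative, not the project's exact grid:

```python
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# Random Forest search space (ranges are illustrative).
rf_params = {
    "n_estimators": [100, 300, 500],
    "max_features": ["sqrt", "log2"],
    "max_depth": [None, 10, 20, 40],
    "max_samples": [0.5, 0.7, 0.9],
}
rf_search = RandomizedSearchCV(RandomForestClassifier(random_state=42), rf_params,
                               n_iter=20, cv=3, scoring="accuracy",
                               n_jobs=-1, random_state=42)
rf_search.fit(X_train, y_train)
print("Best RF params:", rf_search.best_params_)

# XGBoost search space (ranges are illustrative).
xgb_params = {
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "max_depth": [3, 5, 7, 10],
    "min_child_weight": [1, 3, 5],
    "gamma": [0, 0.1, 0.3],
    "colsample_bytree": [0.5, 0.7, 1.0],
}
xgb_search = RandomizedSearchCV(XGBClassifier(eval_metric="mlogloss"), xgb_params,
                                n_iter=20, cv=3, scoring="accuracy",
                                n_jobs=-1, random_state=42)
xgb_search.fit(X_train, y_train)
print("Best XGB params:", xgb_search.best_params_)
```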
Evaluation:
- Accuracy Score: `accuracy_score(y_test, y_pred)`
- Classification Report: `classification_report(y_test, y_pred)` for precision, recall, and F1-scores
- Confusion Matrix: Visualize with `seaborn.heatmap` to understand misclassifications
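For example, the tuned XGBoost model from the search above can be evaluated as follows (a sketch assuming `xgb_search`, the train/test split, and the `LabelEncoder` `le` from the earlier snippets):

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Predictions of the tuned best model on the held-out test set.
best_model = xgb_search.best_estimator_
y_pred = best_model.predict(X_test)

print("Best XGB Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=le.classes_))

# Confusion matrix heatmap: rows are true grades, columns are predicted grades.
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=le.classes_, yticklabels=le.classes_)
plt.xlabel("Predicted grade")
plt.ylabel("True grade")
plt.title("Confusion Matrix - Tuned XGBoost")
plt.show()
```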
By comparing accuracy and other metrics (precision, recall, F1-score) across these models, we identify the best-performing classifier. The confusion matrix of the tuned best model (often XGBoost or Random Forest) provides deeper insights into predictions versus actual outcomes.
After training and tuning each of the models (Random Forest, Gradient Boosting, Logistic Regression, SVC, XGBoost, and KNN), you can compare their performance using various metrics.
- Accuracy Scores: For each model, compute `accuracy_score(y_test, y_pred)` to get the percentage of correct predictions.
- Classification Reports: Use `classification_report(y_test, y_pred)` to evaluate precision, recall, and F1-score for each class.
- Confusion Matrix: Visualize the confusion matrix with `seaborn.heatmap` to understand the distribution of predictions vs. actual classes. This helps identify which classes are often misclassified.
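One convenient way to compare the fitted models side by side is to collect their test-set accuracies into a single sorted table, for instance:

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# Test-set accuracy for every model trained above, sorted from best to worst.
results = {name: accuracy_score(y_test, model.predict(X_test))
           for name, model in models.items()}
print(pd.Series(results, name="accuracy").sort_values(ascending=False))
```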
Below is an example of the output you might see for a tuned XGBoost model:
Best XGB Accuracy: 0.85
...