Credit-Risk-Prediction-Using-Supervised-Machine-Learning-Model

A machine learning project that predicts credit card default risk using supervised learning techniques such as Logistic Regression, Random Forest, and XGBoost. SMOTE and hyperparameter tuning are applied to improve performance. Developed for the ECON 611 term project at the University of Calgary.

Objective

The goal of this project is to develop a predictive model that accurately classifies credit card clients into defaulters and non-defaulters. This will help lenders, such as banks and financial institutions, assess the likelihood that a specific borrower will default based on their personal and financial characteristics.

Dataset

Source: UCI Default of Credit Card Clients
Size: 30,000 records
Features: 23 original attributes including demographics, billing, repayment status, and credit limit
Target Variable: Default payment next month (1 = client defaulted on payment, 0 = client did not default)

Methods & Techniques

Preprocessing

One-hot encoding for categorical variables
StandardScaler for numeric features
Train-test split (80/20)
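
The three preprocessing steps above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is a small synthetic stand-in, the target is given the hypothetical short name `default`, and the stratified split and the choice to fit the transformers on the training rows only are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Small synthetic stand-in for the real data (values are random, not real).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "LIMIT_BAL": rng.integers(10_000, 500_000, n),
    "AGE": rng.integers(21, 70, n),
    "SEX": rng.integers(1, 3, n),
    "EDUCATION": rng.integers(1, 5, n),
    "MARRIAGE": rng.integers(1, 4, n),
    "default": rng.integers(0, 2, n),  # hypothetical short name for the target
})

categorical = ["SEX", "EDUCATION", "MARRIAGE"]
numeric = ["LIMIT_BAL", "AGE"]

X = df.drop(columns="default")
y = df["default"]

# 80/20 split; stratifying on the target is an assumption, not stated in the README.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", StandardScaler(), numeric),
])

# Fit encoders and scalers on the training rows only, then reuse them on the test rows.
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)

print(X_train_prepared.shape, X_test_prepared.shape)
```

Fitting the transformer on the training split alone keeps test-set statistics out of the scaling step, which is one common way to avoid leakage.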

Feature Engineering

Derived features such as:

Average credit utilization
Number of late payments
Recent delay risk
Repayment-to-limit ratios
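
As a rough sketch, features of this kind could be derived with pandas as shown below. The exact definitions used in the project are not given here, so every formula is an assumption. The input column names follow the UCI file, `NUM_DELAYS` and `LONGEST_DELAY` are taken from the feature importance section, and `AVG_UTILIZATION`, `RECENT_DELAY`, and `REPAY_TO_LIMIT` are hypothetical names chosen for illustration. Only a few of the monthly columns are included to keep the example short.

```python
import pandas as pd

# Three illustrative rows; a positive PAY_* value means months of payment delay.
df = pd.DataFrame({
    "LIMIT_BAL": [20000, 120000, 90000],
    "PAY_0": [2, -1, 0],
    "PAY_2": [2, 2, 0],
    "PAY_3": [-1, 0, 0],
    "BILL_AMT1": [3913, 2682, 29239],
    "BILL_AMT2": [3102, 1725, 14027],
    "PAY_AMT1": [0, 0, 1518],
    "PAY_AMT2": [689, 1000, 1500],
})

pay_cols = ["PAY_0", "PAY_2", "PAY_3"]
bill_cols = ["BILL_AMT1", "BILL_AMT2"]
amt_cols = ["PAY_AMT1", "PAY_AMT2"]

# Average credit utilization: mean monthly bill relative to the credit limit.
df["AVG_UTILIZATION"] = df[bill_cols].mean(axis=1) / df["LIMIT_BAL"]

# Number of late payments: count of months with a positive repayment status.
df["NUM_DELAYS"] = (df[pay_cols] > 0).sum(axis=1)

# Longest delay: largest repayment status, floored at zero for on-time clients.
df["LONGEST_DELAY"] = df[pay_cols].max(axis=1).clip(lower=0)

# Recent delay risk: flag clients whose most recent status shows a delay.
df["RECENT_DELAY"] = (df["PAY_0"] > 0).astype(int)

# Repayment-to-limit ratio: total repayments relative to the credit limit.
df["REPAY_TO_LIMIT"] = df[amt_cols].sum(axis=1) / df["LIMIT_BAL"]

print(df[["AVG_UTILIZATION", "NUM_DELAYS", "LONGEST_DELAY",
          "RECENT_DELAY", "REPAY_TO_LIMIT"]])
```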

Models Implemented

Logistic Regression (with L1 and L2 regularization)
Decision Tree
Random Forest
XGBoost
K-Nearest Neighbours (KNN)
Neural Network

Class Imbalance Handling

Used SMOTE to address class imbalance
Applied only to the training data
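
To illustrate the idea, here is a simplified, hand-rolled sketch of SMOTE-style oversampling in NumPy. It is not the library implementation the project would have used, and the function name `smote_sketch` and the synthetic data are hypothetical. It interpolates between a minority row and one of its nearest minority neighbours, and it is applied to the training rows only so the test set keeps its original class mix.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a row is never its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a base row and one of its k nearest neighbours for each new sample.
    base = rng.integers(0, len(X_min), n_new)
    picked = neighbours[base, rng.integers(0, k, n_new)]
    # Place the new row at a random point on the segment between the two.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Hypothetical imbalanced training set: 20 defaulters, 80 non-defaulters.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 3))
y_train = np.array([1] * 20 + [0] * 80)

X_min = X_train[y_train == 1]
n_new = int((y_train == 0).sum() - (y_train == 1).sum())
X_syn = smote_sketch(X_min, n_new)

# Resampled training data; the test split would be left untouched.
X_res = np.vstack([X_train, X_syn])
y_res = np.concatenate([y_train, np.ones(n_new, dtype=int)])

print(X_res.shape, int((y_res == 1).sum()), int((y_res == 0).sum()))
```

Because every synthetic row lies on a segment between two real minority rows, the oversampled class stays within the region the original defaulters occupy.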

Hyperparameter Tuning

Used GridSearchCV with 5-fold cross-validation
Optimized Random Forest and Logistic Regression based on recall
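
A recall-driven grid search along these lines might look like the sketch below. The synthetic data, the parameter grid, and the restriction to Logistic Regression are assumptions made to keep the example small; the grids actually searched in the project are not listed here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic imbalanced data standing in for the credit card dataset.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid over the inverse regularization strength.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="recall",  # select the setting with the best recall on class 1
    cv=5,              # 5-fold cross-validation on the training data
)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)
```

Scoring on recall steers the search toward settings that miss fewer defaulters, at the possible cost of more false alarms.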

Evaluation Metrics

Accuracy
Recall (class 1, defaulters)
F1-Score (class 1, defaulters)
AUC
Confusion Matrix
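
For reference, the metrics above can be computed from first principles as in this small pure-Python sketch. The labels and scores are made up for illustration, the 0.5 decision threshold is an assumption, and the helper names `confusion_counts` and `auc_score` are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 = default."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tn, fp, fn, tp

def auc_score(y_true, scores):
    """AUC as the chance a random defaulter outscores a random non-defaulter."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and predicted default probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)        # share of actual defaulters that were caught
precision = tp / (tp + fp)     # share of flagged clients who actually defaulted
f1 = 2 * precision * recall / (precision + recall)
auc = auc_score(y_true, scores)

print(tn, fp, fn, tp)
print(accuracy, recall, f1, auc)
```

Recall and F1 are reported for class 1 because a missed defaulter (a false negative) is the costly error in this setting.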

Results Summary

Model	Accuracy	Recall (class 1)	F1 Score (class 1)	AUC
Logistic Regression + SMOTE	0.7360	0.62	0.51	0.740
Random Forest + SMOTE	0.6889	0.50	0.41	0.62
XGBoost + SMOTE	0.7797	0.48	0.49	0.74
KNN + SMOTE	0.8051	0.36	0.45	0.71
Neural Network + SMOTE	0.7814	0.50	0.50	0.7814

Feature Importance (Top 3)

LONGEST_DELAY: Max payment delay
PAY_0: Most recent repayment status
NUM_DELAYS: Total number of delayed months

Key Takeaways

SMOTE significantly improved recall without sacrificing too much AUC
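Judged by F1 score in the results table, Logistic Regression + SMOTE (0.51) and Neural Network + SMOTE (0.50) gave the best balance of precision and sensitivity, while Random Forest + SMOTE scored lowest (0.41)
Logistic Regression + SMOTE had the highest recall (0.62), which is crucial for risk detection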

Business Implications

This pipeline provides a scalable and interpretable tool for financial institutions to:

Minimize false negatives in risk assessment
Improve loan portfolio management
Deploy the model in loan application systems to flag high-risk clients for review immediately

Author

Bozhao Wang
Department of Economics
University of Calgary
Instructor: Dr. Arvind Magesan

License

MIT License

Note: This project was submitted as a term report for ECON 611: Machine Learning in Economics. For academic use only.
