This project applies machine learning to assess the risk of granting credit to potential borrowers. The goal is to predict whether a loan applicant is likely to default, helping lenders make data-driven decisions.
Credit risk modeling is crucial for financial institutions seeking to minimize losses. This project uses historical customer data to predict loan status (safe or risky) with binary classification models.
Each record is labeled with its loan status (safe or risky), which is the prediction target. Key input features include:
- Age, Job, Housing
- Credit amount, Duration, Purpose
- Savings and checking account balances
- Personal status, Employment, and more
Source: a public dataset (e.g., UCI German Credit; replace if different)
The workflow covers:
- Exploratory data analysis (EDA) and visualization (missing-value treatment, correlations)
- Encoding & Scaling (Label Encoding, StandardScaler)
- Feature Engineering (selecting top features)
- Modeling: Logistic Regression, Random Forest, Decision Tree
- Evaluation: Accuracy, Precision, Recall, F1-Score, Confusion Matrix
- ROC Curve & AUC Score
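Here is a minimal, dependency-free sketch of the encoding and scaling step. The code mimics what scikit-learn's `LabelEncoder` computes (integer codes assigned to the sorted unique categories). It also mimics `StandardScaler` (zero mean and unit population variance per column). The helper names `label_encode` and `standardize` and the sample values are hypothetical and are not taken from the project.

```python
from statistics import fmean, pstdev


def label_encode(values):
    """Map each category to an integer code, assigned in sorted order
    (the same convention scikit-learn's LabelEncoder uses)."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


def standardize(column):
    """Rescale a numeric column to zero mean and unit population variance,
    mirroring StandardScaler's defaults. A constant column is only centred
    (divided by 1.0) instead of being divided by zero."""
    mean = fmean(column)
    std = pstdev(column) or 1.0
    return [(x - mean) / std for x in column]


# Hypothetical sample values, for illustration only.
housing = ["own", "rent", "free", "own"]
credit_amount = [1000.0, 2000.0, 3000.0, 4000.0]

codes, classes = label_encode(housing)
print(classes)  # ['free', 'own', 'rent']
print(codes)    # [1, 2, 0, 1]
print([round(z, 3) for z in standardize(credit_amount)])  # [-1.342, -0.447, 0.447, 1.342]
```

In the real pipeline, fit the scaler's mean and standard deviation on the training split only, then reuse them on the test split, so that no test information leaks into training. Note that scikit-learn documents `LabelEncoder` for target labels. For categorical input features such as `Housing`, `OrdinalEncoder` or `OneHotEncoder` is the usual choice, and one-hot encoding avoids implying an order between categories.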
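To show what the Logistic Regression model learns, here is a from-scratch sketch that fits weights by batch gradient descent on the average log-loss. The function names, learning rate, epoch count and toy data are assumptions for illustration. By default, scikit-learn's `LogisticRegression` uses the `lbfgs` solver with L2 regularization (`C=1.0`), so its coefficients will differ from this unregularized version.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated probability that applicant x is in class 1 (assumed: risky)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on mean log-loss.
    No regularization is applied (an assumption of this sketch)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss with respect to the logit.
            error = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: one standardized feature, label 1 = risky.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic_regression(X, y)
print(predict_proba(w, b, [1.5]) > 0.5)   # True
print(predict_proba(w, b, [-1.5]) > 0.5)  # False
```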
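The ROC curve and AUC score can be reproduced by hand in three steps:
1. Sweep the decision threshold from the highest predicted score down to the lowest.
2. At each threshold, record the false-positive and true-positive rates.
3. Integrate the resulting curve with the trapezoidal rule.

The sketch below assumes binary labels, with 1 as the positive (risky) class and both classes present. The function names and scores are hypothetical; in practice, `sklearn.metrics.roc_curve` and `roc_auc_score` compute these quantities.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold
    through every distinct score, from strictest to most lenient.
    Assumes both classes are present (otherwise a rate divides by zero)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels (1 = risky) and predicted risk scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pts = roc_points(y_true, scores)
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```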
Key results:
- The Random Forest classifier achieved over 85% accuracy.
- Results were visualized with Seaborn and Matplotlib.
- Feature importances and the confusion matrix help interpret model performance.
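Because the confusion matrix drives the interpretation, it helps to see that accuracy, precision, recall and F1 all derive from its four counts. The sketch below treats label 1 as the positive (risky) class, which is an assumption. `confusion_counts` and `classification_metrics` are hypothetical helpers, not project code.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, plus precision, recall and F1 for the positive class.
    Undefined ratios (zero denominators) are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predictions (1 = risky, 0 = safe).
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
print(confusion_counts(y_true, y_pred))  # (3, 2, 1, 2)
print({k: round(v, 3) for k, v in classification_metrics(y_true, y_pred).items()})
# {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75, 'f1': 0.667}
```

For labels 0 and 1, `sklearn.metrics.confusion_matrix` lays out the same counts as `[[tn, fp], [fn, tp]]`. In credit risk, recall on the risky class measures how many actual defaulters the model catches, which is often the figure a lender cares about most.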
Key takeaways:
- Building an end-to-end pipeline for a real-world ML classification problem.
- The importance of handling class imbalance and choosing the right evaluation metric.
- A clear understanding of model performance and its trade-offs.
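A short worked example shows why the choice of metric and class balancing matter. It assumes a 70/30 split between safe and risky applicants, which matches the UCI German Credit data (700 good and 300 bad loans). The "model" is a trivial one that labels everyone safe. `balanced_class_weights` is a hypothetical helper implementing the formula scikit-learn documents for `class_weight="balanced"`: `n_samples / (n_classes * count)`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Assumed 70/30 split between safe (0) and risky (1) applicants.
y_true = [0] * 700 + [1] * 300
always_safe = [0] * len(y_true)  # a trivial "model" that never flags risk

accuracy = sum(t == p for t, p in zip(y_true, always_safe)) / len(y_true)
risky_recall = sum(1 for t, p in zip(y_true, always_safe) if t == 1 and p == 1) / y_true.count(1)

print(accuracy)      # 0.7
print(risky_recall)  # 0.0
print({c: round(w, 3) for c, w in balanced_class_weights(y_true).items()})
# {0: 0.714, 1: 1.667}
```

The do-nothing model scores 70% accuracy while catching no risky applicants. Accuracy is therefore best read against this majority-class baseline and alongside recall and F1. With balanced weights, each risky example counts roughly 2.3 times as much as a safe one during training.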