AUC-ROC Score: 0.96037
Competition Link | GitHub Code
- Objective: Predict loan approval probability (`loan_status`) using financial and demographic features.
- Challenge: Synthetic dataset mimicking real-world loan-approval patterns while protecting test labels.
- Evaluation: AUC-ROC (Area Under the ROC Curve), well suited to imbalanced binary classification.
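The competition metric can be computed directly with scikit-learn's `roc_auc_score`; a minimal sketch on toy labels and scores (illustrative values, not the competition data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy labels and predicted probabilities (illustrative only).
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

# AUC-ROC: probability that a random positive is ranked above
# a random negative; threshold-free, so it suits imbalanced data.
auc = roc_auc_score(y_true, y_score)
print(round(auc, 3))  # → 0.889
```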
## Feature Engineering

- Created `loan_amnt_to_income`: debt-to-income ratio.
  Why? Directly measures repayment capacity.
- Added `emp_length_credit_ratio`: employment length relative to credit history length.
  Why? Captures the stability of a financial profile.
- Alternatives considered:
  - Income bucketing (rejected: loses granularity)
  - Loan term features (not available in the data)
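The two engineered ratios can be sketched in pandas; column names are assumed from the competition schema, and the `+ 1` denominator smoothing is an assumption to guard against empty credit histories, not necessarily the author's exact formula:

```python
import pandas as pd

# Toy frame; column names assumed from the competition schema.
df = pd.DataFrame({
    "loan_amnt": [10000, 5000],
    "person_income": [50000, 25000],
    "person_emp_length": [4.0, 10.0],
    "cb_person_cred_hist_length": [2, 5],
})

# Debt-to-income ratio: direct proxy for repayment capacity.
df["loan_amnt_to_income"] = df["loan_amnt"] / df["person_income"]

# Employment length relative to credit history length
# (+1 is an assumed guard against division by zero).
df["emp_length_credit_ratio"] = (
    df["person_emp_length"] / (df["cb_person_cred_hist_length"] + 1)
)

print(df[["loan_amnt_to_income", "emp_length_credit_ratio"]])
```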
## Preprocessing

- Numerical features:
  - Median imputation (robust to outliers)
  - Standard scaling (needed for linear baselines; harmless for tree ensembles)
- Categorical features:
  - Mode imputation (preserves category distribution)
  - One-hot encoding (avoids ordinal assumptions)
- Alternatives rejected:
  - Target encoding (risk of overfitting)
  - KNN imputation (computationally expensive)
## Model Choice: XGBoost

- Handles mixed data types effectively
- Native support for missing values
- Robust to moderate overfitting
- Why not alternatives:
  - Logistic regression: poor with non-linear relationships
  - Random forest: less tunable than XGBoost
  - Neural networks: overkill for tabular data
## Hyperparameter Tuning

- RandomizedSearchCV: sampled roughly 10% of the parameter space.
  Why? About 5x faster than GridSearchCV with comparable results.
- StratifiedKFold: maintains class balance in each split.
- Key parameters tuned:
  - `learning_rate`: balances training speed vs. accuracy
  - `colsample_bytree`: controls feature randomness per tree
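The search setup can be sketched as follows. The parameter ranges and data are illustrative, and scikit-learn's `GradientBoostingClassifier` stands in so the sketch runs without xgboost installed; for the real pipeline, substitute `xgboost.XGBClassifier` (which exposes `colsample_bytree` rather than `subsample` for column sampling):

```python
from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Toy imbalanced data standing in for the loan table.
X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.7, 0.3], random_state=0)

# Illustrative distributions, not the competition's actual grid.
param_dist = {
    "learning_rate": uniform(0.01, 0.09),  # speed vs. accuracy trade-off
    "max_depth": [2, 3, 4],
    "subsample": uniform(0.6, 0.4),        # row subsampling per tree
}

# StratifiedKFold keeps the class ratio identical in every fold.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=5,            # sample only a small slice of the space
    scoring="roc_auc",   # optimize the competition metric directly
    cv=cv,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```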
## Class Imbalance

- Original dataset had a ~30% rejection rate
- Addressed via stratified sampling rather than SMOTE (preserves the natural distribution)
## Feature Importance

- Top predictors:
  - `loan_percent_income`
  - `loan_int_rate`
  - `person_income`
- Engineered features ranked in the top 10
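Importance rankings like the one above come from the fitted model's `feature_importances_` attribute, which xgboost exposes just like scikit-learn's tree ensembles. A sketch on toy data (feature names assumed from the schema; `GradientBoostingClassifier` stands in for the XGBoost model):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Toy data; names assumed from the competition schema.
names = ["loan_percent_income", "loan_int_rate",
         "person_income", "loan_amnt_to_income"]
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Normalized importances, sorted descending for a ranking table.
imp = pd.Series(model.feature_importances_, index=names)
imp = imp.sort_values(ascending=False)
print(imp)
```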
## Threshold Optimization

- Default 0.5 threshold maintained.
  Why? ROC analysis showed a balanced TPR/FPR trade-off at this level.
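Checking the TPR/FPR trade-off at a candidate threshold is a short exercise with scikit-learn's `roc_curve`; a sketch on toy labels and scores (illustrative values):

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy labels and predicted scores (illustrative only).
y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1])
y_score = np.array([0.2, 0.4, 0.6, 0.7, 0.8, 0.3, 0.1, 0.9])

# roc_curve returns the (fpr, tpr, threshold) triples a ROC
# analysis would inspect when choosing an operating point.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Rates at the default 0.5 cut-off specifically:
pred = y_score >= 0.5
tp = ((pred == 1) & (y_true == 1)).sum()
fp = ((pred == 1) & (y_true == 0)).sum()
tpr_05 = tp / (y_true == 1).sum()
fpr_05 = fp / (y_true == 0).sum()
print(tpr_05, fpr_05)  # → 0.75 0.25
```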
1. Environment setup:

   ```bash
   pip install pandas scikit-learn xgboost matplotlib seaborn
   ```

2. Data preparation:
   - Download `train.csv` and `test.csv` from Kaggle
   - Place them in the project root directory

3. Run the model:

   ```bash
   jupyter notebook loan_approval_prediction.ipynb
   ```

4. Expected outputs:
   - `submission.csv`: final predictions
   - Feature importance plots in the notebook
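The submission file follows the usual Kaggle two-column format; a minimal sketch with hypothetical ids and probabilities (the actual id values and column header should match the competition's `sample_submission.csv`):

```python
import pandas as pd

# Hypothetical predicted probabilities for two test-set rows.
submission = pd.DataFrame({
    "id": [1001, 1002],
    "loan_status": [0.87, 0.12],  # predicted approval probability
})

# index=False keeps the file to exactly the two expected columns.
submission.to_csv("submission.csv", index=False)
```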
| Alternative Approach | Reason for Exclusion |
|---|---|
| CatBoost | Minimal AUC gain (<0.002) in validation |
| Stacking Models | Complexity vs. ROI analysis unfavorable |
| Feature Selection | XGBoost's inherent selection sufficient |
| Deep Learning | Limited data (~10k rows) |
| Cost-Sensitive Learning | Class imbalance not severe enough |
- 0.96037 AUC: among the top competition entries
- Critical success factors:
  - Debt-to-income ratio engineering
  - Careful handling of missing values
  - Learning-rate tuning over the 0.01-0.1 range
MIT License - Code free for academic/commercial use with attribution