Predicting loan approvals with probabilistic modeling to support financial risk mitigation and smarter lending decisions
Financial institutions need accurate tools to determine the creditworthiness of applicants. The goal of this project was to build a binary classification model to predict whether a loan applicant would be approved or not. This challenge was hosted on Kaggle as part of Playground Series - Season 4, Episode 10. Participants were required to submit predicted probabilities for loan approvals, and the evaluation metric used was Area Under the ROC Curve (AUC-ROC). Higher AUC scores indicated better model performance.
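
Because the metric is AUC-ROC, only the ordering of the predicted probabilities matters, not their calibration: AUC is the probability that a randomly chosen approved applicant is scored above a randomly chosen rejected one. The short sketch below illustrates this with made-up labels and scores (not competition data), using `roc_auc_score` from scikit-learn.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy labels and predicted probabilities (illustrative values only).
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

auc = roc_auc_score(y_true, y_prob)

# A strictly increasing transform keeps the ranking, so the AUC is unchanged.
auc_transformed = roc_auc_score(y_true, y_prob ** 3)

print(auc, auc_transformed)
```

This rank-based behaviour is why blending predictions from differently calibrated models can still be evaluated directly on AUC.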

You can explore the complete methodology in my notebook:
🔗 PS4E10 - Solution 2
Key steps followed:
- 📊 Data Augmentation & Preprocessing:
  - Merged the official competition training data with the original open-source credit risk dataset to increase training volume.
  - Aligned columns across the two sources and handled null values to keep the features consistent.
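
A minimal sketch of this merge step, assuming both sources share most feature columns and that only the competition file carries an `id` column. The toy frames, the column names, the `merge_sources` helper, and the median imputation are illustrative assumptions, not the notebook's actual code.

```python
import pandas as pd

# Toy stand-ins for the two sources; column names and values are assumptions.
competition = pd.DataFrame({
    "id": [0, 1, 2],
    "person_income": [35000, 56000, 28800],
    "loan_int_rate": [11.49, 13.35, 8.90],
    "loan_status": [0, 0, 1],
})
original = pd.DataFrame({
    "person_income": [59000, 9600, 65500],
    "loan_int_rate": [16.02, None, 12.87],  # external data may contain nulls
    "loan_status": [1, 0, 1],
})

def merge_sources(comp, extra, target="loan_status"):
    """Stack two sources on their shared columns and impute numeric nulls."""
    # Keep only columns present in both frames, in the competition's order.
    shared = [c for c in comp.columns if c in extra.columns]
    combined = pd.concat(
        [comp[shared].assign(source="competition"),
         extra[shared].assign(source="original")],
        ignore_index=True,
    )
    # Median imputation for numeric features (one simple choice among many).
    feature_cols = [c for c in shared if c != target]
    numeric = combined[feature_cols].select_dtypes("number").columns
    combined[numeric] = combined[numeric].fillna(combined[numeric].median())
    return combined

train = merge_sources(competition, original)
print(train)
```

Keeping a `source` column makes it possible to validate on competition rows only, which is usually what the leaderboard reflects.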
- 🧠 Model Stacking with OOFs & Ensembling:
  - Leveraged pre-computed out-of-fold (OOF) predictions and test set predictions from diverse base models including `lgbm`, `catboost`, `tabnet`, and others.
  - Combined OOFs for robust out-of-sample validation.
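
The sketch below shows one way to arrange OOF and test predictions into matrices with one column per base model and to score each model on its OOF column. The predictions here are synthetic stand-ins generated with NumPy; in practice they would be loaded from saved files, whose names and layout are not specified in this write-up.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_train, n_test = 1000, 400
y = rng.integers(0, 2, size=n_train)  # synthetic training labels

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical base models: noisier signal means a weaker model.
noise_levels = {"lgbm": 0.6, "catboost": 0.7, "tabnet": 1.0}
oof = {m: sigmoid(y + rng.normal(0, s, n_train)) for m, s in noise_levels.items()}
test_preds = {m: sigmoid(rng.normal(0.5, s, n_test)) for m, s in noise_levels.items()}

# One column per model, same column order in both matrices.
names = list(oof)
oof_matrix = np.column_stack([oof[m] for m in names])
test_matrix = np.column_stack([test_preds[m] for m in names])

# Each OOF prediction comes from a fold model that never saw that row,
# so these scores act as out-of-sample estimates.
scores = {m: roc_auc_score(y, oof[m]) for m in names}
print(scores)
```

Keeping the column order identical in `oof_matrix` and `test_matrix` lets any weights found on the OOF side be reused on the test side without re-mapping.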
- 🧮 Hill Climbing Ensembling:
  - Applied a hill climbing algorithm to greedily search for the best blend of base models by maximizing the AUC on validation data.
  - Iteratively selected models that improved validation AUC until no further gain was observed.
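
As a rough illustration of the idea, here is a minimal hill climbing sketch using greedy selection with replacement: start from the best single model, repeatedly add whichever model most improves the blended AUC, and stop when nothing helps. This is one common variant and may differ from the notebook's exact procedure; the `hill_climb` function, its parameters, and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def hill_climb(oof_matrix, y, max_steps=50, tol=1e-6):
    """Greedy selection with replacement; returns blend weights and best AUC."""
    n_models = oof_matrix.shape[1]
    # Start from the single best model.
    single = [roc_auc_score(y, oof_matrix[:, j]) for j in range(n_models)]
    best_j = int(np.argmax(single))
    counts = np.zeros(n_models, dtype=int)
    counts[best_j] = 1
    blend_sum = oof_matrix[:, best_j].astype(float).copy()
    best_auc = single[best_j]

    for _ in range(max_steps):
        total = counts.sum()
        # Score the blend obtained by adding one more copy of each model.
        trial = [
            roc_auc_score(y, (blend_sum + oof_matrix[:, j]) / (total + 1))
            for j in range(n_models)
        ]
        j = int(np.argmax(trial))
        if trial[j] <= best_auc + tol:
            break  # no candidate improves the validation AUC
        counts[j] += 1
        blend_sum += oof_matrix[:, j]
        best_auc = trial[j]

    # Selection counts, normalized, are the blend weights.
    return counts / counts.sum(), best_auc

# Synthetic demo: three models of differing quality.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=2000)
oof_matrix = np.column_stack(
    [y + rng.normal(0, s, y.size) for s in (0.8, 0.9, 1.2)]
)
weights, auc = hill_climb(oof_matrix, y)
print(weights, auc)
```

Because the search is greedy, it finds a local optimum rather than a guaranteed global one, and the weights are tuned on the same OOF predictions they are scored on, so the reported AUC is somewhat optimistic.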
- 🧪 Final Prediction & Submission:
  - Used the optimal ensemble weights derived via hill climbing to combine model predictions on the test set.
  - Generated submission probabilities accordingly.
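
Applying the weights to the test set amounts to a weighted average of the per-model test predictions. The sketch below uses made-up weights, predictions, and IDs; the `id` and `loan_status` column names are an assumption about the submission format and should be checked against the competition's sample submission.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: blend weights and an (n_test, n_models) prediction matrix.
weights = np.array([0.5, 0.3, 0.2])
test_matrix = np.array([
    [0.10, 0.20, 0.15],
    [0.90, 0.80, 0.70],
    [0.40, 0.60, 0.50],
])
test_ids = [100, 101, 102]  # placeholder IDs

# Weighted average of the columns; clip to keep valid probabilities.
blend = np.clip(test_matrix @ weights, 0.0, 1.0)
submission = pd.DataFrame({"id": test_ids, "loan_status": blend})

# With no path, to_csv returns the CSV text; pass a filename to write a file.
csv_text = submission.to_csv(index=False)
print(csv_text)
```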
- ✅ Public Leaderboard Scores:
  - Achieved scores of 0.97324, 0.97347, 0.97369, and 0.97285.
- 🏁 Private Leaderboard Scores:
  - Final scores: 0.96855, 0.96886 (best), 0.96877, and 0.96712 for subsequent submissions.
- 🥇 Rank Achieved:
  - Ranked 218 in a field of 4080 participants and 3858 teams, competing solo.
- 📂 Kaggle Competition: Loan Approval Prediction
- 📁 Dataset: Competition Data
- 📁 Source Dataset: Open Credit Risk Dataset
- Language: Python 🐍
- Libraries:
  - `pandas`, `numpy` for data handling
  - `matplotlib`, `seaborn` for basic EDA
  - `lightgbm`, `catboost`, `tabnet` for modeling
  - `sklearn` for evaluation and metrics
- Tools:
  - Jupyter Notebook 📓 for modeling
  - Kaggle Kernels / Colab for experimentation
📌 This project demonstrates the power of ensembling and model optimization in achieving high-performance predictive modeling under AUC evaluation criteria.