🔍 Data-Driven Risk Profiling | 💡 Predictive Modeling | 📈 Visual Storytelling
This project explores the key factors behind loan defaults by applying a full data analysis pipeline, from data cleaning and EDA to feature engineering and machine learning modeling. The objective is to give financial institutions actionable insights for identifying high-risk borrowers and optimizing lending strategies.
- Clean and preprocess loan application data
- Identify patterns and relationships using grouped visualizations
- Apply feature engineering techniques for model readiness
- Train and evaluate Logistic Regression and Random Forest models
- Predict loan status (Default / Non-Default) accurately
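
The cleaning objective can be illustrated with a minimal sketch. It assumes median imputation for missing numeric values and IQR-based clipping for outliers; the notebook may use different choices. The helper `clean_loans` and the column names `income` and `loan_amount` are hypothetical, and the data is made up.

```python
import numpy as np
import pandas as pd


def clean_loans(df: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    """Impute missing numeric values with the median and clip IQR outliers."""
    out = df.copy()
    for col in numeric_cols:
        # Median imputation is less sensitive to skew than the mean.
        out[col] = out[col].fillna(out[col].median())
        # Clip values beyond 1.5 * IQR from the quartiles instead of dropping rows.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    return out


# Tiny made-up frame; column names are hypothetical.
raw = pd.DataFrame({
    "income": [30_000, 42_000, np.nan, 55_000, 1_000_000],
    "loan_amount": [5_000, 7_500, 6_000, np.nan, 8_000],
})
cleaned = clean_loans(raw, ["income", "loan_amount"])
print(cleaned)
```

Clipping keeps every row, which matters when defaults are rare and each record is valuable; dropping outlier rows is an equally valid alternative.
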
✅ Cleaned missing values, handled outliers, and transformed skewed distributions
📊 Created grouped bar charts for Age, Income, Loan Amount, etc., vs Loan Status
🧹 Applied binning, label encoding, and log transformation techniques
🧠 Built two predictive models:
- Logistic Regression
- Random Forest Classifier
🧪 Evaluated model performance using accuracy, a confusion matrix, and precision and recall
🧾 Delivered insights in a clear, visual, and business-focused manner
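
The feature engineering and modeling steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: the data is synthetic, the column names (`age`, `income`, `loan_amount`, `home_ownership`, `default`) and the rule that generates the label are invented, and the hyperparameters are arbitrary. Scores on this synthetic data say nothing about the project's actual results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in data; columns and the label rule are invented for illustration.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(21, 65, n),
    "income": rng.lognormal(mean=10.5, sigma=0.5, size=n),
    "loan_amount": rng.lognormal(mean=9.0, sigma=0.6, size=n),
    "home_ownership": rng.choice(["RENT", "OWN", "MORTGAGE"], n),
})
ratio = df["loan_amount"] / df["income"]
df["default"] = (ratio + rng.normal(0, 0.1, n) > ratio.median()).astype(int)

# Binning: bucket age into ordered groups, stored as integer codes.
df["age_group"] = pd.cut(df["age"], bins=[20, 30, 45, 65], labels=False)
# Label encoding: map each category to an integer.
df["home_ownership_enc"] = LabelEncoder().fit_transform(df["home_ownership"])
# Log transformation: log1p reduces right skew in monetary columns.
df["log_income"] = np.log1p(df["income"])
df["log_loan_amount"] = np.log1p(df["loan_amount"])

features = ["age_group", "home_ownership_enc", "log_income", "log_loan_amount"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["default"],
    test_size=0.25, stratify=df["default"], random_state=0,
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "confusion_matrix": confusion_matrix(y_test, pred),
    }
    print(name, results[name])
```

Reporting precision and recall next to accuracy is useful here because a model can score well on accuracy while still missing many actual defaults.
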
- Young and low-income applicants showed higher default risk
- Loans for purposes such as education and medical expenses had elevated default rates
- Home ownership status significantly impacted risk levels
- Higher loan amounts → Higher chances of default
- Random Forest outperformed Logistic Regression in predicting default cases
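
Group-level findings like the ones above typically come from comparing default rates across categories. Here is a minimal sketch using a row-normalized crosstab; the records are made up, and the column names `loan_purpose` and `loan_status` are assumptions rather than the dataset's actual schema.

```python
import pandas as pd

# Made-up records; column names and values are hypothetical.
loans = pd.DataFrame({
    "loan_purpose": ["education", "education", "medical", "medical",
                     "home", "home", "home", "auto"],
    "loan_status": ["Default", "Non-Default", "Default", "Default",
                    "Non-Default", "Non-Default", "Default", "Non-Default"],
})

# Row-normalized crosstab: share of each loan status within each purpose.
rates = pd.crosstab(loans["loan_purpose"], loans["loan_status"], normalize="index")

# Default rate per purpose, highest first.
default_rate = rates["Default"].sort_values(ascending=False)
print(default_rate)
```

With Matplotlib installed, `rates.plot(kind="bar")` draws the same table as a grouped bar chart, one bar per loan status within each purpose.
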
| Category | Tools / Libraries |
|---|---|
| 🐍 Programming | Python |
| 📊 Data Analysis | Pandas, NumPy |
| 📈 Visualization | Matplotlib, Seaborn |
| 🤖 Machine Learning | scikit-learn (LogisticRegression, RandomForestClassifier) |
| 🧪 Environment | Jupyter Notebook |
- Cleaned and preprocessed dataset
- Grouped bar chart visualizations
- Trained ML models (Logistic Regression and Random Forest)
- Insightful visual storytelling
- Jupyter Notebook with complete analysis pipeline
I’m Syed Hur Abbas Naqvi, a Certified Data Analyst skilled in Python, SQL, Microsoft Power BI, Excel, and Machine Learning.
I specialize in turning raw data into business intelligence that drives growth, from data cleaning and EDA through visualization and strategic insights.
🌐 Portfolio: https://hurabbas05.github.io/
🔗 LinkedIn: https://www.linkedin.com/in/hurabbas05/
📧 Email: syedhur572@gmail.com
📞 Phone: +923036098700
If you found this project helpful, feel free to ⭐ star this repository to support and bookmark it!