Loan Default Risk Analysis Project

🔍 Data-Driven Risk Profiling | 💡 Predictive Modeling | 📈 Visual Storytelling


🚀 Overview

This project explores the key factors behind loan defaults by working through a full data analysis pipeline: data cleaning, EDA, feature engineering, and machine learning modeling. The goal is to give financial institutions actionable insights for identifying high-risk borrowers and optimizing lending strategies.


🎯 Objectives

  • Clean and preprocess loan application data
  • Identify patterns and relationships using grouped visualizations
  • Apply feature engineering techniques for model readiness
  • Train and evaluate Logistic Regression and Random Forest models
  • Predict loan status (Default / Non-Default) accurately
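
To illustrate the grouped-visualization objective, here is a minimal sketch of how default counts and default rates per group could be computed with pandas. The toy data, the column names (`age`, `loan_status`), the bin edges, and the helper `plot_grouped_bars` are all assumptions made for illustration; they are not taken from the project's notebook or dataset.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions, not the project's schema.
df = pd.DataFrame({
    "age": [22, 25, 31, 38, 45, 52, 23, 60],
    "loan_status": [1, 0, 0, 1, 0, 0, 1, 0],  # 1 = default, 0 = non-default
})

# Bin age into labelled groups so it can serve as a grouping key.
# Bin edges are illustrative only.
df["age_group"] = pd.cut(
    df["age"], bins=[18, 30, 45, 100], labels=["18-30", "31-45", "46+"]
)

# Rows: age group, columns: loan status, values: number of applicants.
counts = pd.crosstab(df["age_group"], df["loan_status"])

# Share of defaults within each age group.
default_rate = df.groupby("age_group", observed=False)["loan_status"].mean()


def plot_grouped_bars(table):
    """Draw one bar per loan status within each group (requires matplotlib)."""
    ax = table.plot(kind="bar")
    ax.set_xlabel("Age group")
    ax.set_ylabel("Number of applicants")
    ax.legend(title="Loan status")
    return ax
```

Calling `plot_grouped_bars(counts)` in a notebook would render the grouped bar chart; the same pattern applies to income or loan amount once they are binned.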

📌 Project Highlights

✅ Cleaned missing values, handled outliers, and transformed skewed distributions
📊 Created grouped bar charts for Age, Income, Loan Amount, etc., vs Loan Status
🧹 Applied binning, label encoding, and log transformation techniques
🧠 Built two predictive models:

  • Logistic Regression
  • Random Forest Classifier

🧪 Evaluated model performance using accuracy, confusion matrix, and precision-recall
🧾 Delivered insights in a clear, visual, and business-focused manner
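
The feature engineering steps listed above (binning, label encoding, log transformation) can be sketched roughly as follows. This is a sketch under stated assumptions, not the notebook's actual code: the columns `income`, `age`, and `home_ownership`, the bin edges, and the bin labels are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample; column names and values are assumptions.
df = pd.DataFrame({
    "income": [20000, 45000, 120000, 800000],
    "age": [22, 35, 48, 61],
    "home_ownership": ["RENT", "OWN", "MORTGAGE", "RENT"],
})

# Log transformation: log1p compresses the long right tail of income
# and is defined at zero.
df["income_log"] = np.log1p(df["income"])

# Binning: turn a continuous variable into ordered categories.
# Bin edges and labels are illustrative only.
df["age_bin"] = pd.cut(
    df["age"], bins=[18, 30, 50, 100], labels=["young", "middle", "senior"]
)

# Label encoding: map each category to an integer code.
# LabelEncoder assigns codes in sorted order of the class names.
encoder = LabelEncoder()
df["home_ownership_code"] = encoder.fit_transform(df["home_ownership"])
```

Label encoding imposes an arbitrary order on the categories. That is usually harmless for tree-based models such as Random Forest, but for Logistic Regression one-hot encoding (for example `pd.get_dummies`) may be a safer choice.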

📊 Key Insights

  • Young and low-income applicants showed higher default risk
  • Loans taken for purposes such as education and medical expenses had elevated default rates
  • Home ownership status significantly impacted risk levels
  • Higher loan amounts were associated with a higher chance of default
  • Random Forest outperformed Logistic Regression in predicting default cases
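
A comparison like the one in the last insight could be set up as sketched below. The data here is synthetic (generated with `make_classification`), so the numbers it produces say nothing about the project's actual results; the class imbalance, split ratio, and hyperparameters are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data: roughly 20% positives ("default").
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=42
)

# Stratify so the train and test sets keep the same default ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model and collect the same metrics for a side-by-side comparison.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
```

Because defaults are typically the minority class, recall on the default class is often more informative than overall accuracy when judging which model better predicts default cases.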

🛠️ Tools & Technologies

| Category | Tools / Libraries |
|---|---|
| 🐍 Programming | Python |
| 📊 Data Analysis | Pandas, NumPy |
| 📈 Visualization | Matplotlib, Seaborn |
| 🤖 Machine Learning | Scikit-learn (LogisticRegression, RandomForest) |
| 🧪 Environment | Jupyter Notebook |

📁 Deliverables

  • Cleaned and preprocessed dataset
  • Grouped bar chart visualizations
  • Trained ML models (Logistic & Random Forest)
  • Insightful visual storytelling
  • Jupyter Notebook with complete analysis pipeline

🙋‍♂️ Author

I’m Syed Hur Abbas Naqvi, a Certified Data Analyst skilled in Python, SQL, Microsoft Power BI, Excel, and Machine Learning.
I specialize in turning raw data into business intelligence that drives growth — from data cleaning & EDA to visualization & strategic insights.

🌐 Portfolio: https://hurabbas05.github.io/
🔗 LinkedIn: https://www.linkedin.com/in/hurabbas05/
📧 Email: syedhur572@gmail.com
📞 Phone: +923036098700


🌟 Star This Repo

If you found this project helpful, feel free to ⭐ star this repository to support and bookmark it!