This project builds a classification model that predicts whether a loan application will be approved, based on key demographic and financial attributes of the applicant. The goal is to help financial institutions automate the screening process and improve decision accuracy.
- Source: Kaggle - Loan Prediction Dataset
- Files Used:
  - `train.csv`: 614 rows of labeled training data
  - `test.csv`: unlabeled data for prediction
  - `loan_predictions.csv`: model-generated predictions
- Python (Pandas, NumPy, Matplotlib, Seaborn)
- Scikit-learn (Logistic Regression, Random Forest)
- Jupyter Notebook
- Git, GitHub
- Data Cleaning & Preprocessing
- Exploratory Data Analysis (EDA)
- Feature Encoding
- Model Training (Logistic Regression, Random Forest)
- Model Evaluation (Accuracy, Precision, Recall, F1 Score)
- Prediction on Test Data
- Business Insights Summary
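
The workflow above (cleaning, encoding, training, evaluation) could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it runs on synthetic data so it works without `train.csv`, and the column names (`Credit_History`, `ApplicantIncome`, `Education`, `Loan_Status`), the helper functions, and the choice of imputation strategy are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column names; the real dataset may differ.
NUMERIC = ["ApplicantIncome", "Credit_History"]
CATEGORICAL = ["Education"]


def make_synthetic_loans(n=400, seed=0):
    """Hypothetical stand-in for train.csv so the sketch is self-contained."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Credit_History": rng.choice([1.0, 0.0], size=n, p=[0.8, 0.2]),
        "ApplicantIncome": rng.lognormal(mean=8.3, sigma=0.5, size=n),
        "Education": rng.choice(["Graduate", "Not Graduate"], size=n),
    })
    # Approval depends mostly on credit history, plus random noise.
    approve_prob = np.where(df["Credit_History"] == 1.0, 0.8, 0.1)
    df["Loan_Status"] = np.where(rng.random(n) < approve_prob, "Y", "N")
    # Blank out a few values to mimic missing data that needs cleaning.
    missing_rows = rng.choice(n, size=20, replace=False)
    df.loc[missing_rows, "Credit_History"] = np.nan
    return df


def build_model():
    """Impute, scale and encode the features, then fit logistic regression."""
    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical_steps = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric_steps, NUMERIC),
        ("cat", categorical_steps, CATEGORICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


def evaluate(df):
    """Hold out 20% of the rows and report the four metrics listed above."""
    X = df[NUMERIC + CATEGORICAL]
    y = (df["Loan_Status"] == "Y").astype(int)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    model = build_model().fit(X_train, y_train)
    pred = model.predict(X_val)
    metrics = {
        "accuracy": accuracy_score(y_val, pred),
        "precision": precision_score(y_val, pred, zero_division=0),
        "recall": recall_score(y_val, pred, zero_division=0),
        "f1": f1_score(y_val, pred, zero_division=0),
    }
    return model, metrics


model, metrics = evaluate(make_synthetic_loans())
print(metrics)
```

With the real data, `make_synthetic_loans()` would be replaced by `pd.read_csv("train.csv")`, and swapping `LogisticRegression` for `RandomForestClassifier` in `build_model` would give the second model for comparison. The numbers printed here come from synthetic data and say nothing about the results reported for the actual dataset.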
- Best Model: Logistic Regression
- Accuracy: ~78.9%
- Key Influencing Features: Credit History, Applicant Income, Education
- Business Value: Streamlined loan approval process with minimal manual effort
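
One common way to arrive at a list of influential features for a logistic regression model is to standardize the inputs and rank the fitted coefficients by absolute size. The sketch below shows that idea on synthetic data; it is an assumption about how such a ranking might be produced, not a record of how the notebook did it, and the column names and generated values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500

# Synthetic, already-encoded features (hypothetical names and values).
features = pd.DataFrame({
    "Credit_History": rng.choice([1.0, 0.0], size=n, p=[0.8, 0.2]),
    "ApplicantIncome": rng.lognormal(mean=8.3, sigma=0.5, size=n),
    "Education_Graduate": rng.integers(0, 2, size=n).astype(float),
})

# In this toy data, approval is driven by credit history alone.
approve_prob = np.where(features["Credit_History"] == 1.0, 0.85, 0.1)
approved = (rng.random(n) < approve_prob).astype(int)

# Standardizing puts coefficients on a comparable scale across features.
scaled = StandardScaler().fit_transform(features)
clf = LogisticRegression(max_iter=1000).fit(scaled, approved)

# Rank features by the absolute size of their coefficients.
coefs = pd.Series(clf.coef_[0], index=features.columns)
ranking = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
print(ranking)
```

Coefficient size is only a rough guide: it reflects association within this particular model, and correlated features can share or mask each other's weight.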
- Clone this repo or download the ZIP
- Open the notebook `loan_default_prediction.ipynb` in Jupyter or Colab
- Run all cells sequentially to reproduce the results
Prakash — Aspiring Data Analyst