- Problem
- Aim
- Exploratory Data Analysis
- Data Pre-processing
- Feature Engineering
- Feature Analysis
- Machine Learning Model Part
- Conclusion
Exploratory data analysis reveals a significant issue: the payback rate for approved loans is just 31.1%, indicating a high risk of default.
The primary goals of this project are:
- To reduce the financial losses the Fintech company incurs from high-risk loan approvals.
- To build an efficient classification machine learning model that can predict loan defaults in approved loans.
- Joining loan and payment data.
- Merging datasets and exploring the combined data.
- Focusing on approved and fully funded loans.
- Analyzing missing values.
- Removing features with over 80% missing values.
- Classifying features into numerical, categorical, and other types.
- Imputing missing values in each feature set.
- Developing a new feature: "days_to_process".
- Defining the target variable.
- Scaling numerical columns using RobustScaler, which centers on the median and scales by the interquartile range so outliers have less influence.
- One-hot encoding of categorical features.
- Preparing the dataset for training.
- Recall
- Precision
- F1-Score
- Random Forest model performance with and without 'loanAmount'.
- Analysis of the model's precision, recall, and F1-score.
- Impact of 'loanAmount' on Type-II errors (false negatives).
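The pre-processing and feature-engineering steps listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names `applicationDate`, `originatedDate`, `payFrequency`, and `mostlyEmpty`, the definition of `days_to_process` as the gap between application and origination, and the median / `"missing"` imputation choices are all assumptions made for the example. The target variable is assumed to have been separated from the features beforehand.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler


def preprocess(df: pd.DataFrame, max_missing: float = 0.80) -> pd.DataFrame:
    """Sketch of the listed steps; column names are hypothetical."""
    # Remove features with more than `max_missing` share of missing values.
    missing_share = df.isna().mean()
    df = df.loc[:, missing_share <= max_missing].copy()

    # New feature (assumed definition): days from application to origination.
    df["days_to_process"] = (df["originatedDate"] - df["applicationDate"]).dt.days
    df = df.drop(columns=["applicationDate", "originatedDate"])

    # Classify features; here every non-numeric column is treated as categorical.
    num_cols = df.select_dtypes(include="number").columns.tolist()
    cat_cols = [c for c in df.columns if c not in num_cols]

    # Impute each feature set: median for numerical, a label for categorical.
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())
    df[cat_cols] = df[cat_cols].fillna("missing")

    # Scale numerical columns with RobustScaler (median and IQR).
    df[num_cols] = RobustScaler().fit_transform(df[num_cols])

    # One-hot encode categorical features.
    return pd.get_dummies(df, columns=cat_cols)


# Tiny made-up frame, only to show the function running end to end.
raw = pd.DataFrame({
    "applicationDate": pd.to_datetime(
        ["2023-01-01", "2023-01-05", "2023-01-10", "2023-01-12"]),
    "originatedDate": pd.to_datetime(
        ["2023-01-03", "2023-01-06", "2023-01-15", "2023-01-13"]),
    "loanAmount": [500.0, np.nan, 1500.0, 300.0],
    "payFrequency": ["W", "B", None, "W"],
    "mostlyEmpty": [np.nan, np.nan, np.nan, np.nan],
})
features = preprocess(raw)
```

In a real training setup the imputation values and the scaler would normally be fitted on the training split only and then applied to the test split, to avoid leaking test-set statistics.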
The Random Forest model achieves high precision and good recall, so it identifies defaulted loans effectively, which is crucial for minimizing financial losses in the Fintech company's loan approval process. Dropping 'loanAmount' increases the number of false negatives (Type-II errors), which indicates that the feature carries useful signal. It is therefore recommended to retain 'loanAmount' in the model.
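One way to run this kind of comparison is sketched below: train a Random Forest with and without a given feature and report precision, recall, F1-score, and the false-negative count for each. The helper names `evaluate` and `compare_without`, the hyperparameters, the split ratio, and the synthetic data are assumptions for illustration; the label `1` is assumed to mean "default". Because the synthetic target is built from `loanAmount`, the numbers it produces say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split


def evaluate(X: pd.DataFrame, y: pd.Series, seed: int = 0) -> dict:
    """Fit a Random Forest and score it on a stratified hold-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # With labels=[0, 1], ravel() yields tn, fp, fn, tp in that order.
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    return {
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "false_negatives": int(fn),  # defaults predicted as non-defaults
        "n_test": int(tn + fp + fn + tp),
    }


def compare_without(X: pd.DataFrame, y: pd.Series, feature: str,
                    seed: int = 0) -> dict:
    """Score the model with all features and again with `feature` dropped."""
    return {
        "with": evaluate(X, y, seed),
        "without": evaluate(X.drop(columns=[feature]), y, seed),
    }


# Synthetic stand-in data (not the project's data).
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "loanAmount": rng.uniform(100, 3000, n),
    "days_to_process": rng.integers(0, 10, n),
    "noise": rng.normal(size=n),
})
y = (X["loanAmount"] + rng.normal(0, 500, n) > 1500).astype(int)

results = compare_without(X, y, "loanAmount")
for name, scores in results.items():
    print(name, scores)
```

Using the same seed for both runs keeps the train/test split identical, so any difference in false negatives comes from the dropped feature rather than from a different split.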