- Project Overview
- Dataset Overview
- Data Preprocessing
- Exploratory Data Analysis (EDA)
- Model Architecture
- Model Training and Evaluation
- Results and Discussion
- Conclusion
- License
This project predicts whether a bank will approve or reject a loan application based on customer information such as income, age, and credit history. It uses a Backpropagation Neural Network (BPNN) for binary classification: whether a loan is granted (1) or not granted (0).
The dataset consists of 5000 records with 14 columns (13 customer attributes plus the target) and is used to train and evaluate the model. The workflow covers preprocessing the data, training a neural network model, and evaluating it on a held-out test set.
The dataset for this project contains information about customers applying for a loan, with the following features:
- Age: Age of the customer
- Experience: Years of professional experience
- Income: Annual income of the customer
- Family: Number of family members
- CCAvg: Average credit card spending per month
- Education: Level of education (1 = undergraduate, 2 = graduate, 3 = professional)
- Mortgage: Amount of mortgage the customer has
- Personal Loan: Target variable (1 = loan granted, 0 = loan denied)
- Securities Account: Whether the customer has a securities account (binary)
- CD Account: Whether the customer has a CD account (binary)
- Online: Whether the customer uses online banking (binary)
- CreditCard: Whether the customer uses a credit card with the bank (binary)
- ZIP Code: Customer's postal code (removed during preprocessing)
- ID: Customer's unique ID (removed during preprocessing)
The dataset is available for download from this link.
Data preprocessing was performed to ensure the dataset is suitable for the neural network model:
- Handling Missing Data: No missing values were found in the dataset.
- Incorrect Data Types: The CCAvg column was identified as an object type instead of numeric. It was corrected by replacing '/' with '.' and converting it into a float type.
- Outliers: The features Income, CCAvg, and Mortgage showed some outliers, but these were kept as they represent real-world customer data.
- Dropping Unnecessary Columns: The ID and ZIP Code columns were removed since they did not contribute to the prediction task.
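The type fix and column removal described above can be sketched with the standard library alone. The sample rows and the helper names `parse_ccavg` and `clean_row` are hypothetical, and the project itself may well have performed the same steps with a dataframe library instead.

```python
# Hypothetical sample rows; only the column names come from the dataset description.
raw_rows = [
    {"ID": "1", "Age": "25", "Income": "49", "ZIP Code": "91107", "CCAvg": "1/60"},
    {"ID": "2", "Age": "45", "Income": "34", "ZIP Code": "90089", "CCAvg": "1/50"},
]

# Identifier columns that carry no predictive signal.
DROPPED_COLUMNS = {"ID", "ZIP Code"}


def parse_ccavg(text):
    """Turn a value such as '1/60' into the float 1.6."""
    return float(text.replace("/", "."))


def clean_row(row):
    """Drop identifier columns and convert every remaining value to a float."""
    cleaned = {}
    for name, value in row.items():
        if name in DROPPED_COLUMNS:
            continue
        cleaned[name] = parse_ccavg(value) if name == "CCAvg" else float(value)
    return cleaned


clean_rows = [clean_row(row) for row in raw_rows]
print(clean_rows[0])
```

Outliers are deliberately left untouched here, in line with the decision to keep them.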
Exploratory data analysis (EDA) was conducted to understand the characteristics of the data:
- Feature Distribution: The distribution of numerical features such as Income, CCAvg, and Mortgage was plotted. Outliers were identified but not removed, as they represent valid cases.
- Correlation Analysis: Strong correlations were observed between Age and Experience, indicating that older customers often have more years of professional experience.
- Class Imbalance: The target variable (Personal Loan) showed a slight imbalance between the classes (loan granted vs. loan denied), which was noted for evaluation.
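Two of the checks above, the Age/Experience correlation and the class balance of the target, can be illustrated as follows. This is a minimal sketch: the numbers are made up for illustration and are not taken from the dataset, and the function names `pearson` and `class_shares` are hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def class_shares(labels):
    """Fraction of samples that fall into each class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Illustrative values only, not taken from the dataset.
age = [25, 32, 41, 50, 58]
experience = [1, 7, 16, 25, 33]
personal_loan = [0, 0, 0, 1, 0]

print(round(pearson(age, experience), 3))
print(class_shares(personal_loan))
```

A coefficient close to 1 for Age and Experience would mean the two features carry largely the same information.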
For this binary classification task, a Backpropagation Neural Network (BPNN) was designed with the following architecture:
- Input Layer: 11 nodes (one for each of the 11 features).
- Hidden Layers: Two hidden layers, each with 22 nodes and ReLU activation functions.
- Output Layer: One output node with a Sigmoid activation function to predict a binary outcome (loan granted or not).
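To make the role of the output node concrete, the sketch below shows how a sigmoid turns a raw output value into a probability and then into a class label. The 0.5 decision threshold and the name `predict_label` are assumptions for illustration; the source does not state which threshold was used.

```python
import math


def sigmoid(z):
    """Map a real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(z, threshold=0.5):
    """Return 1 (loan granted) when the probability reaches the threshold."""
    return 1 if sigmoid(z) >= threshold else 0


for z in (-2.0, 0.0, 3.0):
    print(z, round(sigmoid(z), 3), predict_label(z))
```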
The model was built using Keras and compiled using the Stochastic Gradient Descent (SGD) optimizer and the binary cross-entropy loss function:
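from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(11, input_shape=(11,)),    # 11 units over the 11 input features (no activation, so linear)
    Dense(22, activation='relu'),    # first hidden layer
    Dense(22, activation='relu'),    # second hidden layer
    Dense(1, activation='sigmoid')   # probability that the loan is granted
])
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])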
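As a quick sanity check on model size, the number of trainable parameters can be worked out by hand. The sketch assumes each Dense layer holds inputs × units weights plus one bias per unit (the Keras default), and takes the unit counts from the listing above. The helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


# Unit counts of the Dense layers in the listing, in order.
layer_units = [11, 22, 22, 1]
n_features = 11

total = 0
n_inputs = n_features
for units in layer_units:
    count = dense_params(n_inputs, units)
    print(f"Dense({units}): {count} parameters")
    total += count
    n_inputs = units  # the next layer consumes this layer's outputs

print("total:", total)
```

Under those assumptions the total comes to 925, which should agree with the figure reported by `model.summary()`.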
The dataset was split into:
- 80% for training
- 10% for validation
- 10% for testing
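An 80/10/10 split like this one can be sketched by shuffling row indices and slicing them. The seed value and the function name `split_indices` are assumptions; the source does not say how the split was randomized or whether it was stratified.

```python
import random


def split_indices(n_rows, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle row indices and cut them into train, validation, and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = round(n_rows * train_frac)
    n_val = round(n_rows * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(5000)
print(len(train_idx), len(val_idx), len(test_idx))  # 4000 500 500
```

With an imbalanced target, a stratified split that preserves the class ratio in each part may be a safer choice.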
The model was trained for up to 50 epochs, with early stopping to prevent overfitting. During training, accuracy rose steadily while the loss decreased.
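The idea behind early stopping can be sketched without any framework: stop once the validation loss has not improved for a fixed number of epochs. The patience value, the monitored quantity, and the loss values below are all assumptions for illustration, since the source does not give the settings that were used. In Keras this behaviour is provided by the `EarlyStopping` callback.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops at the first epoch that is `patience` epochs past the
    best validation loss seen so far, or at the last epoch otherwise.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses), best_epoch


# Made-up validation losses for illustration.
losses = [0.60, 0.45, 0.38, 0.35, 0.36, 0.37, 0.36, 0.39, 0.40, 0.41]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)  # 7 4
```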
- Accuracy: 98% accuracy on the test set.
- Precision: Measures the proportion of true positives out of all predicted positives.
- Recall: Measures the proportion of actual positives that were correctly identified.
- F1-Score: Harmonic mean of precision and recall.
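The four metrics above can be computed directly from the counts of true and false positives and negatives. The labels below are invented for illustration and are not the model's actual predictions. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels: 3 actual positives, of which 2 are found, plus 1 false alarm.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy example accuracy is 0.8 while precision and recall are only about 0.67, which shows why accuracy alone can be misleading on imbalanced data.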
The model reached 98% accuracy on the test set, correctly predicting whether the loan was granted in most cases. Precision and recall were also evaluated to understand performance on each class.
- Accuracy: The high test accuracy suggests that the model generalizes well to unseen data.
- Class Imbalance: Even though the accuracy is high, the class imbalance in the target variable (Personal Loan) still affects performance. Further adjustments such as oversampling the minority class could be beneficial.
- Feature Importance: Features such as Income, CCAvg, and Mortgage were found to have significant influence on predicting loan approval.
- Class Imbalance: Although accuracy is high, the imbalance between loan approvals and rejections could lead to biased predictions. Techniques like oversampling or class-weight adjustments might improve recall for the minority class.
- Model Generalization: While the model shows good performance, it may benefit from further tuning and additional features to capture more customer information.
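One way to apply the class-weight adjustment mentioned above is to weight each class inversely to its frequency. The sketch uses the common "balanced" heuristic, n_samples / (n_classes × class_count), on a made-up 90/10 label split. Neither the heuristic nor the split comes from the project, and the function name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Made-up labels: 90 denied (0) and 10 granted (1).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

A dictionary of this shape can be passed to the `class_weight` argument of Keras' `fit` method, so that errors on the minority class contribute more to the loss.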
This project successfully demonstrates the use of Backpropagation Neural Networks (BPNN) for predicting loan approval decisions based on customer features. The model achieves high accuracy, making it a promising tool for banking institutions to automate loan decision-making processes. Future improvements could focus on addressing class imbalance and fine-tuning the model.
This project is licensed under the MIT License - see the LICENSE file for details.