This project applies machine learning to credit risk analysis, predicting loan applicant behavior from financial and demographic factors. It covers data preprocessing, feature engineering, model training, and performance evaluation, and includes interactive Tableau visualizations for exploring key insights.
🔄 Feature Engineering & Data Preprocessing:
- Handling missing values, encoding categorical data, and feature scaling.
- Exploratory Data Analysis (EDA) using statistical summaries and visualizations.
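
The preprocessing steps listed above could be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy DataFrame, its column names (`income`, `age`, `housing_type`), and the choice of median / most-frequent imputation are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; these column names are illustrative only,
# not the schema of application_record.csv.
df = pd.DataFrame({
    "income": [52000.0, np.nan, 87000.0, 41000.0],
    "age": [34.0, 51.0, np.nan, 27.0],
    "housing_type": pd.Series(["rent", "own", np.nan, "rent"], dtype=object),
})

numeric_cols = ["income", "age"]
categorical_cols = ["housing_type"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill missing values with the mode, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

X = preprocessor.fit_transform(df)
print(X.shape)  # (4, 4): two scaled numeric columns plus two one-hot columns
```

Wrapping the steps in a `ColumnTransformer` keeps imputation and scaling statistics tied to the training data, which helps avoid leakage when the same object is later applied to a test split.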
🤖 Machine Learning Model Training:
- Implementing multiple models: Logistic Regression, Decision Trees, Random Forest, and Gradient Boosting.
- Training on cleaned and normalized data for improved predictive performance.
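
As a rough sketch of how the four model families named above might be trained side by side, the example below fits each one on the same split and records its test accuracy. The synthetic dataset from `make_classification` is a stand-in (an assumption) for the processed training data, and the hyperparameters are scikit-learn defaults rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned, normalized training data (assumption).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# One estimator per model family listed in the README.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Fit every model on the same training split and score it on the held-out split.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: {test_accuracy[name]:.3f}")
```

Using a single shared split makes the accuracy figures directly comparable across models.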
🎯 Model Evaluation & Optimization:
- Accuracy, Precision, Recall, F1-score, and ROC-AUC metrics.
- Hyperparameter tuning using GridSearchCV.
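
The evaluation and tuning steps above could look something like the following sketch. The synthetic data, the small `param_grid`, the choice of Random Forest as the tuned model, and `scoring="roc_auc"` are all assumptions for illustration; the notebook may use different grids and models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the processed data (assumption).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical, deliberately small grid so the search runs quickly.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the best configuration on the held-out test split.
best_model = search.best_estimator_
y_pred = best_model.predict(X_test)
y_score = best_model.predict_proba(X_test)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
}

print("Best parameters:", search.best_params_)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that ROC-AUC is computed from predicted probabilities rather than hard class labels, while the other four metrics use the thresholded predictions.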
📊 Tableau Visualizations:
- Interactive dashboards highlighting applicant loan behavior, payment trends, and default probabilities.
├── data/ # Directory for datasets (not included in the repo)
│ ├── application_record.csv
│ ├── combined_data.csv # Processed datasets for training
│ ├──
├── Feature_Engineering_Analysis.ipynb # Feature Engineering & Data Preprocessing
├── Machine_Learning_Models.ipynb # Model Training & Evaluation
├── Project_4_Tableau_Visuals.pdf # PDF for easy viewing of Project Visuals
├── Project_4_Data_Visuals.twb # Exported Tableau Visuals
├── requirements.txt # List of dependencies
├── README.md # Project documentation
└── .gitignore # Git ignore rules (added in preparation for future API-based model use)
Clone this repository and install the required dependencies:
git clone https://github.com/87gru/Project-4.git
cd Project-4
pip install -r requirements.txt
1️⃣ Open Jupyter Notebook:
jupyter notebook
2️⃣ Open Feature_Engineering_Analysis.ipynb:
- Clear all cell outputs and run sequentially.
3️⃣ Open Machine_Learning_Models.ipynb:
- Clear all cell outputs and run sequentially.
4️⃣ View Tableau Public Link:

- Achieved classification accuracy above 75% with the optimized models.
- Identified key financial and demographic factors influencing loan repayment.
- Interactive Tableau dashboards provide insights into loan statuses across different income groups.
📍 Fork the repository.
📍 Create a new branch (git checkout -b feature-branch).
📍 Commit changes (git commit -m "Added feature").
📍 Push to branch (git push origin feature-branch).
📍 Open a Pull Request.
🧑‍🦱 Jordan Thomas
🧑‍🦲 Dalton Schmidt