A comprehensive machine learning web application for credit risk assessment built with Flask and advanced ML models.
This project provides an end-to-end solution for credit risk prediction using machine learning. Its Flask web interface lets financial institutions assess loan default risk in real time.
- Champion ML Model: Gradient Boosting Classifier achieving 91.90% accuracy and 91.85% AUC
- Advanced Feature Engineering: Binning + Encoding approach for optimal performance
- 5-Tier Risk Assessment: Low, Medium-Low, Medium, Medium-High, High Risk
- Interactive Web Interface: Bootstrap-powered responsive design
- Business Recommendations: Actionable insights for each risk level
- Real-time Predictions: API endpoints for integration
- Comprehensive Testing: Full test suite included
- Accuracy: 91.90%
- AUC Score: 91.85%
- Cross-Validation Score: 91.67%
- Feature Engineering: Binning + Categorical Encoding
- Model Type: Gradient Boosting Classifier (No PCA)
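The "Binning + Categorical Encoding" approach listed above can be illustrated with a small, dependency-free sketch. The bin edges, bin labels, category levels, and the helper names `bin_value`, `one_hot`, and `engineer` are all hypothetical; the README does not state the values used in training, and the project itself may rely on pandas or scikit-learn for this step.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels for person_age (not taken from the project).
AGE_EDGES = [25, 35, 50]
AGE_LABELS = ["<25", "25-34", "35-49", "50+"]

# Assumed category levels for person_home_ownership.
HOME_OWNERSHIP_LEVELS = ["RENT", "OWN", "MORTGAGE", "OTHER"]


def bin_value(value, edges, labels):
    """Return the label of the bin that `value` falls into."""
    # bisect_right counts how many edges are <= value, which is the bin index.
    return labels[bisect_right(edges, value)]


def one_hot(value, levels):
    """Encode a categorical value as 0/1 indicators, one per level."""
    return [1 if value == level else 0 for level in levels]


def engineer(record):
    """Bin the numerical age, then one-hot encode both the bin and the category."""
    age_bin = bin_value(record["person_age"], AGE_EDGES, AGE_LABELS)
    return one_hot(age_bin, AGE_LABELS) + one_hot(
        record["person_home_ownership"], HOME_OWNERSHIP_LEVELS
    )


features = engineer({"person_age": 25, "person_home_ownership": "RENT"})
print(features)
```

Binning turns a continuous variable into a small set of ordered groups, and encoding turns each group or category into indicator columns that a tree-based classifier can split on.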
Credit-Risk-Identifier/
│
├── Web_App/                                # Flask Web Application
│   ├── app.py                              # Main Flask application
│   ├── requirements.txt                    # Python dependencies
│   ├── train_champion_model.py             # Model training script
│   ├── test_app.py                         # Test suite
│   ├── README.md                           # Web app documentation
│   └── templates/
│       ├── index.html                      # Input form
│       └── result.html                     # Results display
│
├── credit-risk-dataset/                    # Dataset
│   └── credit_risk_dataset.csv             # Main dataset (32,581 samples)
│
├── best_model_pipeline.pkl                 # Trained model (legacy)
├── modeling_week15_comprehensive.ipynb     # ML development notebook
└── INTEGRATION_SUMMARY.md                  # Technical documentation
- Python 3.8+
- pip package manager
- Clone the repository:
    git clone https://github.com/sys0507/Credit-Risk-Identifier.git
    cd Credit-Risk-Identifier
- Navigate to the web app directory:
    cd Web_App
- Install dependencies:
    pip install -r requirements.txt
- Train the model (if needed):
    python train_champion_model.py
- Run the application:
    python app.py
- Access the app: open your browser and go to http://localhost:5000
The model requires 11 key features for prediction:
| Feature | Description | Type |
|---|---|---|
| person_age | Age of the person | Numerical |
| person_income | Annual income | Numerical |
| person_emp_length | Employment length (years) | Numerical |
| person_home_ownership | Home ownership status | Categorical |
| loan_amnt | Loan amount requested | Numerical |
| loan_intent | Purpose of the loan | Categorical |
| loan_int_rate | Interest rate | Numerical |
| loan_percent_income | Loan as % of income | Numerical |
| loan_grade | Loan grade | Categorical |
| cb_person_default_on_file | Historical default | Categorical |
| cb_person_cred_hist_length | Credit history length | Numerical |
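Before sending a request, a client can check that all 11 features are present and have the expected kind of value. The following is a minimal sketch based only on the table above; the function name `validate_payload` and the error message wording are hypothetical, and the application's own validation may differ.

```python
# Feature names and types are taken from the table above.
NUMERICAL = {
    "person_age", "person_income", "person_emp_length", "loan_amnt",
    "loan_int_rate", "loan_percent_income", "cb_person_cred_hist_length",
}
CATEGORICAL = {
    "person_home_ownership", "loan_intent", "loan_grade",
    "cb_person_default_on_file",
}


def validate_payload(payload):
    """Return a list of problems found in `payload`; an empty list means it looks valid."""
    errors = []
    for name in sorted(NUMERICAL | CATEGORICAL):
        if name not in payload:
            errors.append(f"missing field: {name}")
    for name in sorted(NUMERICAL):
        if name in payload:
            value = payload[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{name} must be a number")
    for name in sorted(CATEGORICAL):
        if name in payload and not isinstance(payload[name], str):
            errors.append(f"{name} must be a string")
    return errors


sample = {
    "person_age": 25, "person_income": 50000, "person_emp_length": 3,
    "person_home_ownership": "RENT", "loan_amnt": 10000,
    "loan_intent": "PERSONAL", "loan_int_rate": 12.5,
    "loan_percent_income": 0.2, "loan_grade": "B",
    "cb_person_default_on_file": "N", "cb_person_cred_hist_length": 5,
}
print(validate_payload(sample))
```

This sketch checks only presence and basic type; it does not check value ranges or the allowed category values, which the table does not list.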
| Risk Level | Recommendation | Action |
|---|---|---|
| Low Risk | ✅ APPROVE | Standard processing |
| Medium-Low Risk | ✅ APPROVE | Consider terms adjustment |
| Medium Risk | | Additional verification |
| Medium-High Risk | | Enhanced due diligence |
| High Risk | ❌ HIGH RISK | Reject or require collateral |
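One way to map a predicted default probability onto the five tiers in the table is a sorted list of cut-offs. The thresholds below are hypothetical, evenly spaced values chosen for illustration only; the README does not state the cut-offs the application uses, and `risk_level` is an illustrative name.

```python
from bisect import bisect_right

# Hypothetical probability cut-offs between adjacent tiers (not from the project).
THRESHOLDS = [0.2, 0.4, 0.6, 0.8]
RISK_LEVELS = [
    "Low Risk",
    "Medium-Low Risk",
    "Medium Risk",
    "Medium-High Risk",
    "High Risk",
]


def risk_level(probability):
    """Map a default probability in [0, 1] to one of the five risk tiers."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # The number of thresholds at or below the probability is the tier index.
    return RISK_LEVELS[bisect_right(THRESHOLDS, probability)]


for p in (0.05, 0.35, 0.55, 0.75, 0.95):
    print(f"{p:.2f} -> {risk_level(p)}")
```

Keeping the cut-offs in a single list makes it straightforward to recalibrate the tiers without touching the lookup logic.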
- GET / - Web interface
- POST /predict - Form submission
- POST /api/predict - JSON API
- GET /health - Health check
import requests
data = {
"person_age": 25,
"person_income": 50000,
"person_emp_length": 3,
"person_home_ownership": "RENT",
"loan_amnt": 10000,
"loan_intent": "PERSONAL",
"loan_int_rate": 12.5,
"loan_percent_income": 0.2,
"loan_grade": "B",
"cb_person_default_on_file": "N",
"cb_person_cred_hist_length": 5
}
response = requests.post("http://localhost:5000/api/predict", json=data)
result = response.json()
print(f"Risk Level: {result['risk_level']}")
print(f"Probability: {result['probability']:.2%}")

Run the comprehensive test suite:

python test_app.py

Tests cover:
- Model loading and prediction accuracy
- Form validation and error handling
- API endpoint functionality
- Risk level classification
- Template rendering
The champion model was selected from 54 different model variations tested across:
- 3 Feature Engineering Approaches: Raw Features, Log Transform, Binning+Encoding
- 2 PCA Options: With/Without PCA
- 9 ML Algorithms: Logistic Regression, Decision Tree, Random Forest, XGBoost, LGBM, CatBoost, Gradient Boosting, SVM, KNN
See modeling_week15_comprehensive.ipynb for complete analysis.
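The count of 54 follows from crossing the three dimensions listed above (3 × 2 × 9). A short sketch of how such an experiment grid could be enumerated is shown below; the variable names are illustrative and this is not the notebook's actual code.

```python
from itertools import product

# The three dimensions listed above.
FEATURE_ENGINEERING = ["Raw Features", "Log Transform", "Binning+Encoding"]
PCA_OPTIONS = [True, False]
ALGORITHMS = [
    "Logistic Regression", "Decision Tree", "Random Forest", "XGBoost",
    "LGBM", "CatBoost", "Gradient Boosting", "SVM", "KNN",
]

# Every combination of feature engineering, PCA setting, and algorithm.
experiments = list(product(FEATURE_ENGINEERING, PCA_OPTIONS, ALGORITHMS))
print(len(experiments))

# The champion configuration described in this README.
champion = ("Binning+Encoding", False, "Gradient Boosting")
print(champion in experiments)
```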
- Backend: Flask (Python)
- ML Framework: scikit-learn, XGBoost, CatBoost
- Frontend: Bootstrap 5, Chart.js
- Data Processing: pandas, numpy
- Model Serving: pickle, joblib
This project is licensed under the MIT License - see the LICENSE file for details.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Author: sys0507
- Repository: Credit-Risk-Identifier
- Issues: Report bugs or request features
| Metric | Value |
|---|---|
| Accuracy | 91.90% |
| Precision | 91.23% |
| Recall | 92.15% |
| F1-Score | 91.69% |
| AUC-ROC | 91.85% |
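As a consistency check on the table, the F1-Score is the harmonic mean of precision and recall, so it can be recomputed from the two reported values:

```python
# Precision and recall as reported in the table above.
precision = 0.9123
recall = 0.9215

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.2%}")  # 91.69%
```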
Built with ❤️ for better financial decision making