A simple, interpretable credit approval model built on HMDA data using decision trees.

This project explores credit approval modeling using publicly available HMDA (Home Mortgage Disclosure Act) data. The primary objective is to classify loan applications as approved or denied.
- AI / Machine Learning
- Credit Analytics
- Decision Tree Models
- Python
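The approve/deny classification idea can be sketched in a few lines of scikit-learn. The features and data below are toy stand-ins for illustration only, not the actual HMDA columns or preprocessing used by this project:

```python
# Toy sketch of decision-tree approve/deny classification.
# Feature columns here are invented: [income (k$), loan_amount (k$), debt_ratio].
from sklearn.tree import DecisionTreeClassifier

X = [[85, 200, 0.25], [40, 300, 0.55], [120, 150, 0.10],
     [30, 250, 0.60], [95, 180, 0.20], [45, 320, 0.50]]
y = [1, 0, 1, 0, 1, 0]  # 1 = approved, 0 = denied

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.predict([[100, 170, 0.15]]))  # -> [1] (approved)
```

Because the tree is shallow, its splits can be read off directly (e.g. via `sklearn.tree.export_text`), which is what makes this model family interpretable.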
## Project Structure

```
HMDA-Credit-Approval/
├─ configs/
│  └─ …                      # Configuration files for training, preprocessing, etc.
├─ src/
│  ├─ data/
│  │  ├─ download.py         # Fetches the raw HMDA dataset from source
│  │  ├─ clean.py            # Cleans and preprocesses raw data (NaN handling, formatting, etc.)
│  │  └─ build_dataset.py    # Builds the final training-ready dataset (feature engineering, splits)
│  └─ scripts/
│     ├─ train.py            # Trains a decision tree model on the prepared dataset and saves it
│     └─ evaluate.py         # Loads a trained model and test dataset, computes evaluation metrics
├─ .gitignore
├─ LICENSE
├─ README.md
└─ requirements.txt
```
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/ABS0LUTE888/HMDA-Credit-Approval.git
   cd HMDA-Credit-Approval
   ```

2. Create and activate a virtual environment (optional, but recommended):

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

## Usage

1. Get the raw dataset:

   ```bash
   python -m src.data.download
   ```

   Note: This project uses Hydra for configuration management, so you can override any setting straight from the CLI without editing YAMLs. For example:

   ```bash
   python -m src.data.download year=2023
   # or
   python -m src.scripts.train model=xgboost seed=67
   ```

2. Clean it up:

   ```bash
   python -m src.data.clean
   ```

3. Build the final dataset ready for training:

   ```bash
   python -m src.data.build_dataset
   ```

4. Run the training script:

   ```bash
   python -m src.scripts.train
   ```

5. Run the evaluation script:

   ```bash
   python -m src.scripts.evaluate
   ```
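Under the hood, Hydra parses each `key=value` token from the CLI and merges it into the composed config. A pure-Python sketch of that override behavior (the `apply_overrides` helper is hypothetical and not part of this project; Hydra itself does this, and much more, via OmegaConf):

```python
# Sketch of Hydra-style "dotted.key=value" overrides applied to a nested
# config dict. Illustrative only; Hydra's real resolution is richer.
import ast

def apply_overrides(config, overrides):
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        try:
            node[leaf] = ast.literal_eval(raw)  # ints, floats, bools, lists
        except (ValueError, SyntaxError):
            node[leaf] = raw                    # fall back to plain string
    return config

cfg = {"year": 2022, "model": "decision_tree", "tuning": {"cv": 5}}
apply_overrides(cfg, ["year=2023", "model=xgboost", "tuning.cv=10"])
# cfg is now {'year': 2023, 'model': 'xgboost', 'tuning': {'cv': 10}}
```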
TBD
## Models

### Decision Tree

To train or evaluate with a decision tree:

```bash
python -m src.scripts.train model=decision_tree
python -m src.scripts.evaluate model=decision_tree
```

Performance:

```json
{
  "auroc": 0.7861465196838779,
  "accuracy": 0.7097934651120077,
  "precision": 0.8881341616971602,
  "recall": 0.7030027797283301,
  "f1": 0.7847983013774197
}
```

Confusion Matrix:

ROC Curve:
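For reference, the scalar metrics reported above (other than AUROC) all derive from confusion-matrix counts. A pure-Python sketch with illustrative counts, not this project's actual matrix:

```python
# How accuracy/precision/recall/F1 relate to confusion-matrix counts.
# The counts below are made up for illustration; evaluate.py presumably
# computes these from the real test-set predictions.
tp, fp, fn, tn = 80, 10, 30, 60  # true/false positives and negatives

accuracy  = (tp + tn) / (tp + fp + fn + tn)          # 140/180 ≈ 0.778
precision = tp / (tp + fp)                           # 80/90  ≈ 0.889
recall    = tp / (tp + fn)                           # 80/110 ≈ 0.727
f1        = 2 * precision * recall / (precision + recall)  # = 0.8
```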
### XGBoost

To train or evaluate with XGBoost:

```bash
python -m src.scripts.train model=xgboost
python -m src.scripts.evaluate model=xgboost
```

Performance:

```json
{
  "auroc": 0.8111004309136769,
  "accuracy": 0.7393316793345928,
  "precision": 0.8922882998344719,
  "recall": 0.7434411721060304,
  "f1": 0.8110923852200005
}
```

Confusion Matrix:

ROC Curve:
## Adding Your Own Model

You can easily add your own models by following these steps:

1. Create the model file at `src/models/your_model.py` and implement the model class:

   ```python
   # src/models/your_model.py
   from src.models.base import BaseModel, DataBundle
   from src.models.registry import register

   @register("your_model")  # must match the YAML name
   class YourModel(BaseModel):
       def __init__(self, cfg):
           super().__init__(cfg)
           ...

       def train(self, data: DataBundle): ...

       def predict(self, X): ...

       def predict_proba(self, X): ...

       def evaluate(self, data: DataBundle): ...

       def export_pipeline(self, preprocessor_path, out_path): ...
   ```

   Key points:

   - Inherit from either `src.models.base.BaseModel` or `src.models.base.SciKitModel` (which already implements most of the methods above).
   - The decorator `@register("your_model")` must use the same string that appears in your YAML config (next step).

2. Create the config at `configs/models/your_model.yaml`:

   ```yaml
   name: your_model
   params: {}        # Fixed params that are passed to the model instance
   tuning:           # Training params
     n_jobs: 4
     cv: 5
     scoring: roc_auc
     verbose: 4
     param_grid:     # Params passed to GridSearchCV
       max_depth: [ 4, 6, 8, 10 ]
   ```

3. Train and evaluate your model:

   ```bash
   python -m src.scripts.train model=your_model
   python -m src.scripts.evaluate model=your_model
   ```
## Roadmap

- REST API
- Dockerfile for easy deployment
- Other models
## License

This project is licensed under the MIT License - see the LICENSE file for details.