This project implements an inference pipeline for heart disease prediction using a pre-trained XGBoost model. It provides functionality for loading the model, preprocessing new data, and making predictions.
gillopy-Deployment_XGBoost_Inference_Heart_Disease_UCI/
├── README.md # Project documentation
├── Dockerfile # Docker container configuration
├── LICENSE # Project license
├── pyproject.toml # Poetry dependency management
├── .dockerignore # Docker build exclusions
├── models/ # Pre-trained model files
│ ├── trained_model_2025-01-06.joblib
│ └── trained_model_2025-01-08.joblib
├── src/ # Source code
│ ├── data_preprocessor.py # Data preprocessing functionality
│ ├── inference.py # Main inference pipeline
│ └── model_loader.py # Model loading utilities
└── tests/ # Test suite
    ├── __init__.py
    ├── test_data_preprocessor.py
    ├── test_inference.py
    └── test_model_loader.py
- Python 3.10 (specific version requirement)
All dependencies are managed through Poetry and specified in pyproject.toml:
- pandas (^2.2.3)
- scikit-learn (^1.6.0)
- xgboost (^2.1.3)
- joblib (^1.4.2)
- Clone the Repository:
  git clone https://github.com/gillopy/Deployment_XGBoost_Inference_Heart_Disease_UCI
  cd Deployment_XGBoost_Inference_Heart_Disease_UCI
- Install Dependencies:
  poetry install
- Docker Setup (optional):
  docker build -t heart-disease-inference .
Execute the main inference script:
poetry run python src/inference.py
The model expects input data in the following format:
{
"age": int,
"sex": int,
"cp": int,
"trestbps": int,
"chol": int,
"fbs": int,
"restecg": int,
"thalach": int,
"exang": int,
"oldpeak": float,
"slope": int,
"ca": float,
"thal": float
}
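As a minimal sketch of how a record in this format could be checked before it reaches the model, the snippet below validates field names and types using only the standard library. The `EXPECTED_TYPES` mapping mirrors the format above. The `validate_record` helper, its messages, and the choice to let `None` pass through for later imputation are assumptions made for illustration, not code taken from `src/data_preprocessor.py`.

```python
from typing import Any

# Field names and types copied from the input format above.
EXPECTED_TYPES: dict[str, type] = {
    "age": int, "sex": int, "cp": int, "trestbps": int, "chol": int,
    "fbs": int, "restecg": int, "thalach": int, "exang": int,
    "oldpeak": float, "slope": int, "ca": float, "thal": float,
}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in `record` (empty if it looks valid).

    Hypothetical helper for illustration; the project's own checks may differ.
    """
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        if value is None:
            # Assumption: None marks a missing value left for imputation.
            continue
        if isinstance(value, bool):
            # bool is a subclass of int, so reject it explicitly.
            problems.append(f"{name}: expected {expected.__name__}, got bool")
        elif expected is float and isinstance(value, (int, float)):
            # Accept an int where a float is expected (e.g. 2 for 2.0).
            continue
        elif not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    for name in record.keys() - EXPECTED_TYPES.keys():
        problems.append(f"unexpected field: {name}")
    return problems


# Example values are made up for illustration only.
sample = {
    "age": 54, "sex": 1, "cp": 0, "trestbps": 130, "chol": 246,
    "fbs": 0, "restecg": 1, "thalach": 150, "exang": 0,
    "oldpeak": 1.2, "slope": 2, "ca": 0.0, "thal": 2.0,
}
print(validate_record(sample))  # []
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a record at once.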
Run the test suite using pytest:
poetry run pytest
The test suite includes:
- Data preprocessing validation
- Inference pipeline testing
- Model loading verification
Data preprocessor (src/data_preprocessor.py):
- Handles missing value imputation
- Converts input dictionary to DataFrame format
- Implements data validation checks
Model loader (src/model_loader.py):
- Loads the pre-trained XGBoost model
- Includes error handling for missing model files
- Validates model compatibility
Inference pipeline (src/inference.py):
- Orchestrates the complete inference process
- Supports batch predictions
- Provides formatted output
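The preprocessing, model-loading, and inference steps listed above could fit together roughly as follows. This is a sketch under stated assumptions, not the code in `src/`: the function names, the rule of picking the most recently dated file in `models/`, and the `FILL_VALUES` defaults are all hypothetical. It also assumes the saved object exposes a scikit-learn style `predict` method, as XGBoost's scikit-learn wrappers do.

```python
from pathlib import Path

import pandas as pd

# Column order taken from the input format documented in this README.
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]

# Hypothetical fill values; a real pipeline would more likely reuse
# statistics computed on the training data.
FILL_VALUES = {"ca": 0.0, "thal": 2.0}


def latest_model_path(model_dir: Path) -> Path:
    """Pick the newest trained_model_YYYY-MM-DD.joblib file (assumed rule)."""
    candidates = sorted(model_dir.glob("trained_model_*.joblib"))
    if not candidates:
        raise FileNotFoundError(f"no model files found in {model_dir}")
    # ISO dates sort chronologically as plain strings.
    return candidates[-1]


def load_model(model_dir: Path):
    """Load the newest model file with joblib."""
    import joblib  # imported here so the other helpers work without it

    return joblib.load(latest_model_path(model_dir))


def preprocess(records: list[dict]) -> pd.DataFrame:
    """Turn one or more input dictionaries into a model-ready DataFrame."""
    # reindex fixes the column order, drops unknown fields, and adds
    # NaN columns for fields that are absent from every record.
    frame = pd.DataFrame(records).reindex(columns=FEATURES)
    frame = frame.fillna(value=FILL_VALUES)
    still_missing = frame.columns[frame.isna().any()].tolist()
    if still_missing:
        raise ValueError(f"missing values with no default: {still_missing}")
    return frame


def predict_batch(model, records: list[dict]) -> list:
    """Preprocess a batch of records and return one prediction per record."""
    return list(model.predict(preprocess(records)))
```

With a model file in place, a call might look like `predict_batch(load_model(Path("models")), [record_a, record_b])`, where each record is a dictionary in the documented input format.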
The project includes Docker support for containerized deployment:
- Base Python 3.10 image
- Automatic dependency installation
- Environment isolation
Apache License 2.0
Guillermo (guillermocabrera9710@gmail.com)