A machine learning pipeline that predicts heart failure risk from synthetic electronic health record (EHR) data generated with Synthea. The trained model is served through a FastAPI app and packaged in a Docker image for easy deployment.
Project Features
- Predicts heart failure risk (target: HFYN) from patient data.
- Uses Gaussian Naive Bayes (GaussianNB) with equal class priors ([0.5, 0.5]) so the model does not favor the majority class.
- Encodes categorical features with LabelEncoder followed by OneHotEncoder.
- Evaluated with ROC-AUC, classification reports, and decision-threshold tuning.
- Served with FastAPI in Docker, including interactive Swagger API docs.
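The two-step categorical encoding listed above can be sketched as follows. This is a minimal sketch assuming each categorical column arrives as a list of strings; the function names `fit_encoders` and `encode` and the dict-of-lists input format are hypothetical, not taken from `DataProcessing.py`.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder


def fit_encoders(columns):
    """Fit one LabelEncoder per categorical column, then one OneHotEncoder on the codes.

    `columns` maps a column name to its list of string values. This input
    format is a hypothetical stand-in for the repository's own data layout.
    """
    label_encoders = {}
    codes = []
    for name, values in columns.items():
        encoder = LabelEncoder()
        codes.append(encoder.fit_transform(values))  # strings -> integers 0..k-1
        label_encoders[name] = encoder
    onehot = OneHotEncoder()  # default output is a SciPy sparse matrix
    onehot.fit(np.column_stack(codes))
    return label_encoders, onehot


def encode(columns, label_encoders, onehot):
    """Apply the fitted encoders; LabelEncoder raises ValueError on unseen labels."""
    # Iterate over the fitted encoders so column order matches training.
    codes = [label_encoders[name].transform(columns[name]) for name in label_encoders]
    return onehot.transform(np.column_stack(codes)).toarray()  # dense 0/1 matrix
```

Since scikit-learn 0.20, `OneHotEncoder` accepts string categories directly, so the `LabelEncoder` step could be dropped; it is kept here only to mirror the pipeline described above.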
Project Structure
Weaver/
├── app/                  # FastAPI app
│   └── main.py
├── data/                 # Data loading & processing
│   ├── DataLoad.py
│   └── DataProcessing.py
├── model/                # Saved model + encoders
│   ├── model_NB.pkl
│   ├── onehot_encoder.pkl
│   └── label_encoders.pkl
├── synthea/              # Synthetic data
├── Dockerfile
├── requirements.txt
└── README.md
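The three files under model/ are presumably loaded by app/main.py when the API starts. A minimal loader sketch, assuming the files were written with Python's standard `pickle` module (if they were saved with `joblib.dump`, `joblib.load` is the safer choice); the function name `load_artifacts` and the relative `model_dir` default are hypothetical.

```python
import pickle
from pathlib import Path

# File names come from the project tree above; the "model" directory being
# relative to the current working directory is an assumption.
ARTIFACT_FILES = {
    "model": "model_NB.pkl",
    "onehot_encoder": "onehot_encoder.pkl",
    "label_encoders": "label_encoders.pkl",
}


def load_artifacts(model_dir="model"):
    """Unpickle the saved model and encoders into a dict keyed by role.

    Only load pickle files you trust: unpickling can execute arbitrary code.
    """
    artifacts = {}
    for key, filename in ARTIFACT_FILES.items():
        with open(Path(model_dir) / filename, "rb") as handle:
            artifacts[key] = pickle.load(handle)
    return artifacts
```

Loading once at startup, rather than per request, keeps /predict latency down to the cost of the model call itself.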
Setup Instructions
- Clone the repository:
  git clone https://github.com/wellDeadpan/Weaver.git
  cd Weaver
- Build the Docker image:
  docker build -t weaver-app .
- Run the Docker container:
  docker run -p 8000:8000 weaver-app
- Open the API docs at http://localhost:8000/docs to use the Swagger UI and test the /predict endpoint interactively.
Example API Request
Endpoint: POST /predict
Request body (the model expects 29 features in total; the array is abbreviated here):
{
  "features": [1.0, 0.5, 2.3, 4.4, ..., 29.1]
}
Response:
{
"prediction": [1]
}
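To call the endpoint from a script instead of the Swagger UI, a minimal standard-library client might look like this. It assumes the container from the setup steps is listening on localhost:8000 and that the features are already encoded in the order the model expects; the helper names `build_payload` and `predict` are hypothetical.

```python
import json
import urllib.request

N_FEATURES = 29  # the README states 29 features are expected
PREDICT_URL = "http://localhost:8000/predict"  # port from the docker run command


def build_payload(features):
    """Serialize a feature vector into the JSON request body shown above."""
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
    return json.dumps({"features": [float(x) for x in features]}).encode("utf-8")


def predict(features, url=PREDICT_URL, timeout=10.0):
    """POST the features to /predict and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_payload(features),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `predict([0.0] * 29)` should return a dict shaped like the response above. Checking the length on the client side gives a clearer error than whatever the server returns for a malformed vector.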
Model Training Summary
Model: GaussianNB(priors=[0.5, 0.5])
Features: Encoded patient demographic and medical data
Target: HFYN (heart failure yes/no)
Evaluation:
ROC-AUC:
Decision threshold tuned to 0.3 (instead of the default 0.5) to improve recall
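The training summary can be sketched end to end: GaussianNB with priors [0.5, 0.5], ROC-AUC on held-out probabilities, and a 0.3 decision threshold. This is a minimal sketch assuming an already-encoded numeric matrix `X` and a 0/1 target `y`; the 80/20 split, the seed, and the function names are hypothetical, not taken from the repository.

```python
import numpy as np
from sklearn.metrics import classification_report, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

DECISION_THRESHOLD = 0.3  # value reported in the training summary


def train_and_evaluate(X, y, threshold=DECISION_THRESHOLD, seed=0):
    """Fit GaussianNB with equal priors, then score it at a custom threshold.

    Assumes a binary target encoded as 0 (no heart failure) / 1 (heart failure).
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    model = GaussianNB(priors=[0.5, 0.5])  # ignore the skewed class frequencies
    model.fit(X_train, y_train)

    proba = model.predict_proba(X_test)[:, 1]  # P(y = 1) for each test row
    auc = roc_auc_score(y_test, proba)  # threshold-independent ranking quality

    # Lowering the cut-off can only add positive predictions, trading
    # precision for recall on the minority class.
    y_pred = (proba >= threshold).astype(int)
    report = classification_report(y_test, y_pred, zero_division=0)
    return model, auc, y_pred, report


def recall_by_threshold(y_true, proba, thresholds=(0.5, 0.4, 0.3, 0.2)):
    """Recall of the positive class at each candidate threshold."""
    return {t: recall_score(y_true, (proba >= t).astype(int)) for t in thresholds}
```

For a binary problem, `GaussianNB.predict` picks the class with the larger posterior, which amounts to a 0.5 cut-off, so a 0.3 threshold has to be applied to the `predict_proba` output explicitly, as above. `recall_by_threshold` is one simple way to compare candidate cut-offs before fixing one.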
Future Improvements
- Automate data preprocessing in the FastAPI app
- Add logging and error handling
- Explore cloud deployment (e.g., AWS, Render)
- Add CI/CD (e.g., GitHub Actions)
License
MIT License
Acknowledgements
- Synthetic data from Synthea
- Built with love, FastAPI, and Docker 🐳
Want Help With...
- Writing unit tests?
- Deploying to the cloud?
- Improving model performance?
Just ask. I'm happy to collaborate!