Welcome to the ultimate health insurance cost prediction app!
With our dual-model strategy and smart feature engineering, we've built a system that predicts premiums with R² up to 99% and less than 4% error.
This project started as a simple regression task, but after diving deep into error analysis, we uncovered that age plays a huge role in prediction accuracy.
Solution? We split our models:
- Young Model (Age ≤ 25) → Linear Regression
- Rest Model (Age > 25) → XGBoost Regressor
With the addition of the Genetical Risk feature, both models saw dramatic improvements.
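The age-based split can be sketched as a small routing function. This is a minimal illustration, not the project's actual `prediction_helper.py`: the names `route_prediction`, `Regressor`, `ConstantModel`, and `AGE_THRESHOLD` are hypothetical, and it assumes both trained models expose a scikit-learn style `predict` method.

```python
from typing import Protocol, Sequence

AGE_THRESHOLD = 25  # from the README: the Young Model covers Age <= 25


class Regressor(Protocol):
    """Anything with a scikit-learn style predict method (assumed interface)."""

    def predict(self, rows: Sequence[Sequence[float]]) -> Sequence[float]: ...


def route_prediction(age: int, features: Sequence[float],
                     young_model: Regressor, rest_model: Regressor) -> float:
    """Use the young model when age <= AGE_THRESHOLD, otherwise the rest model."""
    model = young_model if age <= AGE_THRESHOLD else rest_model
    # The model receives a batch of one row and returns a batch of one prediction.
    return float(model.predict([list(features)])[0])


class ConstantModel:
    """Stand-in model, used only to show which branch was taken."""

    def __init__(self, value: float) -> None:
        self.value = value

    def predict(self, rows: Sequence[Sequence[float]]) -> Sequence[float]:
        return [self.value for _ in rows]


young = ConstantModel(1.0)  # placeholder for the Linear Regression model
rest = ConstantModel(2.0)   # placeholder for the XGBoost model

premium_at_25 = route_prediction(25, [25, 0, 1], young, rest)  # boundary goes to young
premium_at_26 = route_prediction(26, [26, 0, 1], young, rest)  # first age sent to rest
```

In a real app the two placeholders would be replaced by the fitted models loaded from disk; the boundary check stays the same.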
health-insurance-premium-cost-prediction-xgboost-99-accuracy/
├── app/
│   └── app.py                 # Streamlit web app
├── artifacts/                 # Visualizations (accuracy, residuals)
├── notebooks/                 # Model development Jupyter notebooks
├── prediction_helper.py       # Model prediction utility
├── requirements.txt           # Python dependencies
└── README.md                  # You are here!
Check it out here: Streamlit App
git clone https://github.com/mehulcode12/health-insurance-premium-cost-prediction-xgboost-99-accuracy.git
cd health-insurance-premium-cost-prediction-xgboost-99-accuracy
pip install -r requirements.txt
# Run the Streamlit app
cd app
streamlit run app.py
- Used a single Linear Regression model.
- Problem: 98% R², but error rate > 60% for young users.
- Split the dataset by age (≤ 25 vs > 25).
- Found young users skewed the prediction error.
- Trained separate models for each group:
  - Young: Linear Regression
  - Rest: XGBoost (with RandomizedSearchCV tuning)
- Added Genetical Risk feature = game changer
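The error analysis that motivated the split can be approximated in a few lines. The sketch below groups predictions by the same age threshold and reports the share whose percentage error exceeds a tolerance. The function name, the 10% tolerance, and the toy numbers are assumptions made for illustration; this README does not define how its error rate was computed.

```python
def extreme_error_share(ages, actual, predicted,
                        threshold_age=25, tolerance_pct=10.0):
    """Return the share of predictions off by more than tolerance_pct, per age group.

    Assumes every actual value is positive, so the percentage error is defined.
    """
    counts = {"young": [0, 0], "rest": [0, 0]}  # [extreme errors, total rows]
    for age, y_true, y_pred in zip(ages, actual, predicted):
        group = "young" if age <= threshold_age else "rest"
        pct_error = abs(y_pred - y_true) / y_true * 100
        counts[group][1] += 1
        if pct_error > tolerance_pct:
            counts[group][0] += 1
    return {group: (bad / total if total else 0.0)
            for group, (bad, total) in counts.items()}


# Toy data (invented for this example, not taken from the project's dataset).
ages = [19, 23, 25, 34, 52, 61]
actual = [5000, 6000, 8000, 15000, 22000, 30000]
predicted = [7500, 6100, 5500, 15300, 21500, 30500]

shares = extreme_error_share(ages, actual, predicted)
```

A large gap between the two shares, as in this toy data, is the kind of signal that suggests one group needs its own model.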
- ML Models: Linear Regression, XGBoost
- Tuning: GridSearchCV, RandomizedSearchCV
- Visualization: Matplotlib, Seaborn
- Web App: Streamlit
- Backend: Python, pandas, NumPy
| Notebook | Description |
|---|---|
| ml_premium_prediction_young_with_gr.ipynb | Model for users ≤ 25 yrs |
| ml_premium_prediction_rest_with_gr.ipynb | Model for users > 25 yrs |
| SourceOfWork.pdf | Detailed analysis + methodology |
Have ideas to improve it? Open a PR or raise an issue! Contributions are welcome.
MIT License © 2025 Mehul
If you liked this project, leave a star on GitHub!