mehulcode12/health-insurance-premium-cost-prediction-xgboost-99-accuracy


🧠 Health Insurance Premium Cost Predictor (99% Accuracy)

Welcome to the ultimate health insurance cost prediction app! 🚀
With our dual-model strategy and smart feature engineering, we've built a system that predicts premiums with R² up to 99% and less than 4% error.


📂 Project Overview

This project started as a simple regression task, but after diving deep into error analysis, we uncovered that age plays a huge role in prediction accuracy.

💡 Solution? We split our models:

  • 👶 Young Model (Age ≤ 25) → Linear Regression
  • 🧓 Rest Model (Age > 25) → XGBoost Regressor

With the addition of the Genetical Risk feature, both models saw dramatic improvements.
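
The age-based split described above amounts to a small dispatch step at prediction time. Here is a minimal sketch of that idea, not the code in prediction_helper.py: the class name, the `age` key, and the two stub models are hypothetical stand-ins for the trained Linear Regression and XGBoost models.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

AGE_THRESHOLD = 25  # from the README: the young model covers age <= 25

Features = Mapping[str, float]


@dataclass
class DualModelPredictor:
    """Route each request to one of two regressors based on age."""

    young_model: Callable[[Features], float]
    rest_model: Callable[[Features], float]
    age_threshold: int = AGE_THRESHOLD

    def predict(self, features: Features) -> float:
        # Age <= threshold goes to the young model, everything else to the rest model.
        if features["age"] <= self.age_threshold:
            return self.young_model(features)
        return self.rest_model(features)


# Hypothetical stubs standing in for the trained models (made-up coefficients).
def young_stub(features: Features) -> float:
    return 5000.0 + 100.0 * features["age"]


def rest_stub(features: Features) -> float:
    return 10000.0 + 250.0 * features["age"]


predictor = DualModelPredictor(young_stub, rest_stub)
print(predictor.predict({"age": 25}))  # handled by the young model
print(predictor.predict({"age": 26}))  # handled by the rest model
```

Keeping the threshold in one place means the training notebooks and the app can agree on which model owns the boundary age.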


🔍 Quick Look

📁 health-insurance-premium-cost-prediction-xgboost-99-accuracy
├── app/
│   └── app.py               ← Streamlit web app
├── artifacts/               ← Visualizations (accuracy, residuals)
├── notebooks/               ← Model development Jupyter notebooks
├── prediction_helper.py     ← Model prediction utility
├── requirements.txt         ← Python dependencies
└── README.md                ← You are here!

🌐 Live Demo

Check it out here: Streamlit App

🚀 Try It Out

git clone https://github.com/mehulcode12/health-insurance-premium-cost-prediction-xgboost-99-accuracy.git
cd health-insurance-premium-cost-prediction-xgboost-99-accuracy
pip install -r requirements.txt

# Run the Streamlit app
cd app
streamlit run app.py

🧠 Modeling Strategy

1️⃣ Initial Baseline

  • Used a single Linear Regression model.
  • Problem: 98% R², but error rate > 60% for young users.
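
A high R² and a large percentage error can coexist, which is easy to miss when only R² is reported. The sketch below uses made-up premiums (not the project's data) and two hypothetical helpers, `r_squared` and `pct_errors`, to show how a few low-premium rows can be badly off in relative terms while R² stays above 0.97.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def pct_errors(y_true, y_pred):
    """Signed percentage error of each prediction relative to the actual value."""
    return [100 * (p - t) / t for t, p in zip(y_true, y_pred)]


# Made-up numbers for illustration only: two low premiums, three high ones.
y_true = [4000, 5000, 20000, 25000, 30000]
y_pred = [6500, 2500, 20500, 24500, 30500]

r2 = r_squared(y_true, y_pred)
errors = pct_errors(y_true, y_pred)

print(f"R^2 = {r2:.3f}")
for actual, err in zip(y_true, errors):
    print(f"actual={actual:>6}  pct_error={err:+.1f}%")
```

In this toy data the two low-premium rows miss by 50% or more, yet they barely move R² because their absolute errors are small next to the overall spread of premiums. Whether the same mechanism explains the baseline's behavior here is an assumption; the notebooks are the place to check.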

2️⃣ Root Cause Analysis

  • Split the dataset by age (≤ 25 vs > 25).
  • Found that young users skewed the prediction error.
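
One way to run this kind of check is to count, per age group, how often the percentage error exceeds a cutoff. This is a minimal sketch with made-up rows; the 10% cutoff, the function name, and the `(age, actual, predicted)` row layout are assumptions for illustration, not values taken from the notebooks.

```python
from collections import defaultdict

AGE_THRESHOLD = 25          # from the README: young means age <= 25
ERROR_THRESHOLD_PCT = 10.0  # assumed cutoff for an "extreme" error


def extreme_error_rate_by_group(rows, threshold_pct=ERROR_THRESHOLD_PCT):
    """Return the share of rows per age group whose absolute % error exceeds the cutoff.

    Each row is a tuple of (age, actual_premium, predicted_premium).
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [extreme_count, total_count]
    for age, actual, predicted in rows:
        group = "young" if age <= AGE_THRESHOLD else "rest"
        pct_error = abs(predicted - actual) / actual * 100
        counts[group][1] += 1
        if pct_error > threshold_pct:
            counts[group][0] += 1
    return {group: extreme / total for group, (extreme, total) in counts.items()}


# Made-up rows for illustration only.
rows = [
    (19, 4000, 6500),
    (23, 5000, 2500),
    (24, 6000, 6200),
    (31, 20000, 20500),
    (45, 25000, 24500),
    (58, 30000, 30500),
]

rates = extreme_error_rate_by_group(rows)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} of predictions off by more than {ERROR_THRESHOLD_PCT}%")
```

A large gap between the two rates is the signal that one group needs its own model or extra features.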

3️⃣ Dual Model Solution

  • Trained separate models for each group:
    • Young: Linear Regression
    • Rest: XGBoost (with RandomizedSearchCV tuning)
  • Added Genetical Risk feature = game changer 💥

📊 Model Evaluation

👶 Age ≤ 25 (Linear Regression + Genetical Risk)

  • R² (Train/Test): ~0.99 / 0.99
  • 📉 Residuals: [figure: Young Error]
  • ✅ Accuracy: [figure: Young Accuracy]

🧓 Age > 25 (XGBoost + Genetical Risk)

  • R² (Train/Test/CV): ~0.997 / 0.997 / 0.997
  • 📉 Residuals: [figure: Rest Error]
  • ✅ Accuracy: [figure: Rest Accuracy]

🛠 Tech Stack

  • ML Models: Linear Regression, XGBoost
  • Tuning: GridSearchCV, RandomizedSearchCV
  • Visualization: Matplotlib, Seaborn
  • Web App: Streamlit
  • Backend: Python, pandas, NumPy

📘 Related Files

File                                         Description
ml_premium_prediction_young_with_gr.ipynb    Model for users ≤ 25 yrs
ml_premium_prediction_rest_with_gr.ipynb     Model for users > 25 yrs
SourceOfWork.pdf                             Detailed analysis + methodology

🤝 Contribute

Have ideas to improve it? Open a PR or raise an issue! Contributions are welcome 🙌


📄 License

MIT License © 2025 Mehul


⭐ If you liked this project, leave a star on GitHub!
