Predict the insurance cost (charges) for individuals based on demographic and health-related features.
- An insurer offers affordable health insurance to thousands of customers. You're tasked with creating an automated system to estimate the annual medical expenditure for new customers, using information such as their age, sex, BMI, number of children, smoking habits, and region of residence.
- Estimates from your system will be used to determine the annual insurance premium (paid in monthly installments) offered to the customer.
- This is a regression problem: given the features above, we need to estimate the annual medical expenditure for new customers.
- Linear Regression
- KNeighborsRegressor
- DecisionTreeRegressor
- RandomForestRegressor
- GradientBoostingRegressor
- XGBRegressor
- Age: Age of the person
- Sex: Gender (male/female)
- BMI: Body Mass Index
- Children: Number of dependent children
- Smoker: Smoking status (yes/no)
- Region: Residential area
- Charges: Annual insurance charges (target)
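
Because `Sex`, `Smoker`, and `Region` are categorical, they need a numeric encoding before most regressors can use them. The following is a minimal sketch of one way to do that in plain Python. The lowercase dictionary keys, the `encode_customer` helper, and the four region names are assumptions made for illustration; the feature list above only says "Residential area" and does not name the categories.

```python
# Hypothetical category list; the README only says "Residential area".
REGIONS = ["northeast", "northwest", "southeast", "southwest"]


def encode_customer(row):
    """Turn one raw customer record (a dict) into a list of numeric features."""
    features = [
        float(row["age"]),
        float(row["bmi"]),
        float(row["children"]),
        1.0 if row["sex"] == "male" else 0.0,    # binary flag for sex
        1.0 if row["smoker"] == "yes" else 0.0,  # binary flag for smoking status
    ]
    # One-hot encode region: one 0/1 column per known region.
    features.extend(1.0 if row["region"] == r else 0.0 for r in REGIONS)
    return features


# Illustrative record, not taken from the dataset.
example = {"age": 40, "sex": "female", "bmi": 27.5, "children": 2,
           "smoker": "no", "region": "southeast"}
encoded = encode_customer(example)
print(encoded)
```

The target, `Charges`, is left out of the encoded vector on purpose: it is what the model predicts, not an input.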
Prepare the data: identify and extract the key features (both the inputs and the output) relevant to the problem you will solve.
Build and train a machine learning model. Here you can evaluate different algorithms and settings to see which model works best for your scenario.
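
As a rough sketch of what "evaluating different algorithms and settings" can look like with scikit-learn, the snippet below fits a few regressors with default settings and then tunes one of them with a small grid search. It is not the project's actual training code: the data is synthetic (`make_regression`), and the parameter grid, split ratio, and the `evaluate` helper are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data; replace with the encoded insurance features and charges.
X, y = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def evaluate(model):
    """Return (RMSE, R2) of a fitted model on the held-out test split."""
    pred = model.predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5
    return rmse, r2_score(y_test, pred)


models = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Baseline: fit every candidate with its default settings.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = evaluate(model)

# Tuning: cross-validated search over a small, illustrative grid for one model.
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [3, 5, 10, None], "min_samples_leaf": [1, 5, 10]},
    scoring="neg_root_mean_squared_error",
    cv=5,
)
grid.fit(X_train, y_train)
results["DecisionTreeRegressor (tuned)"] = evaluate(grid.best_estimator_)

for name, (rmse, r2) in results.items():
    print(f"{name}: RMSE={rmse:.2f}, R2={r2:.3f}")
```

Scoring every model on the same held-out split keeps the "before" and "after" numbers comparable. The grid search itself only sees the training split.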
| Model | RMSE (before tuning) | RMSE (after tuning) | R² (before tuning) | R² (after tuning) | Comments |
|---|---|---|---|---|---|
| Linear Regression | 5956.34 | 5821.63 | 0.81 | 0.77 | Slight RMSE improvement, minor drop in R² |
| KNeighborsRegressor | 5819.21 | 9872.54 | 0.76 | 0.32 | RMSE and R² both worsened; the model underperforms after tuning |
| RandomForestRegressor | 4605.38 | 4621.09 | 0.85 | 0.85 | Performance stable, hyperparameters had minimal effect |
| GradientBoostingRegressor | 5819.21 | 4482.26 | 0.86 | 0.86 | RMSE improved significantly with tuning, R² unchanged |
| XGBRegressor | 5819.21 | 4483.77 | 0.82 | 0.86 | Large improvement from tuning; final scores on par with GradientBoostingRegressor |
| DecisionTreeRegressor | 5819.21 | 4770.01 | 0.84 | 0.84 | RMSE dropped with tuning, R² stayed constant |
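
For reference, the two metrics in the table can be computed from predictions as shown in this minimal sketch. These are the standard textbook definitions of RMSE and R²; the README does not state exactly how the reported numbers were produced, and the sample values below are made up purely for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (charges)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values purely for illustration.
actual = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1100.0, 1900.0, 3200.0, 3800.0]

print(f"RMSE: {rmse(actual, predicted):.2f}")  # RMSE: 158.11
print(f"R2:   {r2(actual, predicted):.2f}")    # R2:   0.98
```

Lower RMSE and higher R² both indicate a better fit. RMSE is expressed in the units of `Charges`, while R² is unitless.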