siddheshwarkoli/Insurance-Cost-Prediction-Regression

Insurance-Cost-Prediction-Regression

Predict the insurance cost (charges) for individuals based on demographic and health-related features.

PROBLEM CONTEXT

  • The scenario is an insurer that offers affordable health insurance to thousands of customers. You are tasked with creating an automated system to estimate the annual medical expenditure for new customers, using information such as their age, sex, BMI, number of children, smoking habits, and region of residence.
  • Estimates from your system will be used to determine the annual insurance premium (paid in monthly installments) offered to the customer.

TYPE OF MACHINE LEARNING PROBLEM

  • This is a regression problem: given the features listed above, we need to estimate the annual medical expenditure for new customers.

LIST OF ALGORITHMS USED FOR REGRESSION

  • Linear Regression
  • KNeighborsRegressor
  • DecisionTreeRegressor
  • RandomForestRegressor
  • GradientBoostingRegressor
  • XGBRegressor
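
As a minimal sketch, the algorithms listed above could be collected into one dictionary so they can be trained and compared in a loop. The helper name `build_candidate_models` and the `random_state` value are assumptions for illustration, not taken from this repository. `XGBRegressor` comes from the separate `xgboost` package, so the sketch skips it when that package is not installed.

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


def build_candidate_models(random_state=42):
    """Return a name -> unfitted estimator mapping using default settings.

    Hypothetical helper: the name and the random_state value are assumptions.
    """
    models = {
        "LinearRegression": LinearRegression(),
        "KNeighborsRegressor": KNeighborsRegressor(),
        "DecisionTreeRegressor": DecisionTreeRegressor(random_state=random_state),
        "RandomForestRegressor": RandomForestRegressor(random_state=random_state),
        "GradientBoostingRegressor": GradientBoostingRegressor(random_state=random_state),
    }
    try:
        # XGBRegressor lives in the optional xgboost package.
        from xgboost import XGBRegressor

        models["XGBRegressor"] = XGBRegressor(random_state=random_state)
    except ImportError:
        # xgboost is not installed; continue with the scikit-learn models only.
        pass
    return models


models = build_candidate_models()
for name in models:
    print(name)
```

Keeping the estimators in a dictionary makes it straightforward to fit each one on the same split and tabulate the scores side by side.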

DATASET OVERVIEW

| Feature | Description |
|---|---|
| Age | Age of the person |
| Sex | Gender (male/female) |
| BMI | Body Mass Index |
| Children | Number of dependent children |
| Smoker | Smoking status (yes/no) |
| Region | Residential area |
| Charges | Annual insurance charges (Target) |
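
The features above mix numeric and categorical columns, so the categorical ones need encoding before most regressors can use them. The sketch below shows one possible encoding with pandas. The lowercase column names, the region labels, and every row value are assumptions invented for illustration; they are not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical rows, invented for illustration; NOT real dataset records.
sample = pd.DataFrame(
    {
        "age": [25, 47, 33],
        "sex": ["male", "female", "female"],
        "bmi": [22.5, 31.2, 27.8],
        "children": [0, 2, 1],
        "smoker": ["no", "yes", "no"],
        "region": ["north", "south", "north"],  # assumed labels
        "charges": [3000.0, 28000.0, 5200.0],
    }
)


def encode_features(df):
    """Split off the target and encode the categorical inputs."""
    X = df.drop(columns=["charges"])
    y = df["charges"]
    # Binary columns become 0/1 flags.
    X = X.assign(
        sex=(X["sex"] == "male").astype(int),
        smoker=(X["smoker"] == "yes").astype(int),
    )
    # Region has more than two levels, so it is one-hot encoded.
    X = pd.get_dummies(X, columns=["region"], dtype=int)
    return X, y


X, y = encode_features(sample)
print(X.columns.tolist())
```

One-hot encoding `region` avoids implying an order between regions, which a single integer code would do.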

TASK 1

PREPARE A COMPLETE DATA ANALYSIS REPORT ON THE GIVEN DATA.

TASK 2

Prepare the data, identifying and extracting key features (both input and output parameters) relevant to the problem you will solve.

Build and train a machine learning model. Here you can evaluate different algorithms and settings to see which model works best for your scenario.
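
The two steps above can be sketched as a single scikit-learn pipeline that encodes the inputs, trains a model, and scores it on a held-out split. This is an illustrative sketch under stated assumptions, not the repository's code. The data is synthetic and generated by a made-up formula in the hypothetical helper `make_synthetic_data`, and the column names, split ratio, and model settings are all assumed.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def make_synthetic_data(n_rows=300, seed=0):
    """Generate SYNTHETIC rows with a made-up charges formula (assumption)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(
        {
            "age": rng.integers(18, 65, n_rows),
            "sex": rng.choice(["male", "female"], n_rows),
            "bmi": rng.normal(30, 6, n_rows).round(1),
            "children": rng.integers(0, 5, n_rows),
            "smoker": rng.choice(["yes", "no"], n_rows, p=[0.2, 0.8]),
            "region": rng.choice(["a", "b", "c", "d"], n_rows),
        }
    )
    df["charges"] = (
        250 * df["age"]
        + 300 * df["bmi"]
        + 20000 * (df["smoker"] == "yes")
        + rng.normal(0, 2000, n_rows)
    )
    return df


df = make_synthetic_data()
X = df.drop(columns=["charges"])  # input features
y = df["charges"]                 # output (target)

# One-hot encode the categorical columns; pass numeric columns through unchanged.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["sex", "smoker", "region"])],
    remainder="passthrough",
)
model = Pipeline(
    [
        ("preprocess", preprocess),
        ("regressor", RandomForestRegressor(n_estimators=100, random_state=42)),
    ]
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

rmse = mean_squared_error(y_test, predictions) ** 0.5
r2 = r2_score(y_test, predictions)
print(f"RMSE: {rmse:.2f}  R2: {r2:.2f}")
```

Wrapping the encoder and the regressor in one `Pipeline` means the encoding is learned from the training split only, which avoids leaking test information into preprocessing.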

Model Performance Comparison (Before vs After Hyperparameter Tuning)

| Model | RMSE Before | RMSE After | R² Before | R² After | Comments |
|---|---|---|---|---|---|
| Linear Regression | 5956.34 | 5821.63 | 0.81 | 0.77 | Slight RMSE improvement, minor drop in R² |
| KNeighborsRegressor | 5819.21 | 9872.54 | 0.76 | 0.32 | RMSE and R² worsened; model underperforms after tuning |
| RandomForestRegressor | 4605.38 | 4621.09 | 0.85 | 0.85 | Performance stable, hyperparameters had minimal effect |
| GradientBoostingRegressor | 5819.21 | 4482.26 | 0.86 | 0.86 | RMSE improved significantly with tuning, R² unchanged |
| XGBRegressor | 5819.21 | 4483.77 | 0.82 | 0.86 | Large performance boost from tuning; best generalization |
| DecisionTreeRegressor | 5819.21 | 4770.01 | 0.84 | 0.84 | RMSE dropped with tuning, R² stayed constant |
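
A before/after comparison like the one in the table could be produced along the following lines. This is a sketch only: the source does not say which search method or parameter grid was used, so `GridSearchCV`, the grid values, and the hypothetical `evaluate` helper are assumptions. The data comes from scikit-learn's `make_regression`, not from the insurance dataset, so the printed numbers will not match the table.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def evaluate(model, X_test, y_test):
    """Return (RMSE, R2) of a fitted model on the held-out split."""
    pred = model.predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5
    return rmse, r2_score(y_test, pred)


# Synthetic regression data (assumption), standing in for the real features.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# "Before": default hyperparameters.
baseline = GradientBoostingRegressor(random_state=42).fit(X_train, y_train)
rmse_before, r2_before = evaluate(baseline, X_test, y_test)

# "After": cross-validated grid search over an assumed, deliberately small grid.
param_grid = {
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)
rmse_after, r2_after = evaluate(search.best_estimator_, X_test, y_test)

print(f"Before: RMSE={rmse_before:.2f} R2={r2_before:.2f}")
print(f"After:  RMSE={rmse_after:.2f} R2={r2_after:.2f}")
print("Best parameters:", search.best_params_)
```

The search picks the parameters with the best cross-validated score on the training split, so the held-out score can still get worse after tuning, as the KNeighborsRegressor row shows.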
