
Insurance Cross-Selling Prediction 🎯

Python LightGBM CatBoost XGBoost

Project Overview 📋

A machine learning solution for predicting customer response to vehicle insurance cross-selling campaigns. The model combines CatBoost, LightGBM, and XGBoost in an ensemble, reaching a cross-validated ROC-AUC of 0.897.

Problem Statement

In the insurance industry, optimizing cross-selling strategies is crucial for business growth. This project aims to predict which health insurance customers are most likely to be interested in an additional vehicle insurance product, enabling targeted marketing campaigns and improved conversion rates.

Key Objectives

  • Predict customer likelihood to purchase vehicle insurance
  • Identify key factors influencing purchase decisions
  • Enable targeted marketing through accurate risk scoring
  • Optimize resource allocation for marketing campaigns

Data Analysis 📊

Dataset Statistics

  • Records: 11 million entries
  • Features: 12 initial features
  • Generated Features: 4 interaction features
  • Target Variable: Binary (Response: 1 = Interested, 0 = Not Interested)

Key Features

  1. Demographic and Vehicle Factors:

    • Age
    • Driving License
    • Vehicle Age
    • Vehicle Damage
  2. Insurance History:

    • Previously Insured
    • Policy Sales Channel
  3. Financial Factors:

    • Annual Premium
    • Vintage (Customer Tenure)

Feature Importance (Normalized Scores)

Feature                             Score
Previously_Insured                  1.000
Annual_Premium                      0.876
Vehicle_Damage                      0.754
Age                                0.721
Vehicle_Age                        0.687
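
For reference, scores like these can be pulled from a trained model. The sketch below assumes a fitted CatBoostClassifier named `model` and a feature DataFrame `X`; both names are illustrative, not taken from this repository.

    import pandas as pd

    # `model` is assumed to be a fitted CatBoostClassifier and `X` the training
    # feature DataFrame; neither name comes from this repository.
    importance = pd.Series(model.get_feature_importance(), index=X.columns)

    # Normalize so the most important feature scores 1.0, as in the table above.
    normalized = (importance / importance.max()).sort_values(ascending=False)
    print(normalized.head())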

Key Insights 📈

Age Impact on Conversion

Age Group    Conversion Rate    Population Share
15-20        28.2%             42.3%
21-25        24.8%             31.7%
26-30        22.1%             12.1%
31-35        19.4%             7.4%
36-40        17.2%             4.2%
41-50        15.8%             2.3%

Vehicle Age and Response Rate

Vehicle Age    Response Rate    Sample Share
< 1 Year       18.4%           42.3%
1-2 Years      25.7%           31.7%
> 2 Years      35.2%           26.0%

Prior Insurance Status Impact

  • Customers without previous vehicle insurance showed 3.2x higher conversion rates
  • 72.3% of conversions came from previously uninsured customers
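
Segment rates like those above can be reproduced with a straightforward aggregation. The sketch below assumes a raw DataFrame `df` containing `Age`, `Previously_Insured`, and the binary `Response` target; the bin edges are illustrative.

    import pandas as pd

    # `df` is assumed to hold the raw records with the column names used in this write-up.
    age_bins = [15, 20, 25, 30, 35, 40, 50]
    df["Age_Group"] = pd.cut(df["Age"], bins=age_bins, include_lowest=True)

    # Conversion rate and population share per age group.
    by_age = df.groupby("Age_Group", observed=True)["Response"].agg(["mean", "size"])
    by_age["share"] = by_age["size"] / len(df)

    # Conversion lift for customers without prior vehicle insurance.
    by_prior = df.groupby("Previously_Insured")["Response"].mean()
    lift = by_prior.loc[0] / by_prior.loc[1]
    print(by_age)
    print(f"Uninsured vs insured conversion lift: {lift:.1f}x")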

Methodology 🔬

Data Preprocessing

  1. Categorical Encoding

    • Gender: Male = 1, Female = 0
    • Vehicle Damage: Yes = 1, No = 0
    • Vehicle Age: Ordinal encoding (0, 1, 2)
  2. Feature Engineering

    • Created interaction features with Previously_Insured
    • Standardized numerical features
    • Applied rare label encoding for Region_Code
  3. Data Optimization

    • Reduced memory usage through downcasting
    • Optimized categorical encodings
    • Streamlined numerical precision
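
A condensed sketch of the three preprocessing steps above, assuming a raw DataFrame `df` with the feature names used in this write-up. The interaction columns, rare-label threshold, and category strings are illustrative rather than the exact pipeline code, and standardization of numeric columns is left out for brevity.

    import numpy as np
    import pandas as pd

    def preprocess(df: pd.DataFrame) -> pd.DataFrame:
        df = df.copy()

        # 1. Categorical encoding
        df["Gender"] = (df["Gender"] == "Male").astype(np.int8)
        df["Vehicle_Damage"] = (df["Vehicle_Damage"] == "Yes").astype(np.int8)
        vehicle_age_order = {"< 1 Year": 0, "1-2 Year": 1, "> 2 Years": 2}  # assumed labels
        df["Vehicle_Age"] = df["Vehicle_Age"].map(vehicle_age_order).astype(np.int8)

        # 2. Feature engineering: interaction terms with Previously_Insured
        #    (which columns are interacted is an assumption)
        for col in ["Vehicle_Damage", "Vehicle_Age", "Annual_Premium", "Age"]:
            df[f"Previously_Insured_x_{col}"] = df["Previously_Insured"] * df[col]

        #    Rare label encoding: fold infrequent region codes into one bucket
        region_freq = df["Region_Code"].value_counts(normalize=True)
        rare = region_freq[region_freq < 0.01].index  # 1% cutoff is illustrative
        df.loc[df["Region_Code"].isin(rare), "Region_Code"] = -1

        # 3. Memory optimization via downcasting
        for col in df.select_dtypes(include="number").columns:
            target = "integer" if df[col].dtype.kind in "iu" else "float"
            df[col] = pd.to_numeric(df[col], downcast=target)

        return df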

Model Architecture

Ensemble Components

  1. CatBoost

    params = {
        'learning_rate': 0.075,
        'depth': 9,
        'l2_leaf_reg': 0.5,
        'max_leaves': 512
    }
  2. LightGBM

    params = {
        'learning_rate': 0.050,
        'max_depth': 10,
        'num_leaves': 31,
        'min_child_samples': 100
    }
  3. XGBoost

    params = {
        'eta': 0.05,
        'max_depth': 16,
        'min_child_weight': 5,
        'subsample': 0.839
    }
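
A minimal training-and-blending sketch using the parameters listed above. `X_train`, `y_train`, and `X_valid` are assumed to be preprocessed NumPy arrays, the equal blend weights are an assumption, and any parameter not shown in the lists above (such as CatBoost's grow_policy, which max_leaves requires) is a guess.

    import numpy as np
    from catboost import CatBoostClassifier
    from lightgbm import LGBMClassifier
    from xgboost import XGBClassifier

    # Hyperparameters from the lists above; everything else is a guess.
    cat_model = CatBoostClassifier(
        learning_rate=0.075, depth=9, l2_leaf_reg=0.5, max_leaves=512,
        grow_policy="Lossguide",  # required by CatBoost when max_leaves is set
        eval_metric="AUC", verbose=0,
    )
    lgb_model = LGBMClassifier(
        learning_rate=0.050, max_depth=10, num_leaves=31, min_child_samples=100,
    )
    xgb_model = XGBClassifier(
        eta=0.05, max_depth=16, min_child_weight=5, subsample=0.839,
        eval_metric="auc",
    )

    models = [cat_model, lgb_model, xgb_model]
    for model in models:
        model.fit(X_train, y_train)

    # Equal-weight average of predicted probabilities for the positive class.
    blend = np.mean(
        [model.predict_proba(X_valid)[:, 1] for model in models], axis=0
    )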

Model Performance 📊

Individual Model Metrics

Model       ROC-AUC     Std Dev
CatBoost    0.8967      ±0.0024
LightGBM    0.8952      ±0.0021
XGBoost     0.8944      ±0.0019

Ensemble Performance

  • Final ROC-AUC: 0.8970
  • Improvement: +0.0003 over best single model
  • Cross-validation: 5-fold stratified CV
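
One way to set up the 5-fold stratified evaluation, assuming `X` and `y` are NumPy arrays of preprocessed features and targets and `models` is the list from the previous sketch. Out-of-fold blending is shown here; the exact scheme may differ.

    import numpy as np
    from sklearn.base import clone
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import StratifiedKFold

    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    oof_blend = np.zeros(len(y))

    for train_idx, valid_idx in cv.split(X, y):
        fold_preds = []
        for model in models:
            fold_model = clone(model)  # fresh, unfitted copy for each fold
            fold_model.fit(X[train_idx], y[train_idx])
            fold_preds.append(fold_model.predict_proba(X[valid_idx])[:, 1])
        # Average the three models' probabilities on the held-out fold.
        oof_blend[valid_idx] = np.mean(fold_preds, axis=0)

    print("Ensemble out-of-fold ROC-AUC:", roc_auc_score(y, oof_blend))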

Confusion Matrix (Normalized)

Actual ↓ / Predicted →    Positive    Negative
Positive                  0.842       0.158
Negative                  0.179       0.821
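
A row-normalized matrix in this form can be produced with scikit-learn from the out-of-fold predictions above; the 0.5 decision threshold is an assumption.

    from sklearn.metrics import confusion_matrix

    # `y` and `oof_blend` come from the cross-validation sketch above;
    # the 0.5 cutoff is illustrative and may have been tuned in practice.
    predicted_labels = (oof_blend >= 0.5).astype(int)

    # normalize="true" scales each row (actual class) to sum to 1.
    cm = confusion_matrix(y, predicted_labels, normalize="true")
    print(cm)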
