
This project explores customer behavior using the Bank Marketing dataset to predict term deposit subscriptions. It covers EDA, feature engineering, model training, class imbalance handling, and evaluation of logistic regression, decision tree, and random forest models.


hutchay/term-deposit-prediction


📊 Term Deposit Subscription Prediction – Bank Marketing Dataset

This project uses classification models to predict whether a customer will subscribe to a term deposit based on a variety of demographic and campaign-related factors.

🧠 Objective

To build a predictive model using the Bank Marketing dataset (2015–2017) to support marketing decision-making through data-driven insights.

🛠️ Tools Used

  • Python
  • Pandas, NumPy
  • Scikit-learn
  • Matplotlib, Seaborn

🔍 Key Techniques

  • Exploratory Data Analysis (EDA)
  • Feature engineering and scaling
  • Handling class imbalance with class_weight='balanced'
  • Logistic Regression, Decision Tree, and Random Forest models
  • Model evaluation using classification report, confusion matrix, and ROC-AUC
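
A minimal sketch of how the scaling, class-weighting, and evaluation steps listed above could fit together, assuming scikit-learn. Because bank.csv is not included in the repository, the code builds a small synthetic stand-in; its column names (age, balance, campaign, contact, education, default) and the label column y are assumptions modeled on the public Bank Marketing schema, not the notebook's exact code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic stand-in for bank.csv (column names are assumptions).
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "balance": rng.normal(1500, 3000, n),
    "campaign": rng.integers(1, 10, n),
    "contact": rng.choice(["cellular", "telephone", "unknown"], n),
    "education": rng.choice(["primary", "secondary", "tertiary", "unknown"], n),
    "default": rng.choice(["no", "yes"], n, p=[0.98, 0.02]),
})
# Roughly 10% "yes" labels, made to depend on contact and education so there is signal.
logit = -2.8 + 1.0 * (df["contact"] == "cellular") + 0.6 * (df["education"] == "tertiary")
df["y"] = np.where(rng.random(n) < 1 / (1 + np.exp(-logit)), "yes", "no")

X = df.drop(columns="y")
y = (df["y"] == "yes").astype(int)

# Scale numeric columns and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "balance", "campaign"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["contact", "education", "default"]),
])
# class_weight="balanced" upweights the rare "yes" class during training.
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Stratify so the train and test sets keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]  # probability of "yes"
cm = confusion_matrix(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)

print(classification_report(y_test, y_pred, digits=3))
print(cm)
print("ROC-AUC:", round(auc, 3))
```

Wrapping the preprocessing in a Pipeline keeps the scaler and encoder fitted only on the training split, which avoids leaking test-set statistics into training.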

📊 Visuals

  • Subscription class distribution
  • Confusion matrices:
    • Logistic Regression
    • Decision Tree
    • Random Forest

📈 Performance Snapshot - Classification report


📁 Files

  • Term_Deposit_Prediction_BankMarketing.ipynb – Full Jupyter notebook with code and results
  • bank.csv – Source dataset (not uploaded here for privacy)

🚀 How to Run

pip install pandas numpy scikit-learn matplotlib seaborn

Then open the notebook in Jupyter or Google Colab and run all cells.

🧠 Interpretation of Results

🎯 Business Context

The goal was to predict whether a customer would subscribe to a term deposit offer based on demographic and campaign-related features. The main challenge is that only a small percentage of customers said “yes”, which makes this a classic imbalanced classification problem.
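
The class_weight='balanced' option listed under Key Techniques weights each class inversely to its frequency, so errors on the rare “yes” class cost more during training. The sketch below uses a hypothetical 90/10 split of “no” and “yes” labels (not the dataset's real class counts) to show the weights scikit-learn's compute_class_weight produces for that option.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 90 "no" (0) and 10 "yes" (1) customers.
y = np.array([0] * 90 + [1] * 10)

# "balanced" weight for each class = n_samples / (n_classes * count_of_that_class).
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)

# weights ≈ 0.556 for class 0 and 5.0 for class 1
print(np.round(weights, 3))
```

With these weights, misclassifying one “yes” customer costs about nine times as much as misclassifying one “no” customer, which is what pushes the model toward higher recall on subscribers.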

📈 Model Findings

  • Logistic Regression offered the best balance for this use case, reaching 62% recall on the “yes” class, which makes it well suited to identifying likely subscribers.

  • Decision Tree performed well on accuracy but had lower recall than Logistic Regression, limiting its usefulness for identifying “yes” customers.

  • Random Forest had the highest precision and accuracy but the lowest recall (11%), meaning it missed most of the actual subscribers and is therefore a poor fit for this use case.
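
To illustrate how accuracy, precision, and recall can diverge across these three model types, the sketch below trains them on a synthetic imbalanced dataset from make_classification. The hyperparameters (max_depth=5, n_estimators=200) and the choice to class-weight only the logistic regression are assumptions for illustration; the numbers will not reproduce the README's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with about 10% positives, standing in for the real features.
X, y = make_classification(
    n_samples=4000, n_features=10, n_informative=4,
    weights=[0.9, 0.1], flip_y=0.05, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical model settings; only logistic regression is class-weighted here.
models = {
    "Logistic Regression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        # zero_division=0 avoids a warning if a model predicts no positives at all.
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred),
    }
    print(f"{name:20s} " + "  ".join(f"{k}={v:.3f}" for k, v in results[name].items()))
```

On imbalanced data, a model can score high accuracy simply by predicting “no” most of the time, which is why recall on the “yes” class is the more telling metric here.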

🔍 Key Factors Driving Subscription

  • Contact method (cellular)
  • Education level
  • Credit default history

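Note: duration (the length of the last call) was excluded to prevent data leakage, because it is only known after the call ends, by which point the outcome is also known.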
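
A minimal sketch of dropping duration before fitting and then ranking one-hot-encoded features by logistic regression coefficient, which is one common way to surface drivers like those listed above. The data is synthetic, the column names are assumed from the public Bank Marketing schema, and the notebook may have used a different importance method.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data; column names mirror the public schema, values are made up.
rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "contact": rng.choice(["cellular", "telephone", "unknown"], n),
    "education": rng.choice(["primary", "secondary", "tertiary"], n),
    "default": rng.choice(["no", "yes"], n, p=[0.95, 0.05]),
    "duration": rng.integers(5, 1500, n),  # only known after the call ends
})
logit = -2.0 + 1.2 * (df["contact"] == "cellular") - 1.0 * (df["default"] == "yes")
df["y"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Drop the leaky duration column (and the label) before building features.
X = pd.get_dummies(df.drop(columns=["duration", "y"]), dtype=float)
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, df["y"])

# All dummy columns share a 0/1 scale, so coefficient magnitudes are roughly comparable.
coefs = pd.Series(clf.coef_[0], index=X.columns).sort_values(key=abs, ascending=False)
print(coefs.round(3))
```

Coefficient magnitudes are only a rough guide; with correlated features, permutation importance on a held-out set is often a more reliable check.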

✅ What This Means

  • Logistic Regression, or an ensemble method tuned to prioritize recall, is recommended when the goal is to identify as many potential subscribers as possible.

  • These models help the bank:

    • Prioritize high-potential customers.
    • Tailor campaigns to effective segments.
    • Reduce wasted effort and cost.
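
One way to make a probabilistic classifier prioritize recall is to lower the decision threshold on predict_proba below the default 0.5. This hypothetical sketch on synthetic data shows recall rising (and precision typically falling) as the threshold drops; the thresholds 0.5, 0.3, and 0.15 are arbitrary examples, not values from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (about 10% positives).
X, y = make_classification(
    n_samples=4000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # predicted probability of "yes"

recalls = {}
for threshold in (0.5, 0.3, 0.15):
    # Flag a customer as a likely subscriber when the probability clears the threshold.
    pred = (proba >= threshold).astype(int)
    recalls[threshold] = recall_score(y_test, pred)
    precision = precision_score(y_test, pred, zero_division=0)
    print(f"threshold={threshold:.2f}  precision={precision:.3f}  recall={recalls[threshold]:.3f}")
```

Lowering the threshold can only add predicted positives, so recall never decreases; the right threshold depends on how much a wasted call costs compared with a missed subscriber.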

📌 Insights & Conclusion

  • The project demonstrates how different models trade off recall against precision, an important business decision point.
  • Logistic Regression, while simple, achieved the best trade-off between recall and precision.
  • Key features such as contact method, education, and credit default history were the strongest predictors of term deposit subscription.
  • These models can help the bank prioritize outreach, refine targeting, and increase campaign efficiency.
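
For reference, the two metrics traded off above are defined from the confusion-matrix counts, where TP is true positives, FP is false positives, and FN is false negatives:

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}
```

For example, Random Forest's 11% recall means that, of every 100 actual subscribers in the test set, it flagged roughly 11.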
