This project uses classification models to predict whether a customer will subscribe to a term deposit based on a variety of demographic and campaign-related factors.
To build a predictive model using the Bank Marketing dataset (2015–2017) to support marketing decision-making through data-driven insights.
- Python
- Pandas, NumPy
- Scikit-learn
- Matplotlib, Seaborn
- Exploratory Data Analysis (EDA)
- Feature engineering and scaling
- Handling class imbalance with
class_weight='balanced'
- Logistic Regression, Decision Tree, and Random Forest models
- Model evaluation using classification report, confusion matrix, and ROC-AUC
- Subscription class distribution
- Confusion Matrices
- Decision Tree
Term_Deposit_Prediction_BankMarketing.ipynb
– Full Jupyter notebook with code and resultsbank.csv
– Source dataset (not uploaded here for privacy)
pip install pandas numpy scikit-learn matplotlib seaborn
Then open the notebook in Jupyter or Google Colab and run all cells.
The goal was to predict whether a customer would subscribe to a term deposit offer, based on demographic and campaign-related features. The challenge: only a small percentage of customers said “yes” — making it a classic imbalanced classification problem.
-
Logistic Regression achieved the best balance for this use case with 62% recall, ideal for identifying likely subscribers.
-
Decision Tree performed well in terms of accuracy but had lower recall, limiting its usefulness for identifying "yes" customers.
-
Random Forest had the highest precision and accuracy but the lowest recall (11%), meaning it missed most of the actual subscribers — not ideal for this use case.
- Contact method (cellular)
- Education level
- Credit default history
Note: duration
was excluded to prevent data leakage.
-
Logistic Regression or an ensemble method that prioritizes recall is recommended when the goal is to identify as many potential subscribers as possible..
-
These models help the bank:
- Prioritize high-potential customers.
- Tailor campaigns to effective segments.
- Reduce wasted effort and cost
- The project demonstrates how different models can trade off recall vs. precision — an important business decision point.
- Logistic Regression, while simple, achieved the best trade-off between recall and precision.
- Key features like contact method, education, and loan history were the strongest predictors of term deposit subscription.
- These models can help the bank prioritize outreach, refine targeting, and increase campaign efficiency.