Build a business-aligned, recall-optimized predictive model and interactive dashboard to explore outcomes of a direct marketing campaign using Python and Plotly Dash.
Financial institutions often run targeted campaigns but struggle to convert contacts into subscribers due to a lack of actionable insights and realistic modeling. This project tackles that challenge by combining data analysis, statistical testing, interpretable machine learning, and dashboard design to support informed decision-making.
The goal is to predict whether a customer will subscribe to a term deposit (`y`) based on their demographic profile and contact history. Common pitfalls in such projects include:
- Over-reliance on leaky features like call duration that aren’t known at prediction time.
- Lack of explainability, making model results hard to trust or act on.
- Ineffective segmentation, missing out on high-performing customer micro-groups.
This project addresses those through rigorous preprocessing, interpretable modeling, and interactive data exploration.
- Python (Pandas, Scikit-learn): Data cleaning, encoding, modeling, evaluation
- Statistical Testing: Chi-squared, Mann-Whitney U, Cramér’s V
- SHAP: Explainable AI for feature impact
- Plotly Dash: Interactive dashboard with live visual updates
- Cleaned and normalized data (41K rows, UCI Bank Marketing Dataset)
- Engineered meaningful features like `was_contacted_previously` from `pdays`
- Prevented leakage by removing `duration` before model training
- Applied stratified train-test split and scaling
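The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration on a tiny made-up table, not the project's actual notebook code. It assumes that `pdays == 999` marks a customer who was never contacted before (the convention in the UCI dataset) and that the target `y` is stored as `"yes"`/`"no"`; the split ratio and random seed are arbitrary choices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Tiny synthetic stand-in for the campaign data (all values are made up).
df = pd.DataFrame({
    "age":      [30, 45, 52, 38, 27, 61, 41, 33, 49, 56, 29, 36],
    "pdays":    [999, 3, 999, 999, 6, 999, 999, 2, 999, 999, 999, 4],
    "duration": [210, 95, 480, 60, 320, 150, 75, 400, 130, 220, 90, 510],
    "y":        ["no", "yes", "no", "no", "yes", "no",
                 "no", "yes", "no", "no", "no", "yes"],
})

# Assumption: pdays == 999 encodes "never contacted in a previous campaign".
df["was_contacted_previously"] = (df["pdays"] != 999).astype(int)

# duration is only known once the call has ended, so it is dropped
# before training to avoid leaking the outcome into the features.
X = df.drop(columns=["duration", "y"])
y = (df["y"] == "yes").astype(int)

# Stratify on the target so both splits keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training split alone keeps test-set statistics out of the model, which is the same leakage concern that motivates dropping `duration`.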
- Tested feature associations with the target using Chi-squared (categorical features) and Mann-Whitney U (numeric features) tests
- Ranked categorical predictors using Cramér’s V (top: `job`, `contact`)
- Plotted interaction effects (e.g., `job` × `contact`)
- Identified seasonal performance trends (Spring/Fall peaks)
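For readers unfamiliar with these tests, here is a small sketch of how they might be run with SciPy. The helper `cramers_v` is a hypothetical name and uses the plain (uncorrected) formula V = sqrt(chi2 / (n * (min(rows, cols) - 1))); the data is synthetic and the project's notebooks may compute these quantities differently.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, mannwhitneyu


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V between two categorical series (no bias correction)."""
    table = pd.crosstab(x, y)
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))


# Synthetic example data (values are made up for illustration).
df = pd.DataFrame({
    "contact": ["cellular"] * 6 + ["telephone"] * 6,
    "age":     [25, 31, 58, 62, 40, 44, 37, 39, 41, 45, 48, 29],
    "y":       ["yes", "yes", "yes", "yes", "no", "no",
                "no", "no", "no", "no", "no", "yes"],
})

# Chi-squared test: is the categorical feature associated with the target?
chi2, p_contact, dof, _ = chi2_contingency(pd.crosstab(df["contact"], df["y"]))

# Cramér's V: effect size in [0, 1], usable for ranking categorical predictors.
v_contact = cramers_v(df["contact"], df["y"])

# Mann-Whitney U: does a numeric feature differ between subscribers and others?
age_yes = df.loc[df["y"] == "yes", "age"]
age_no = df.loc[df["y"] == "no", "age"]
u_stat, p_age = mannwhitneyu(age_yes, age_no, alternative="two-sided")
```

The p-values say whether an association is detectable; Cramér’s V says how strong it is, which is why it is the more useful quantity for ranking predictors on a dataset of this size.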
- Compared Logistic Regression, Random Forest, SVC, and XGBoost
- Selected Logistic Regression for highest recall (aligned with business needs)
- Tuned hyperparameters via `GridSearchCV`, optimizing for recall
- Explained predictions using SHAP summary and force plots
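A recall-oriented grid search could look something like the sketch below. It runs on synthetic imbalanced data from `make_classification`; the parameter grid (including `class_weight`), the fold count, and the pipeline layout are illustrative assumptions rather than the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for the campaign features.
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.88, 0.12], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so each CV fold is scaled independently.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Hypothetical grid; the real search space is not stated in this README.
param_grid = {
    "logisticregression__C": [0.01, 0.1, 1.0, 10.0],
    "logisticregression__class_weight": [None, "balanced"],
}

# scoring="recall" makes the search pick the setting that misses
# the fewest actual subscribers, rather than the most accurate one.
search = GridSearchCV(pipe, param_grid, scoring="recall", cv=5)
search.fit(X_train, y_train)

test_recall = recall_score(y_test, search.predict(X_test))
print(search.best_params_, round(test_recall, 3))
```

Optimizing recall alone tends to favor settings that flag many customers as likely subscribers, so precision on the held-out set is worth checking alongside it.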
Built with Plotly Dash, the dashboard features:
- Dropdown menu to choose a categorical variable (`job`, `education`, etc.)
- Pie chart showing customer distribution for that variable
- Bar chart showing subscription rates per subcategory
- Responsive callbacks for a smooth user experience
- Built a realistic, high-recall classifier suitable for lead generation
- Delivered statistical insights into campaign performance drivers
- Deployed an interactive dashboard for exploratory marketing analysis
- Gained hands-on experience with model interpretability and data storytelling
- Clone this repository
- Run `app.py` to launch the dashboard
- Explore insights via the dropdown-linked visualizations
- Check the directory for full EDA, modeling, and SHAP explanations, along with the Insights & Recommendations.
- Source: UCI Bank Marketing Dataset
- Size: ~41,000 rows of customer profiles and campaign outcomes
- Preventing data leakage (via feature elimination) is crucial for honest modeling
- SHAP enables business-aligned storytelling of ML predictions
- Precision-recall trade-offs must be driven by business goals, not just accuracy
This project is licensed under the MIT License.
Naman Kumar
📧 Email: namankr24@gmail.com
🔗 GitHub: NamanKr24