Customer churn is one of the most critical problems in the telecom industry.
This project aims to predict customer churn using machine learning and provide actionable business insights for customer retention.
The workflow covers:
- Data exploration & visualization
- Data cleaning & preprocessing
- Model building (Decision Tree & XGBoost)
- Evaluation (Accuracy, F1, ROC AUC, Cross-Validation)
- Feature importance & insights
- Saving the best model for future use
- Source: IBM Sample Dataset (Telco Customer Churn)
- File used: `Telco_customer_churn.xlsx`
- Target variable: `Churn` (Yes → 1, No → 0)
- Size: ~7,000 customers, 20+ features
Features include:
- Demographics: Gender, SeniorCitizen, Partner, Dependents
- Services: InternetService, Streaming, OnlineSecurity, TechSupport
- Contracts: Contract type, Payment method, Paperless billing
- Financials: Tenure, MonthlyCharges, TotalCharges
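
The target encoding described above (Yes → 1, No → 0) can be sketched as follows. This is a minimal illustration on a made-up four-row frame, assuming the label column is named `Churn` and holds the strings `Yes` and `No`; the real spreadsheet would be loaded with `pd.read_excel` instead.

```python
import pandas as pd

# Tiny stand-in for the real spreadsheet. The column names follow the feature
# list above; the rows themselves are made up for illustration.
# With the real file: df = pd.read_excel("Telco_customer_churn.xlsx")
df = pd.DataFrame({
    "Tenure": [1, 34, 2, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "Churn": ["No", "No", "Yes", "No"],
})

# Map the text labels to the binary target used for modeling.
df["Churn"] = df["Churn"].map({"Yes": 1, "No": 0})

# Share of churners in this toy frame.
churn_rate = df["Churn"].mean()
```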
- Dropped identifiers and leakage columns (CustomerID, Churn Reason, etc.)
- Converted `TotalCharges` to numeric
- Handled missing values with imputation (median for numeric, most_frequent for categorical)
- Scaled numeric features
- One-hot encoded categorical features
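
The preprocessing steps above could be expressed with scikit-learn roughly as shown below. This is a sketch, not the notebook's exact code: the toy rows, the column grouping, and the step names (`impute`, `scale`, `onehot`) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows; a blank string stands in for a missing TotalCharges value.
df = pd.DataFrame({
    "Tenure": [1, 34, 2, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "TotalCharges": ["29.85", "1889.5", " ", "1840.75"],
    "Contract": ["Month-to-month", "One year", "Month-to-month", np.nan],
})

# Entries that cannot be parsed as numbers (the blank string) become NaN.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

numeric_cols = ["Tenure", "MonthlyCharges", "TotalCharges"]
categorical_cols = ["Contract"]

# Numeric columns: median imputation, then standard scaling.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: most-frequent imputation, then one-hot encoding.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

X = preprocessor.fit_transform(df)
```

Keeping the imputers inside the transformer means the medians and most frequent categories are learned only from the data passed to `fit`, which avoids leaking test-set statistics.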
- Visualized churn distribution (imbalanced dataset: ~26% churners)
- Explored key drivers: Tenure, Contract type, Monthly charges
- Correlation heatmap for numeric variables
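
The kind of comparison behind these charts can be sketched with a pandas `groupby`. The rows below are made up, and the 12-month cutoff used to split tenure is an assumption chosen for illustration.

```python
import pandas as pd

# Made-up rows with an already encoded target (1 = churned).
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "One year"],
    "Tenure": [2, 5, 30, 60, 1, 24],
    "Churn": [1, 0, 0, 0, 1, 0],
})

# Overall churn rate, which shows the class imbalance.
overall_rate = df["Churn"].mean()

# Churn rate per contract type, highest first.
rate_by_contract = (
    df.groupby("Contract")["Churn"].mean().sort_values(ascending=False)
)

# Churn rate for customers in their first 12 months versus later.
tenure_group = df["Tenure"].le(12).map({True: "first year", False: "later"})
rate_by_tenure = df.groupby(tenure_group)["Churn"].mean()
```

Either resulting Series can be passed to a bar plot to reproduce this kind of visual.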
- Decision Tree (baseline): simple, interpretable
- XGBoost (advanced): robust gradient boosting model
Both models were wrapped in scikit-learn `Pipeline` objects, so preprocessing and modeling are fitted and applied as a single step.
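
A minimal sketch of such a pipeline, using the Decision Tree baseline on synthetic data, is shown below. The data, the labeling rule, and the hyperparameters are assumptions made for illustration; an `xgboost.XGBClassifier` could take the place of the tree in the final step.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic customers; none of these values come from the real dataset.
rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame({
    "Tenure": rng.integers(1, 72, size=n),
    "MonthlyCharges": rng.uniform(20, 120, size=n),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
})
# Toy rule: short-tenure month-to-month customers are labeled as churners.
y = ((X["Tenure"] < 24) & (X["Contract"] == "Month-to-month")).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["Tenure", "MonthlyCharges"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
])

# Preprocessing and the classifier are fitted together as one estimator.
model = Pipeline([
    ("preprocess", preprocessor),
    ("classifier", DecisionTreeClassifier(max_depth=4, random_state=42)),
])
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
```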
Metrics used:
- Accuracy
- Precision, Recall, F1-score
- ROC AUC
- 5-fold Cross-Validation (for XGBoost)
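
These metrics could be computed along the following lines. The sketch uses synthetic data with roughly 26% positives and scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost, so it runs without the `xgboost` package; the numbers it produces say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
)
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic, imbalanced data (about 26% positives, mirroring the churn rate).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.74, 0.26], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stand-in for XGBoost: scikit-learn's own gradient boosting classifier.
clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)               # hard labels for threshold metrics
y_proba = clf.predict_proba(X_test)[:, 1]  # positive-class scores for ROC AUC

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_proba),
}

# 5-fold cross-validated ROC AUC on the full synthetic set.
cv_auc = cross_val_score(
    GradientBoostingClassifier(random_state=0), X, y, cv=5, scoring="roc_auc"
)
```

On an imbalanced target, accuracy alone can look good for a model that rarely predicts churn, which is why F1 and ROC AUC are reported alongside it.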
- Extracted top churn drivers from XGBoost
- Visualized the top 15 most important features
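
One way to rank features is to read the fitted model's `feature_importances_` attribute, which tree-based scikit-learn estimators and `XGBClassifier` both expose. The sketch below uses synthetic data, placeholder feature names, and `GradientBoostingClassifier` as a stand-in for XGBoost.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder names; the real ones would come from the fitted preprocessor.
feature_names = [f"feature_{i}" for i in range(20)]
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)

# Stand-in for the XGBoost model.
clf = GradientBoostingClassifier(random_state=0).fit(X, y)

# Pair each importance with its feature name and keep the 15 largest.
importances = pd.Series(clf.feature_importances_, index=feature_names)
top_15 = importances.sort_values(ascending=False).head(15)

# With matplotlib installed, top_15.sort_values().plot.barh() draws the chart.
```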
| Model | Accuracy | F1 Score | ROC AUC |
|---|---|---|---|
| Decision Tree | ~0.73 | ~0.62 | ~0.74 |
| XGBoost | ~0.82 | ~0.71 | ~0.85 |
- XGBoost outperformed the Decision Tree across all metrics.
- Top churn drivers (example): Contract type, Tenure, MonthlyCharges, PaymentMethod, InternetService.
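
The workflow's last step, saving the best model for future use, could look like the sketch below with `joblib`. The file name `churn_model.joblib` is hypothetical, and a small Decision Tree on synthetic data stands in for the trained pipeline.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Small stand-in model trained on synthetic data.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# A temporary directory keeps the example self-contained; a real project
# would write to a path inside the repository instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "churn_model.joblib")  # hypothetical name
    joblib.dump(model, model_path)      # save the fitted estimator
    restored = joblib.load(model_path)  # load it back

# The restored model should reproduce the original predictions.
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
```

Saving the whole fitted pipeline, rather than only the classifier, keeps the preprocessing and the model in sync when predictions are made later.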
Here are some key visuals from the analysis:
- Contract Type matters: Month-to-Month customers churn the most.
  → Offer discounts or loyalty benefits for longer-term contracts.
- Early Tenure churn risk: Customers in their first year are more likely to leave.
  → Focus on onboarding experience and engagement campaigns.
- High Monthly Charges: Customers with higher bills show higher churn.
  → Consider competitive pricing or bundling offers.
- Python 3.9+
- Libraries: `pandas`, `numpy`, `matplotlib`, `seaborn`, `scikit-learn`, `xgboost`, `joblib`

    # 1. Clone this repo
    git clone https://github.com/VedikaSankhe/Telecom-Customer-Churn.git
    cd Telecom-Customer-Churn

    # 2. Install dependencies
    pip install -r requirements.txt

    # 3. Run the notebook
    jupyter notebook notebooks/customerchurn.ipynb

👩‍💻 Author
Developed with ❤️ by Vedika Sankhe
If you like this project, don’t forget to ⭐ the repo!
💬 Feedback
If you have any feedback, please reach out at vedikasankhe11@gmail.com
📜 License
This project is licensed under the MIT License – see the LICENSE file for details.