
Telecom Churn Predictor

Churn classification model for telecom customer datasets.
94.60% Accuracy | 0.8968 AUC | 0.8675 Precision | 0.7423 Recall

Predicts which customers are likely to leave using a stacked ensemble of four classifiers.
Built with real telecom data, trained with stratified validation, and fully reproducible.
This repository includes a complete pipeline: feature engineering, model stacking, and evaluation.

For context, Charter Communications, the telecom company this was built for, previously relied on a spaCy-based model that achieved only ~40% accuracy.


What It Does

This model predicts customer churn for telecom operators.
It learns from customer usage patterns, billing behavior, service plans, and support interactions.
Given raw input data, it outputs a churn probability between 0 and 1 for each customer.

The output helps retention teams target at-risk customers before they leave.


Evaluation Metrics

| Metric    | Value  | Description                                   |
|-----------|--------|-----------------------------------------------|
| Accuracy  | 94.60% | Overall correct predictions                   |
| AUC       | 0.8968 | Separation between churners and non-churners  |
| Precision | 0.8675 | % of predicted churners that actually churned |
| Recall    | 0.7423 | % of actual churners correctly identified     |

Evaluation is performed on an 80/20 stratified train/test split.
The Level-1 base learner is trained with k-fold cross-validation; the Level-2 meta-learners are trained on its out-of-fold predictions.
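
For concreteness, here is a minimal sketch of that protocol, assuming the features sit in a pandas DataFrame `X` and the churn labels in a Series `y` (both names assumed; this is a reconstruction, not the repository's train.py):

```python
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from xgboost import XGBClassifier

# Stratified 80/20 split: preserves the churn/non-churn ratio in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Out-of-fold churn probabilities from the Level-1 model: every training row is
# scored by a fold that never saw it, so the meta-learners train on honest signal.
xgb = XGBClassifier(eval_metric="logloss")
oof_proba = cross_val_predict(xgb, X_train, y_train, cv=5, method="predict_proba")[:, 1]

def report(y_true, proba, threshold=0.5):
    """Print the four metrics from the table above for held-out predictions."""
    pred = (proba >= threshold).astype(int)
    print(f"Accuracy:  {accuracy_score(y_true, pred):.4f}")
    print(f"AUC:       {roc_auc_score(y_true, proba):.4f}")
    print(f"Precision: {precision_score(y_true, pred):.4f}")
    print(f"Recall:    {recall_score(y_true, pred):.4f}")
```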


Model Architecture: Two-Level Stacked Ensemble

The model is structured as a two-tiered ensemble, where each layer plays a distinct role in prediction.

Level 1: Base Learners

At the first level, only one model is used:

  • XGBoostClassifier

This model learns patterns from customer attributes (usage, billing, service plans, etc.) and produces a churn probability.

Level 2: Meta-Learners

The predicted churn probability from Level 1 (XGBoost) is combined with the original feature set, then used as input to train three meta-learners:

  • LogisticRegression
  • DecisionTreeClassifier
  • GaussianNB

Each of these meta-models learns a slightly different decision boundary based on the XGBoost signal and the original data. These models each output a second-level probability of churn.
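
The sketch below reconstructs both levels under the names from the previous snippet (`X_train`, `y_train`, `oof_proba`); the meta-learner hyperparameters shown are assumptions, not the repository's settings:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Level 1: refit XGBoost on the full training set for use at inference time.
xgb = XGBClassifier(eval_metric="logloss").fit(X_train, y_train)

# Level 2 inputs: original features plus the Level-1 churn probability. At train
# time the out-of-fold column is used so the meta-learners never see leaked fits.
X_meta_train = np.column_stack([X_train, oof_proba])

meta_learners = {
    "lr": LogisticRegression(max_iter=1000),
    "dt": DecisionTreeClassifier(max_depth=5),  # assumed depth
    "nb": GaussianNB(),
}
for model in meta_learners.values():
    model.fit(X_meta_train, y_train)

# At inference, the live XGBoost probability stands in for the out-of-fold column.
X_meta_test = np.column_stack([X_test, xgb.predict_proba(X_test)[:, 1]])
meta_probas = {name: m.predict_proba(X_meta_test)[:, 1]
               for name, m in meta_learners.items()}
```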

These three second-layer probabilities are then combined via a weighted soft vote:

  • Logistic Regression: 0.4
  • Decision Tree: 0.3
  • Naive Bayes: 0.3

The result is a final, blended churn probability that reflects multiple modeling assumptions.
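
A minimal sketch of that blend, using the `meta_probas` dictionary from the previous snippet:

```python
WEIGHTS = {"lr": 0.4, "dt": 0.3, "nb": 0.3}  # weights sum to 1.0

# Blend the three second-level probabilities into one churn score per customer.
final_proba = sum(WEIGHTS[name] * meta_probas[name] for name in WEIGHTS)
churn_pred = (final_proba >= 0.5).astype(int)  # 0.5 cutoff is an assumption
```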

This architecture improves generalization and avoids over-reliance on any single model’s biases.


Design Snapshot

| Component          | My Model                                                         |
|--------------------|------------------------------------------------------------------|
| Bucketing Strategy | Per-feature Sturges-based bin count with equidistant discretizing |
| Ensemble Structure | Two-stage pipeline: XGB → (LR, DT, NB) → weighted soft vote      |
| Train/Test Split   | 80/20 stratified                                                 |
| Feature Selection  | Original + 12 grouped features (33 total)                        |
| Voting Mechanism   | Weighted soft vote (LR: 0.4, DT: 0.3, NB: 0.3)                   |
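
As an illustration of the bucketing row above, here is a minimal sketch of Sturges-based equidistant binning, assuming a numeric pandas DataFrame `df` (name assumed):

```python
import math
import pandas as pd

def sturges_bins(n_rows: int) -> int:
    """Sturges' rule: k = ceil(1 + log2(n)) bins."""
    return math.ceil(1 + math.log2(n_rows))

def bucketize(df: pd.DataFrame) -> pd.DataFrame:
    """Add an equal-width bin code column for every numeric feature."""
    k = sturges_bins(len(df))
    out = df.copy()
    for col in df.select_dtypes(include="number").columns:
        # An integer `bins` argument to pd.cut yields equal-width (equidistant) bins.
        out[f"{col}_bin"] = pd.cut(df[col], bins=k, labels=False)
    return out
```

The resulting bin codes can then be one-hot encoded (e.g. with `pd.get_dummies`) so the downstream models see the complete categorical expansion.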

Conclusion

The final system is a layered, structured ensemble with strong performance and high transparency. The weighted soft vote over the three meta-learners balances their differing views of the XGBoost signal, while Sturges-based equidistant bucketing and complete one-hot encoding ensure that the full information space is available during training.

A 94.6% accuracy and 0.8968 AUC make this implementation a strong benchmark for practical churn prediction. The modular architecture, clean feature processing, and documented evaluation steps support easy replication and extension—whether for production deployment or integration with retention strategy tools.


Repository Structure

```
├── churn_model/
│   ├── predict.py
│   ├── train.py
│   ├── artifacts/
│   │   ├── xgb_model.joblib
│   │   ├── lr_model.joblib
│   │   ├── dt_model.joblib
│   │   ├── nb_model.joblib
│   │   └── preprocessor.joblib
```

How to Run

```bash
git clone https://github.com/ReverendBayes/Telecom-Churn-Predictor.git
cd Telecom-Churn-Predictor
pip install -r requirements.txt
python churn_model/train.py
```
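
Once training has produced the artifacts shown in the repository structure, scoring new customers might look like the following. This is a hypothetical sketch (the actual predict.py interface may differ), and `new_customers.csv` is an assumed input file with the raw training schema:

```python
import joblib
import numpy as np
import pandas as pd

# Load the persisted preprocessor, Level-1 model, and the three meta-learners.
pre = joblib.load("churn_model/artifacts/preprocessor.joblib")
xgb = joblib.load("churn_model/artifacts/xgb_model.joblib")
metas = {name: joblib.load(f"churn_model/artifacts/{name}_model.joblib")
         for name in ("lr", "dt", "nb")}
WEIGHTS = {"lr": 0.4, "dt": 0.3, "nb": 0.3}

# Transform raw rows, append the Level-1 probability, and take the soft vote.
X = pre.transform(pd.read_csv("new_customers.csv"))
X_meta = np.column_stack([X, xgb.predict_proba(X)[:, 1]])
churn_proba = sum(w * metas[name].predict_proba(X_meta)[:, 1]
                  for name, w in WEIGHTS.items())
```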
