An advanced end-to-end MLOps pipeline for Telecom Customer Churn Prediction using ZenML, MLflow, Docker, GitHub Actions, and AWS EC2. This project automates everything from data ingestion and preprocessing to model training, evaluation, deployment, drift detection, and CI/CD.
Goal: Predict whether a telecom customer will churn based on customer and service attributes.
Key Features:
- End-to-end modular MLOps architecture
- ZenML pipelines for orchestration
- MLflow tracking
- Dockerized deployment with Streamlit
- Drift detection & auto-retraining via GitHub Actions
- CI/CD: Testing, Docker Build & Deployment via GitHub Actions
- Hosted on AWS EC2
- DynamoDB integration for storing inference data
MLOps-Telecom-Churn-Detection/
├── .github/workflows/ # CI/CD pipelines using GitHub Actions
│ ├── cicd.yml # Automates tests, Docker build, and deployment to EC2
│ └── drift_detection.yml # Workflow for scheduled drift detection and retraining
│
├── data/
│ ├── raw/ # Original Telco churn CSV file
│ ├── processed/ # Cleaned and split data: train.csv, val.csv, test.csv
│ └── reference/ # Stores reference data for drift detection
│
├── model/ # Trained and serialized model files
│ ├── best_model.pkl # Best selected model
│ ├── transformer.pkl # Preprocessing transformer object
│ └── [other models].pkl # Additional model candidates (XGBoost, Random Forest, Logistic Regression)
│
├── artifacts/ # Model evaluation results and visuals
│ └── evaluation/ # Includes plots and reports
│ ├── roc_curve_<model>.png
│ ├── precision_recall_curve_<model>.png
│ ├── confusion_matrix_<model>.png
│ ├── classification_report_<model>.txt
│ └── ... # Other metric outputs
│
├── data_analysis/
│ ├── eda.ipynb # Jupyter Notebook to demonstrate EDA
│ ├── initial_inspection.py
│ ├── univariate_analysis.py
│ ├── bivariate_analysis.py
│ ├── multivariate_analysis.py
│ └── missing_values_analysis.py
│
├── pipelines/ # ZenML pipeline definitions
│ └── training_pipeline.py # Main ZenML training + deployment pipeline
│
├── src/ # Core pipeline components and logic
│ ├── config.py # Global config variables and paths
│ ├── utils.py # Logging utilities
│ ├── data_ingestion.py # Raw dataset reading logic
│ ├── preprocessing.py # Data cleaning and transformation
│ ├── feature_engineering.py # Feature creation and encoding
│ ├── splitting.py # Splits the dataset into train/val/test
│ ├── training.py # Trains multiple models
│ ├── evaluation.py # Generates classification metrics and plots
│ ├── deployment.py # Deploys a Streamlit app to EC2 using Docker
│ ├── drift_detection.py # Drift detection using Evidently
│ └── aws_database.py # DynamoDB helper functions (boto3)
│
├── tests/ # Unit and integration tests using Pytest
│ ├── test_data_validation.py
│ ├── test_feature_engineering.py
│ ├── test_model_validation.py
│ ├── test_preprocessing.py
│ ├── test_training.py
│ └── test_smoke.py
│
├── last_processed.txt # Last drift detection date
├── run_pipeline.py # Entry point to trigger the ZenML pipeline
├── Dockerfile # Builds the Docker container for app deployment
├── requirements.txt # All Python dependencies
├── deploy_requirements.txt # All Python dependencies required for deployment
├── README.md # Project documentation
└── LICENSE # Project licensing
Source: Telco Customer Churn Dataset
- Target: Churn
- Mix of categorical and numerical features
Data is saved in data/raw/Telco-Customer-Churn.csv and processed via ZenML ingestion steps.
- Found class imbalance in the churn target
- Features like tenure, monthly charges, and contract type are key drivers
- Handled (see the sketch below):
  - Missing values
  - Outliers
  - Skewness
  - Binning tenure
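A minimal sketch of that handling, assuming the standard IBM Telco schema; the actual logic lives in src/preprocessing.py and data_analysis/:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("data/raw/Telco-Customer-Churn.csv")

# Confirm the class imbalance on the target
print(df["Churn"].value_counts(normalize=True))

# Missing values: TotalCharges is stored as text and is blank for
# brand-new customers; coerce to numeric and impute
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce").fillna(0)

# Skewness: log-transform the heavily right-skewed charges column
df["TotalCharges"] = np.log1p(df["TotalCharges"])

# Outliers: clip numeric columns to the 1st-99th percentile range
lo, hi = df["MonthlyCharges"].quantile([0.01, 0.99])
df["MonthlyCharges"] = df["MonthlyCharges"].clip(lo, hi)

# Binning tenure (months, 0-72 in this dataset) into coarse groups
df["tenure_group"] = pd.cut(
    df["tenure"],
    bins=[0, 12, 24, 48, 72],
    labels=["0-1yr", "1-2yr", "2-4yr", "4-6yr"],
    include_lowest=True,
)
```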
Visualizations:
- Histograms, Boxplots
- Countplots for categorical variables
- Correlation matrices & pairplots
- ROC, PR curves
EDA plots are saved under artifacts/evaluation/
Pipeline Steps:
- 📥 Data Ingestion → data_ingestion.py
- 🧼 Preprocessing → preprocessing.py
- 🧠 Feature Engineering → feature_engineering.py
- 🔀 Train-Test Split → splitting.py
- 🤖 Model Training (Multiple Models) → training.py
- ✅ Evaluation & Best Model Selection → evaluation.py
- 🚀 Deployment (Streamlit Web App) → deployment.py
- Pipelines are defined using @step and @pipeline decorators (see the sketch below)
- Integrated MLflow logging
- The best model is picked based on ROC-AUC
- Deployment is the final step of the training pipeline
Run with:
zenml up
python run_pipeline.py
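run_pipeline.py is the entry point; a hypothetical minimal version only needs to import the pipeline definition and call it (the real file may pass extra configuration):

```python
# Hypothetical minimal run_pipeline.py: trigger a tracked pipeline run
from pipelines.training_pipeline import training_pipeline

if __name__ == "__main__":
    training_pipeline()
```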
- Final pipeline step deploys Streamlit app
- Docker container built with all artifacts
- Hosted on AWS EC2 with the app's port exposed
Inference:
- User enters customer data on the frontend
- Real-time prediction + confidence
- Saves inputs to DynamoDB
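A minimal sketch of that flow; the input fields, DynamoDB table name, and key schema are assumptions, and the real app in src/deployment.py collects every feature:

```python
import pickle
import uuid

import boto3
import pandas as pd
import streamlit as st

# Load the fitted transformer and best model produced by the pipeline
with open("model/transformer.pkl", "rb") as f:
    transformer = pickle.load(f)
with open("model/best_model.pkl", "rb") as f:
    model = pickle.load(f)

st.title("Telecom Customer Churn Prediction")

# Illustrative subset of inputs
tenure = st.number_input("Tenure (months)", min_value=0, max_value=72, value=12)
monthly = st.number_input("Monthly charges", min_value=0.0, value=70.0)
contract = st.selectbox("Contract", ["Month-to-month", "One year", "Two year"])

if st.button("Predict"):
    row = pd.DataFrame(
        [{"tenure": tenure, "MonthlyCharges": monthly, "Contract": contract}]
    )
    proba = model.predict_proba(transformer.transform(row))[0, 1]
    st.write(f"Churn probability: {proba:.1%}")

    # Log the input and prediction to DynamoDB; numbers are stored as
    # strings since DynamoDB has no native float type
    table = boto3.resource("dynamodb").Table("churn_inference_logs")
    table.put_item(
        Item={
            "prediction_id": str(uuid.uuid4()),
            "tenure": str(tenure),
            "MonthlyCharges": str(monthly),
            "Contract": contract,
            "churn_probability": f"{proba:.4f}",
        }
    )
```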
- Uses reference data stored in data/production/reference/
- Every 30 days:
  - GitHub Actions triggers drift_detection.yml
  - Runs Evidently to compare live vs. reference data
  - If drift is detected: retrains the model using ZenML
  - Pushes new model artifacts to the repo
  - Triggers build + deploy via CI/CD
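A minimal sketch of the drift check, assuming Evidently's classic Report API (0.4.x-style); the current-data file name is illustrative:

```python
from datetime import date

import pandas as pd
from evidently.metric_preset import DataDriftPreset
from evidently.report import Report

# Reference frame frozen at training time vs. a snapshot of recent
# production inputs (file names here are illustrative)
reference = pd.read_csv("data/production/reference/reference.csv")
current = pd.read_csv("data/production/current.csv")  # e.g. pulled from DynamoDB

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)

# The preset's first metric summarizes whole-dataset drift
drift_detected = report.as_dict()["metrics"][0]["result"]["dataset_drift"]

if drift_detected:
    # drift_detection.yml would kick off the ZenML retraining pipeline here
    print("Drift detected - retraining required")

# Record when the check last ran (cf. last_processed.txt at the repo root)
with open("last_processed.txt", "w") as f:
    f.write(date.today().isoformat())
```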
- On every push to main:
  - ✅ Run unit tests via pytest
  - 🐳 Build Docker image
  - 🚀 Push to DockerHub
  - ☁️ Deploy to EC2 server
- Unit tests for every module
- Run with: pytest tests/
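For illustration, a smoke test in the style of tests/test_smoke.py might look like this (the specific assertions are assumptions, not the repo's actual tests):

```python
import pandas as pd

RAW_CSV = "data/raw/Telco-Customer-Churn.csv"


def test_raw_data_loads_and_has_target():
    # Smoke check: the raw CSV exists, is non-empty, and contains the target
    df = pd.read_csv(RAW_CSV)
    assert len(df) > 0
    assert "Churn" in df.columns


def test_target_is_binary():
    # The IBM Telco dataset encodes churn as Yes/No
    df = pd.read_csv(RAW_CSV)
    assert set(df["Churn"].unique()) <= {"Yes", "No"}
```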
- 🛢️ DynamoDB used to store prediction logs
- Connected via boto3
- Used as data source for drift monitoring (see the sketch below)
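A minimal sketch of pulling the logged predictions back out for drift monitoring; the region and table name are assumptions:

```python
import boto3
import pandas as pd

# Connect to the table the Streamlit app writes inference rows to
table = boto3.resource("dynamodb", region_name="us-east-1").Table("churn_inference_logs")

# Scan returns at most 1 MB per call, so follow LastEvaluatedKey to paginate
items, kwargs = [], {}
while True:
    page = table.scan(**kwargs)
    items.extend(page["Items"])
    if "LastEvaluatedKey" not in page:
        break
    kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

# The collected rows become the "current" frame for the Evidently report
current = pd.DataFrame(items)
```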
Model evaluation produces the ROC curve, confusion matrix, precision-recall curve, and F1/accuracy scores for each candidate model (saved under artifacts/evaluation/).
| Area | Tool |
|---|---|
| Orchestration | ZenML |
| Model Tracking | MLflow |
| Deployment | Docker + Streamlit + EC2 |
| CI/CD | GitHub Actions |
| Drift Detection | Evidently |
| Data Storage | DynamoDB |
| Testing | Pytest |
Follow these steps to get this project running locally or in a cloud environment.
git clone https://github.com/HimadeepRagiri/MLOps-Telecom-Churn-Detection.git
cd MLOps-Telecom-Churn-Detection
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
zenml init
zenml integration install sklearn evidently mlflow -y
python run_pipeline.py
This runs all the stages: ingestion, cleaning, feature engineering, training, evaluation, and deployment to a local Docker container running a Streamlit app.
Run the test suite:
pytest tests/
To launch the Streamlit app locally without running the full pipeline:
streamlit run src/deployment.py
- GitHub Actions automatically runs on every push to main:
  - Runs tests
  - Builds Docker image
  - Pushes image to Docker Hub
  - Deploys to AWS EC2 instance
- Scheduled via GitHub Actions (cron job)
- Runs drift detection every 30 days using Evidently
- If drift is detected:
  - Triggers the full ZenML pipeline
  - Retrains the model
  - Updates model artifacts
  - Pushes to the main branch
  - GitHub Actions automatically deploys the updated model
- AWS EC2: Host the Streamlit app on a t2.micro instance
- Docker Hub: Used to store and pull Docker images
- GitHub Actions: Automates deployment
- DynamoDB: Optional NoSQL database for storing prediction logs or new inference data
- Add support for GCP Cloud Run and Firestore
- Add Slack or email alert on drift
- Real-time feature store
- Experiment tracking UI (like MLflow UI hosted remotely)
This project is licensed under the terms of the MIT License. See the LICENSE file for details.
- ZenML team
- Evidently AI
- Open-source community
- Dataset by IBM/Kaggle
If you find this project useful, feel free to ⭐️ it and share it with others!