
📦 MLOps-Telecom-Churn-Detection

An advanced end-to-end MLOps pipeline for Telecom Customer Churn Prediction using ZenML, MLflow, Docker, GitHub Actions, and AWS EC2. This project automates everything from data ingestion and preprocessing to model training, evaluation, deployment, drift detection, and CI/CD.


🧠 Project Overview

Goal: Predict whether a telecom customer will churn based on customer and service attributes.

Key Features:

  • End-to-end modular MLOps architecture
  • ZenML pipelines for orchestration
  • MLflow tracking
  • Dockerized deployment with Streamlit
  • Drift detection & auto-retraining via GitHub Actions
  • CI/CD: Testing, Docker Build & Deployment via GitHub Actions
  • Hosted on AWS EC2
  • DynamoDB integration for storing inference data

📂 Project Structure

MLOps-Telecom-Churn-Detection/
├── .github/workflows/             # CI/CD pipelines using GitHub Actions
│   ├── cicd.yml                   # Automates tests, Docker build, and deployment to EC2
│   └── drift_detection.yml        # Scheduled drift detection and retraining workflow
│
├── data/
│   ├── raw/                       # Original Telco churn CSV file
│   ├── processed/                 # Cleaned and split data: train.csv, val.csv, test.csv
│   └── reference/                 # Reference data for drift detection
│
├── model/                         # Trained and serialized model files
│   ├── best_model.pkl             # Best selected model
│   ├── transformer.pkl            # Preprocessing transformer object
│   └── [other models].pkl         # Additional candidates (XGBoost, Random Forest, Logistic Regression)
│
├── artifacts/                     # Model evaluation results and visuals
│   └── evaluation/                # Plots and reports
│       ├── roc_curve_<model>.png
│       ├── precision_recall_curve_<model>.png
│       ├── confusion_matrix_<model>.png
│       ├── classification_report_<model>.txt
│       └── ...                    # Other metric files
│
├── data_analysis/
│   ├── eda.ipynb                  # Jupyter notebook demonstrating the EDA
│   ├── initial_inspection.py
│   ├── univariate_analysis.py
│   ├── bivariate_analysis.py
│   ├── multivariate_analysis.py
│   └── missing_values_analysis.py
│
├── pipelines/                     # ZenML pipeline definitions
│   └── training_pipeline.py       # Main ZenML training + deployment pipeline
│
├── src/                           # Core pipeline components and logic
│   ├── config.py                  # Global config variables and paths
│   ├── utils.py                   # Logging utilities
│   ├── data_ingestion.py          # Raw dataset reading logic
│   ├── preprocessing.py           # Data cleaning and transformation
│   ├── feature_engineering.py     # Feature creation and encoding
│   ├── splitting.py               # Splits the dataset into train/val/test
│   ├── training.py                # Trains multiple models
│   ├── evaluation.py              # Generates classification metrics and plots
│   ├── deployment.py              # Deploys a Streamlit app to EC2 using Docker
│   ├── drift_detection.py         # Drift detection logic
│   └── aws_database.py            # AWS DynamoDB helper functions
│
├── tests/                         # Unit and integration tests using Pytest
│   ├── test_data_validation.py
│   ├── test_feature_engineering.py
│   ├── test_model_validation.py
│   ├── test_preprocessing.py
│   ├── test_training.py
│   └── test_smoke.py
│
├── last_processed.txt             # Date of the last drift-detection run
├── run_pipeline.py                # Entry point to trigger the ZenML pipeline
├── Dockerfile                     # Builds the Docker container for app deployment
├── requirements.txt               # All Python dependencies
├── deploy_requirements.txt        # Python dependencies required for deployment
├── README.md                      # Project documentation
└── LICENSE                        # Project license

📊 Data

Source: Telco Customer Churn Dataset

  • Target: Churn
  • Mix of categorical and numerical features

Data is saved in data/raw/Telco-Customer-Churn.csv, and processed via ZenML ingestion steps.
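
Ingestion itself is lightweight; below is a minimal sketch of what src/data_ingestion.py does (the function name and the TotalCharges coercion are assumptions, based on how the public Telco dataset is usually cleaned):

import pandas as pd

RAW_DATA_PATH = "data/raw/Telco-Customer-Churn.csv"

def ingest_data(path: str = RAW_DATA_PATH) -> pd.DataFrame:
    """Read the raw Telco churn CSV into a DataFrame."""
    df = pd.read_csv(path)
    # TotalCharges arrives as strings because of blank entries in the raw CSV;
    # coercing to numeric (blanks -> NaN) is a typical first cleaning step.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    return df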


🔍 EDA (Exploratory Data Analysis)

  • Found class imbalance in churn target

  • Features like tenure, monthly charges, and contract type are key drivers

  • Handled:

    • Missing values
    • Outliers
    • Skewness
    • Binning tenure (see the example below)
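
For example, tenure binning can be done with pd.cut; the bin edges below are illustrative only (the actual ones are defined in eda.ipynb):

import pandas as pd

df = pd.DataFrame({"tenure": [1, 15, 30, 55, 70]})  # sample values for illustration
bins = [0, 12, 24, 48, 60, 72]                      # hypothetical edges, in months
labels = ["0-1yr", "1-2yr", "2-4yr", "4-5yr", "5-6yr"]
df["tenure_group"] = pd.cut(df["tenure"], bins=bins, labels=labels, include_lowest=True)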

Visualizations:

  • Histograms, Boxplots
  • Countplots for categorical variables
  • Correlation matrices & pairplots
  • ROC, PR curves

EDA plots are saved under artifacts/evaluation/


🔄 MLOps Workflow

Pipeline Steps:

  1. 📥 Data Ingestion → data_ingestion.py
  2. 🧼 Preprocessing → preprocessing.py
  3. 🧠 Feature Engineering → feature_engineering.py
  4. 🔀 Train-Test Split → splitting.py
  5. 🤖 Model Training (Multiple Models) → training.py
  6. ✅ Evaluation & Best Model Selection → evaluation.py
  7. 🚀 Deployment (Streamlit Web App) → deployment.py

🔁 ZenML Pipelines

  • Pipelines are defined using @step and @pipeline decorators (sketched below)
  • Integrated MLflow logging for experiments and metrics
  • The best model is picked based on ROC-AUC
  • Deployment is the final step in the training pipeline
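
A minimal sketch of the pipeline shape, assuming a recent ZenML API (the real steps live in src/ and pipelines/training_pipeline.py; the toy training logic below is illustrative, not the project's actual code):

from zenml import pipeline, step
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

@step
def ingest_data() -> pd.DataFrame:
    # Stand-in for src/data_ingestion.py
    return pd.read_csv("data/raw/Telco-Customer-Churn.csv")

@step
def train_and_score(df: pd.DataFrame) -> float:
    # Stand-in for training.py / evaluation.py: fit one model, report ROC-AUC.
    X = df.select_dtypes("number").drop(columns=["Churn"], errors="ignore")
    y = (df["Churn"] == "Yes").astype(int)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    return roc_auc_score(y, model.predict_proba(X)[:, 1])

@pipeline
def churn_training_pipeline():
    df = ingest_data()
    train_and_score(df)

if __name__ == "__main__":
    churn_training_pipeline()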

Run with:

zenml up
python run_pipeline.py

⚙️ Deployment (Streamlit + Docker + EC2)

  • The final pipeline step deploys the Streamlit app
  • A Docker container is built with all required artifacts
  • Hosted on AWS EC2 with the app port exposed

Inference:

  • User enters customer data on frontend
  • Real-time prediction + confidence
  • Saves inputs to DynamoDB
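
A minimal sketch of that inference flow (the widget set and feature names are a simplified assumption; the real app in src/deployment.py collects all model features):

import joblib
import pandas as pd
import streamlit as st

model = joblib.load("model/best_model.pkl")
transformer = joblib.load("model/transformer.pkl")

st.title("Telecom Churn Prediction")
tenure = st.number_input("Tenure (months)", min_value=0, max_value=100, value=12)
monthly = st.number_input("Monthly charges", min_value=0.0, value=70.0)
contract = st.selectbox("Contract", ["Month-to-month", "One year", "Two year"])

if st.button("Predict"):
    row = pd.DataFrame([{"tenure": tenure, "MonthlyCharges": monthly, "Contract": contract}])
    proba = model.predict_proba(transformer.transform(row))[0, 1]
    st.write(f"Churn probability: {proba:.2%}")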

⚠️ Drift Detection & Auto-Retraining (GitHub Actions)

  • Uses reference data stored in data/reference/

  • Every 30 days:

    • GitHub Actions triggers drift_detection.yml
    • Runs Evidently to compare live vs reference data
    • If drift is detected: retrains the model using ZenML
    • Pushes new model artifacts to repo
    • Triggers build + deploy via CI/CD
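
The drift check itself boils down to something like this, assuming Evidently's pre-0.7 Report API (file names are placeholders; in practice the current data is pulled from DynamoDB):

import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

reference_df = pd.read_csv("data/reference/reference.csv")  # placeholder path
current_df = pd.read_csv("current.csv")                     # live inference data in practice

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_df, current_data=current_df)

# DataDriftPreset's first metric (DatasetDriftMetric) exposes a dataset-level flag.
if report.as_dict()["metrics"][0]["result"]["dataset_drift"]:
    print("Drift detected: trigger retraining")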

🔁 CI/CD – GitHub Actions

  • On every push to main:

    1. ✅ Run unit tests via pytest
    2. 🐳 Build Docker image
    3. 🚀 Push to DockerHub
    4. ☁️ Deploy to EC2 server

🧪 Testing

  • Unit tests for every module
  • Run with:
pytest tests/
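
Tests follow standard pytest conventions; a minimal, hypothetical example of the kind of check that lives in tests/test_preprocessing.py:

import pandas as pd

def test_total_charges_coercion():
    df = pd.DataFrame({"TotalCharges": ["10.5", " ", "42"]})
    cleaned = pd.to_numeric(df["TotalCharges"], errors="coerce")
    assert cleaned.dtype.kind == "f"   # numeric after coercion
    assert cleaned.isna().sum() == 1   # the blank entry becomes NaN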

💾 Database Integration

  • 🛢️ DynamoDB used to store prediction logs
  • Connected via boto3
  • Used as data source for drift monitoring
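
Writing a prediction log with boto3 looks roughly like this (table and attribute names are assumptions; the actual helpers live in src/aws_database.py):

import boto3
from decimal import Decimal
from datetime import datetime, timezone

table = boto3.resource("dynamodb").Table("churn_predictions")  # hypothetical table name

def log_prediction(features: dict, churn_probability: float) -> None:
    table.put_item(Item={
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "features": features,  # assumes string/int values; floats also need Decimal
        # DynamoDB's resource API requires Decimal rather than float.
        "churn_probability": Decimal(str(churn_probability)),
    })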

📈 Metrics

Model evaluation produces ROC curves, confusion matrices, precision-recall curves, and F1/accuracy scores for each candidate model.

[Evaluation plots: ROC curve, confusion matrix, precision-recall curve (saved under artifacts/evaluation/)]
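
All of these come from scikit-learn; a self-contained sketch with dummy labels:

from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score, roc_auc_score)

# Dummy labels/scores standing in for the outputs of evaluation.py.
y_true, y_pred, y_proba = [0, 1, 1, 0], [0, 1, 0, 0], [0.2, 0.9, 0.4, 0.1]

print("ROC-AUC :", roc_auc_score(y_true, y_proba))
print("F1      :", f1_score(y_true, y_pred))
print("Accuracy:", accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred))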


🚀 Tech Stack

Area             Tool
---------------  ------------------------
Orchestration    ZenML
Model Tracking   MLflow
Deployment       Docker + Streamlit + EC2
CI/CD            GitHub Actions
Drift Detection  Evidently
Data Storage     DynamoDB
Testing          Pytest

📖 Project Setup & Installation Guide

Follow these steps to get this project running locally or in a cloud environment.

✈️ Clone the Repository

git clone https://github.com/HimadeepRagiri/MLOps-Telecom-Churn-Detection.git
cd MLOps-Telecom-Churn-Detection

📈 Create & Activate a Virtual Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

✨ Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt

📄 Setup ZenML & Initialize Repository

zenml init
zenml integration install sklearn evidently mlflow -y

✏️ Run the ZenML Pipeline Locally

python run_pipeline.py

This runs all the stages: ingestion, cleaning, feature engineering, training, evaluation, and deployment to a local Docker container running a Streamlit app.

🪜 Run Unit Tests

pytest tests/

🏙️ Run Streamlit App (Optional, Local Testing)

streamlit run src/deployment.py

🚀 CI/CD and Docker Deployment

  • GitHub Actions automatically runs on push to main:

    • Runs tests
    • Builds Docker image
    • Pushes image to Docker Hub
    • Deploys to AWS EC2 instance

⏰ Drift Detection & Retraining

  • Scheduled via GitHub Actions (cron job)

  • Runs drift detection every 30 days using Evidently

  • If drift detected:

    • Triggers full ZenML pipeline
    • Retrains model
    • Updates model artifacts
    • Pushes to main branch
    • GitHub Actions automatically deploys updated model

🌎 Optional Cloud Setup (Free Tier)

  • AWS EC2: Host the Streamlit app on a t2.micro instance
  • Docker Hub: Used to store and pull Docker images
  • GitHub Actions: Automates deployment
  • DynamoDB: Optional NoSQL database for storing prediction logs or new inference data

🚧 Future Enhancements

  • Add support for GCP Cloud Run and Firestore
  • Add Slack or email alert on drift
  • Real-time feature store
  • Experiment tracking UI (like MLflow UI hosted remotely)

📜 License

This project is licensed under the terms of the MIT License. See the LICENSE file for details.


🙌 Credits

  • ZenML team
  • Evidently AI
  • Open-source community
  • Dataset by IBM/Kaggle

🌟 Star the Repo

If you find this project useful, feel free to ⭐️ it and share it with others!
