Customer churn represents the percentage of users discontinuing service within a given period. This project builds a machine learning pipeline that predicts customer churn for a telecom business from historical data, and deploys the model as a Flask web app with CI/CD through GitHub Actions and scalable production hosting on AWS EKS.
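One common way to write this definition as a formula, assuming churn is measured against the number of customers present at the start of the period:

```math
\text{Churn rate} = \frac{\text{customers lost during the period}}{\text{customers at the start of the period}} \times 100\%
```

For example, losing 50 of the 1,000 customers present at the start of a month gives a monthly churn rate of 5%.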
- Developed a machine learning model to predict whether a customer of a telecommunications company will churn.
- Followed a modular structure for the entire project.
- Used data of over 7,000 records to train and develop the model.
- Cleaned and preprocessed the raw data.
- Performed feature transformation, scaled the numerical features, and handled class imbalance in the dataset.
- Trained the model using various ML algorithms and selected the one with the highest accuracy.
- Deployed the model using a Flask web application for real-time predictions.
- Integrated CI/CD automation using GitHub Actions to build, test, containerize, and deploy the application to AWS EKS (Elastic Kubernetes Service) on every code update.
- Used the company's historical data of over 7,000 records, which includes demographic details, subscribed services, and account information.
- For each customer the following information is available:
  - Gender
  - Senior Citizen
  - Partner
  - Dependents
  - Tenure
  - Phone Service
  - Multiple Lines
  - Internet Service
  - Online Security
  - Online Backup
  - Device Protection
  - Tech Support
  - Streaming TV
  - Streaming Movies
  - Contract Type
  - Paperless Billing
  - Payment Method
  - Monthly Charges
  - Total Charges
- Cleaned and preprocessed the raw data:
  - Handled missing values.
  - Removed duplicate records.
  - Removed outliers using z-scores to avoid overfitting.
  - Replaced boolean values with numerical values.
  - Binned the tenure column into 12-month ranges to make the feature easier to interpret.
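The cleaning steps above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the column names (`tenure`, `MonthlyCharges`, `TotalCharges`), the z-score threshold of 3, the choice to drop rather than impute rows with missing values, and the helper name `clean_churn_data` are all assumptions.

```python
import pandas as pd

NUM_COLS = ["tenure", "MonthlyCharges", "TotalCharges"]  # assumed column names
TENURE_BINS = [0, 12, 24, 36, 48, 60, 72]  # 12-month ranges (assumed max tenure of 72)
TENURE_LABELS = ["0-12", "13-24", "25-36", "37-48", "49-60", "61-72"]


def clean_churn_data(df, z_thresh=3.0):
    """Minimal cleaning pass mirroring the listed steps (a sketch, not the project code)."""
    df = df.copy()

    # 1. Missing values: blank TotalCharges strings become NaN, then incomplete rows are dropped.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    df = df.dropna()

    # 2. Duplicate records.
    df = df.drop_duplicates()

    # 3. Outliers: keep rows whose |z-score| is below the threshold in every numeric column.
    z = (df[NUM_COLS] - df[NUM_COLS].mean()) / df[NUM_COLS].std(ddof=0)
    df = df[(z.abs() < z_thresh).all(axis=1)].copy()

    # 4. Boolean-like columns: map "Yes"/"No" to 1/0.
    for col in df.columns:
        if set(df[col].unique()) <= {"Yes", "No"}:
            df[col] = df[col].map({"Yes": 1, "No": 0})

    # 5. Tenure bins of 12 months; include_lowest keeps tenure == 0 in the first bin.
    df["tenure_group"] = pd.cut(
        df["tenure"], bins=TENURE_BINS, labels=TENURE_LABELS, include_lowest=True
    )
    return df
```

Note that z-scores computed on very small samples can never exceed 3, so the outlier step only has an effect on reasonably sized data such as the 7,000-record dataset.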
- Once the data was cleaned and preprocessed, I analyzed it to identify hidden patterns and relationships between features.
- Performed both single-feature and cross-feature analysis to find relationships between features.
- Analyzed and visualized each feature's values and value counts to gauge its overall importance.
- Some of the major findings:
  - Around 16% of the customer base are senior citizens.
  - Customers who are more likely to churn have lower monthly and total charges.
  - Senior citizens have higher churn rates than non-senior customers.
  - The longer a customer stays with the business, the lower the chance of churning.
  - Customers with a tenure of under one year are about as likely to churn as to stay.
  - Customers on month-to-month contracts churn more often.
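Churn-rate tables of the kind behind these findings can be computed with a pandas `groupby`, either for one feature or for a combination of features. The helper `churn_rate_by` is hypothetical, and it assumes the target column `Churn` still holds raw `Yes`/`No` labels; the example column names `Contract` and `SeniorCitizen` are also assumptions.

```python
import pandas as pd


def churn_rate_by(df, features, target="Churn", positive="Yes"):
    """Churn rate per category of one feature (str) or per combination of several (list)."""
    keys = [features] if isinstance(features, str) else list(features)
    churned = df[target] == positive  # True for churned customers
    # A single key gives a flat index; several keys give a MultiIndex for cross-feature views.
    by = df[keys[0]] if len(keys) == 1 else [df[k] for k in keys]
    return churned.groupby(by).mean().sort_values(ascending=False)
```

For instance, `churn_rate_by(df, "Contract")` gives a single-feature view, while `churn_rate_by(df, ["Contract", "SeniorCitizen"])` gives a cross-feature view with one rate per combination.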
- Visualizations:
  - Distribution of tenure
  - Imbalance in churn
  - Monthly and Total Charges by churn
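The README does not say how the numerical features were scaled or how the class imbalance was handled. The sketch below assumes `StandardScaler` for scaling and simple random oversampling of the minority class with `sklearn.utils.resample`; SMOTE from the separate imbalanced-learn package is a common alternative. The function name `scale_and_balance` is hypothetical, and it assumes a binary target.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample


def scale_and_balance(X_train, y_train, num_cols, random_state=42):
    """Scale numeric columns and randomly oversample the minority class (training data only)."""
    X = X_train.copy()

    # Standardize numeric features to zero mean and unit variance; the fitted scaler is
    # returned so the same transformation can be applied to test data and live inputs.
    scaler = StandardScaler()
    X[num_cols] = scaler.fit_transform(X[num_cols])

    counts = y_train.value_counts()
    n_extra = counts.max() - counts.min()
    if n_extra == 0:
        return X, y_train.copy(), scaler

    # Draw extra minority-class rows with replacement until both classes are the same size.
    minority = counts.idxmin()
    mask = y_train == minority
    X_extra, y_extra = resample(
        X[mask], y_train[mask], replace=True, n_samples=n_extra, random_state=random_state
    )
    X_bal = pd.concat([X, X_extra], ignore_index=True)
    y_bal = pd.concat([y_train, y_extra], ignore_index=True)
    return X_bal, y_bal, scaler
```

Only the training split should be passed in: oversampling before the train/test split would leak duplicated rows into the test set, and the test set should keep its natural class distribution.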
- Trained the model with several classification algorithms:
  - Logistic Regression
  - Naive Bayes
  - K-Nearest Neighbors (KNN) Classifier
  - Decision Tree
  - Random Forest
  - AdaBoost Classifier
  - XGBoost Classifier
  - Support Vector Classifier
- Performed hyperparameter tuning using GridSearchCV to optimize the performance of the models.
- Evaluated the models with the accuracy score, the confusion matrix, and precision, recall, and F1 score, and selected the model with the highest accuracy.
- Out of all the algorithms used, the XGBoost classifier achieved the highest accuracy, at 81%.
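The tune-and-compare loop can be sketched with scikit-learn's `GridSearchCV`. The candidate set and parameter grids below are illustrative assumptions, trimmed to three of the algorithms listed above; `xgboost.XGBClassifier` follows the same estimator interface and could be added to `CANDIDATES` if the `xgboost` package is installed. The helper `select_best_model` is hypothetical, and the demo uses synthetic data rather than the churn dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Candidate estimators with small, illustrative hyperparameter grids (assumptions).
CANDIDATES = {
    "Logistic Regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "Decision Tree": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5, None]}),
    "Random Forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [5, None]},
    ),
}


def select_best_model(X_train, y_train, X_test, y_test, candidates=CANDIDATES, cv=5):
    """Tune each candidate with GridSearchCV; pick the best by cross-validated accuracy."""
    results = {}
    for name, (estimator, grid) in candidates.items():
        search = GridSearchCV(estimator, grid, scoring="accuracy", cv=cv)
        search.fit(X_train, y_train)
        y_pred = search.best_estimator_.predict(X_test)
        results[name] = {
            "cv_accuracy": search.best_score_,
            "test_accuracy": accuracy_score(y_test, y_pred),
            "model": search.best_estimator_,
            "y_pred": y_pred,
        }
    best_name = max(results, key=lambda n: results[n]["cv_accuracy"])
    return best_name, results


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in for the churn data.
    X, y = make_classification(n_samples=600, n_features=10, weights=[0.75], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    best_name, results = select_best_model(X_train, y_train, X_test, y_test)
    best = results[best_name]
    print(f"Best model: {best_name} (test accuracy {best['test_accuracy']:.3f})")
    print(confusion_matrix(y_test, best["y_pred"]))
    print(classification_report(y_test, best["y_pred"]))
```

Ranking by cross-validated accuracy rather than test accuracy is a design choice in this sketch: it keeps the held-out test set for the final report instead of also using it to pick the winner.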
- Developed a Flask web application to deploy the model for real-time predictions.
- Built both front-end and back-end components for the web app.
- Created a custom website where users can enter customer data and receive predictions from the model.
- Ran the Flask app on a localhost server for easy access.
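A prediction route of this kind can be sketched with Flask's application-factory pattern. This is a sketch under assumptions rather than the project's `app.py`: the three-field `FEATURES` list, the `/predict` route, the JSON response (the real app renders HTML templates), and the `create_app` factory are all hypothetical, and `model` stands for any trained object with a scikit-learn-style `predict` method.

```python
from flask import Flask, jsonify, request

# Hypothetical subset of the input fields; the real form collects many more.
FEATURES = ["tenure", "MonthlyCharges", "TotalCharges"]


def create_app(model):
    """Build a Flask app that serves predictions from an already-trained `model`."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Read the submitted form fields in a fixed order and convert them to numbers.
        try:
            row = [[float(request.form[name]) for name in FEATURES]]
        except (KeyError, ValueError):
            return jsonify(error=f"expected numeric form fields: {FEATURES}"), 400
        label = int(model.predict(row)[0])
        return jsonify(churn=bool(label))

    return app
```

Passing the model into the factory, instead of loading it at module import time, lets the route be exercised in tests with a lightweight dummy model.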
- Implemented an end-to-end continuous integration and deployment pipeline using GitHub Actions.
- The pipeline performs the following steps:
  - Runs unit tests on the web application with pytest to ensure it works as expected.
  - Builds a Docker image of the application and pushes it to Amazon Elastic Container Registry (ECR).
  - Updates the Kubernetes manifests with the latest image and deploys the application to Amazon EKS.
  - Verifies deployment health by checking pod and service status.
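The test stage amounts to running `pytest`, which collects every `test_*` function in files such as `test_app.py`. A minimal sketch of such tests using Flask's built-in test client is shown below; the inline stand-in app and the text of its `/` route are hypothetical placeholders for the project's real `app.py`, which the tests would import instead.

```python
from flask import Flask

# Hypothetical stand-in for the project's app.py; real tests would use `from app import app`.
app = Flask(__name__)


@app.route("/")
def home():
    return "Customer Churn Prediction"


def make_client():
    """Return a Flask test client with testing mode enabled."""
    app.config["TESTING"] = True
    return app.test_client()


def test_home_page_loads():
    response = make_client().get("/")
    assert response.status_code == 200
    assert b"Customer Churn" in response.data


def test_unknown_route_returns_404():
    assert make_client().get("/does-not-exist").status_code == 404
```

Running `pytest test_app.py` locally executes checks of this kind before any Docker image is built.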
- ML Pipeline: Data preprocessing, feature engineering, and XGBoost modeling
- Web Interface: Flask-based prediction interface
- CI/CD Automation: GitHub Actions pipeline for testing, Dockerization, and deployment
- Cloud Deployment: Kubernetes-managed scalable infrastructure on AWS EKS
- Modular Codebase: Production-ready Python implementation
```mermaid
graph LR
    A[Code Commit/trigger] --> B[GitHub Actions]
    B --> C[Build Docker Image]
    C --> D[Push to AWS ECR]
    D --> E[Deploy to EKS]
    E --> F[Production API]
```
| Technology | Description |
|---|---|
| Python | Programming language used |
| Flask | Web framework for UI and API integration |
| HTML & CSS | Frontend design and styling |
| Pandas | Cleaning and preprocessing the data |
| NumPy | Performing numerical operations |
| Matplotlib | Visualization of the data |
| GitHub Actions | Automates build, test, and deployment pipelines |
| Docker | Containerization of the application |
| Amazon ECR | Docker image registry for container storage |
| Amazon EKS | Managed Kubernetes service for production deployment |
| Kubernetes | Orchestration platform for scalable deployment |
```
Customer-Churn-Project/
├── .github/             # GitHub Actions CI/CD workflow
│   └── workflows/
│
├── k8s/                 # Kubernetes deployment manifests
│   ├── deployment.yaml
│   └── service.yaml
│
├── artifacts/           # Model artifacts and intermediate data
│
├── data/                # Raw and EDA-processed data
│
├── eda_images/          # Visualizations for EDA
│
├── notebook/            # Jupyter notebooks for experimentation
│
├── src/                 # Source code (modular ML pipeline)
│   ├── components/      # Individual pipeline components
│   └── pipelines/       # Training and prediction pipelines
│
├── static/              # Static assets for the web app
│   ├── css/
│   └── images/
│
├── templates/           # HTML templates for the Flask frontend
│
├── .dockerignore        # Ignore rules for Docker build
├── Dockerfile           # Docker image definition
├── test_app.py          # Unit tests for app functionality
├── .gitignore           # Git ignore rules
├── README.md            # Project documentation
├── app.py               # Flask backend app
├── requirements.txt     # Python dependency list
└── setup.py             # Setup script for packaging
```
```bash
git clone https://github.com/Dhanush-Raj1/Customer-Churn-Project.git
cd Customer-Churn-Project
conda create -p ./envi python=3.9 -y   # create a local conda environment in ./envi
conda activate ./envi                  # activate it by path
pip install -r requirements.txt
python app.py
```

The app will then be available at http://127.0.0.1:5000/
1️⃣ Open the web app in your browser.
2️⃣ Click the Predict button on the home page.
3️⃣ Enter the customer details in the respective dropdowns.
4️⃣ Click the Predict button to see the predicted result.
- Improved accuracy of the model with advanced fine-tuning
- Real-Time Prediction System
- Automated Retraining Pipeline
- Improve the UI with a more interactive design
- Customer Retention Strategy Recommender
- Anomaly Detection for Unexpected Churn
💡 Contributions, issues, and pull requests are welcome! Feel free to open an issue or submit a PR to improve this project.
This project is licensed under the MIT License. See the LICENSE file for details.