This project predicts customer churn with machine learning, specifically an XGBoost classifier. The goal is to identify customers who are likely to leave a service so that retention efforts can target them proactively.
The project follows a standard end-to-end machine learning pipeline:
- Data Cleaning: Handled missing values and removed inconsistencies.
- Exploratory Data Analysis (EDA): Explored distributions, correlations, and key patterns.
- Feature Engineering: Encoded categorical variables and normalized numerical features.
- Class Imbalance Handling: Applied resampling techniques such as SMOTE to improve recall on the minority class.
- Modeling: Trained and tuned an XGBoost classifier.
- Evaluation: Assessed using accuracy, precision, recall, F1-score, ROC-AUC, and the confusion matrix.
- Language: Python 3.x
- Libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, XGBoost, imbalanced-learn
| Directory/File | Description |
|---|---|
| data/ | Cleaned dataset |
| notebooks/ | EDA and modeling Jupyter notebooks |
| models/ | XGBoost model |
| README.md | Project overview and instructions |