
vimalsolank1/project_ML_Classification_Flipkart_Customer_Service_Satisfaction


Predicting Customer Satisfaction Score Using Classification


BUSINESS PROBLEM OVERVIEW

In today’s fast-paced world, good customer service is essential for keeping customers happy and growing the business. Companies measure how well their support teams are doing with a score called CSAT (Customer Satisfaction Score). However, this score is only collected after the support interaction is finished, so if a customer is unhappy and gives a low score, it is already too late to fix the problem or win them back.

Flipkart aims to enhance its customer service by understanding the key drivers behind customer satisfaction. The company receives thousands of customer interactions daily through various support channels. These interactions are recorded with multiple features like ticket category, issue type, resolution time, and support channel used. However, predicting whether a customer will be satisfied or not after an interaction remains a challenge.

The goal of this project is to build a machine learning classification model that can predict customer satisfaction (CSAT) based on ticket features and interaction details. This will enable Flipkart to:

  • Proactively identify potential dissatisfaction

  • Allocate resources effectively

  • Personalize customer support

  • Improve service quality and agent performance

Project Summary -

  • The main aim of this project is to build a machine learning classification model that predicts the Customer Satisfaction (CSAT) score for Flipkart's customer service. The goal is to find the important factors behind customer satisfaction, detect low-satisfaction cases early, and help improve the overall customer experience.

  • This project is divided into four main parts:

    1. Data Understanding & Variable Exploration

      • Loaded the dataset, understood column types, checked for duplicates and missing values.
    2. Data Wrangling & EDA & Hypothesis Testing

      • Cleaned the data, handled outliers, performed univariate, bivariate, and multivariate analysis to uncover key insights, and conducted hypothesis testing.
    3. Feature Engineering & ML Modeling

      • Engineered relevant features, applied preprocessing, trained multiple ML models, and selected the best one using evaluation metrics.
    4. Real-World Testing & Conclusion

      • Tested the saved model with unseen data, evaluated generalization, and summarized insights with suggestions for future improvements.
  1. Data Understanding & Variable Exploration:
  • The dataset has 85907 rows and 20 columns and covers Flipkart’s customer support data.
  • The columns fall into three data types: object, float64, and int64.
  • The columns in the dataset are: unique id, channel_name, customer query category, customer query Sub-category, Customer Remarks, Order_id, order_date_time, Issue_reported at, issue_responded, Survey_response_Date, Customer_City, Product_category, Item_price, connected_handling_time, Agent_name, Supervisor, Manager, Tenure Bucket, Agent Shift, CSAT.
  • The dataset contains no duplicate rows, but several columns have missing values: connected_handling_time (99.72%), Customer_City (80.12%), Product_category (79.98%), Item_price (79.97%), order_date_time (79.96%), Customer Remarks (66.54%), and Order_id (21.22%).
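The missing-value percentages above can be reproduced with a simple per-column count. The following is a minimal sketch using only the Python standard library; the helper name `missing_percentages` and the toy records are hypothetical and are not taken from the project notebook. With pandas, the usual equivalent is `df.isna().mean() * 100`.

```python
def missing_percentages(rows, columns):
    """Return {column: percentage of rows where the value is missing}."""
    total = len(rows)
    result = {}
    for col in columns:
        # Treat None and empty strings as missing (an assumption for this sketch).
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        result[col] = round(100.0 * missing / total, 2) if total else 0.0
    return result


# Hypothetical toy records for illustration only.
rows = [
    {"Order_id": "A1", "Customer_City": None, "CSAT": 5},
    {"Order_id": None, "Customer_City": None, "CSAT": 4},
    {"Order_id": "A3", "Customer_City": "Pune", "CSAT": 1},
    {"Order_id": "A4", "Customer_City": None, "CSAT": 5},
]
pct = missing_percentages(rows, ["Order_id", "Customer_City", "CSAT"])
print(pct)
```

A column with a very high percentage, such as connected_handling_time in this dataset, is a candidate for dropping rather than imputing.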
  2. Data Wrangling & EDA & Hypothesis Testing:
  • In the data wrangling phase, I handled missing values by dropping the connected_handling_time column because 99.72% of its values are null, and converted the date columns to datetime format. I also engineered a new feature, response_time_minutes, from the time differences, and cleaned the text data for consistency.
  • During the EDA step, I performed detailed univariate, bivariate, and multivariate analysis to uncover patterns in CSAT scores across channels, categories, products, agents, supervisors, managers, and shifts. I found that faster response times and more experienced agents often led to higher CSAT. Some categories, such as GiftCards, and some channels, such as Email, had lower satisfaction and need improvement. These insights can guide Flipkart in improving service quality, optimizing agent performance, and boosting overall customer satisfaction.
  • In the hypothesis testing step, I used statistical tests such as one-sample t-tests and ANOVA to check whether the average CSAT score and item price differ significantly from given values, and whether CSAT varies across communication channels.
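Two of the steps above, the response_time_minutes feature and the one-sample t-test, can be sketched as follows. This is an illustration under stated assumptions, not the notebook's actual code: the timestamp format, the function names, and the sample values are all hypothetical.

```python
import math
import statistics
from datetime import datetime

# Assumed timestamp layout (day/month/year hour:minute); adjust to the real data.
TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M"


def response_time_minutes(reported, responded, fmt=TIMESTAMP_FORMAT):
    """Minutes elapsed between the issue being reported and responded to."""
    start = datetime.strptime(reported, fmt)
    end = datetime.strptime(responded, fmt)
    return (end - start).total_seconds() / 60.0


def one_sample_t_statistic(sample, hypothesized_mean):
    """t = (sample mean - hypothesized mean) / (sample std / sqrt(n))."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - hypothesized_mean) / standard_error


# Hypothetical values for illustration only, not taken from the dataset.
minutes = response_time_minutes("01/08/2023 11:13", "01/08/2023 11:47")
csat_sample = [5, 4, 5, 1, 5, 3, 5, 4]
t_stat = one_sample_t_statistic(csat_sample, hypothesized_mean=3)
print(minutes, t_stat)
```

The sketch stops at the test statistic. Turning it into a p-value requires the Student's t distribution with n - 1 degrees of freedom; in practice `scipy.stats.ttest_1samp` returns both the statistic and the p-value, and `scipy.stats.f_oneway` covers the one-way ANOVA across channels.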
  3. Feature Engineering & ML Modeling:
  • In the Feature Engineering & Data Preprocessing step, I handled missing values using constant, mode, and mean imputation, chosen by column type and group logic. Outliers were capped using IQR and percentile limits, and new features such as is_long_response and avg_csat_by_agent were created. I used one-hot encoding for low-cardinality categorical columns and label encoding for high-cardinality ones. Important features were selected using a correlation check, Random Forest importance, and SelectKBest, followed by data scaling and SMOTE to balance the classes.
  • In the ML Model Implementation step, I tested multiple models, including Random Forest, CatBoost, and XGBoost. After comparing their performance using accuracy, recall, precision, and F1-score, I finalized XGBoost because it delivered the best overall results. I also used SHAP, a feature explainability tool, to interpret how each feature affected predictions.
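The IQR-based outlier capping mentioned above can be sketched as below. The multiplier `k=1.5` is the conventional Tukey fence and is an assumption here, as are the function name and the sample prices; the README does not state which limits the project used.

```python
import statistics


def cap_outliers_iqr(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


# Hypothetical item prices with one extreme value.
prices = [10, 12, 11, 13, 12, 14, 11, 300]
capped = cap_outliers_iqr(prices)
print(capped)
```

Capping keeps every row in the dataset and only pulls extreme values back to the fence, which avoids shrinking a dataset that already has many missing values.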
  4. Real-World Testing & Conclusion:
  • In the real-world testing phase, the saved XGBoost model performed well overall on unseen data, which supports its reliability and ability to generalize. However, the model struggled to predict CSAT classes 2 and 3 accurately, highlighting the need for better-quality and more balanced data. With improved data in the future, these challenges can be addressed to boost overall model accuracy and business impact.
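Weak performance on specific classes, such as CSAT 2 and 3 here, is easiest to see in per-class metrics rather than in overall accuracy. The sketch below computes precision, recall, and F1 per class in plain Python. The labels and predictions are made up purely to illustrate the pattern and are not the model's real outputs; `per_class_report` is a hypothetical helper.

```python
def per_class_report(y_true, y_pred):
    """Return {label: {"precision", "recall", "f1", "support"}} per class."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,
        }
    return report


# Made-up labels and predictions for illustration only.
y_true = [5, 5, 5, 5, 1, 1, 2, 3, 4, 5]
y_pred = [5, 5, 5, 5, 1, 5, 5, 5, 4, 5]
report = per_class_report(y_true, y_pred)
for label, scores in report.items():
    print(label, scores)
```

In this toy example 8 of 10 predictions are correct, yet recall for classes 2 and 3 is zero, which is the kind of gap that overall accuracy hides. `sklearn.metrics.classification_report` produces the same per-class breakdown for a real model.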
