This project analyzes airline passenger satisfaction data to identify key factors influencing passenger experience and develop predictive models for satisfaction levels. The analysis provides actionable insights for airlines to improve their services and enhance customer satisfaction.
The primary goals of this analysis are to:
- Identify Key Drivers of Passenger Satisfaction: Determine which factors (seat comfort, in-flight entertainment, customer service) most significantly influence passenger satisfaction ratings
- Analyze Flight-Related Impact: Examine how flight duration, delays, and travel class affect passenger satisfaction
- Evaluate In-Flight Service Impact: Assess the effects of Wi-Fi service, food & drink, seat comfort, entertainment, legroom, and cleanliness on satisfaction levels
The dataset contains comprehensive information about airline passenger satisfaction from various airlines, obtained from Kaggle.
- Demographics: Age, Gender, Customer Type (Loyal/Disloyal)
- Flight Details: Flight Distance, Travel Class, Type of Travel
- Service Ratings: Wi-Fi, Food & Drink, Seat Comfort, Entertainment, Legroom, Cleanliness (Scale 1-5)
- Operational Metrics: Departure/Arrival Delays, Gate Location, Baggage Handling
- Target Variable: Satisfaction Level (Satisfied, Neutral, Dissatisfied)
- Online Boarding emerged as the most critical factor (root node in decision tree)
- In-flight Wi-Fi Service and Type of Travel are secondary drivers
- Class of Travel significantly impacts satisfaction levels
- Age groups 23-27 and 42-47 are frequent flyers
- Male passengers show slightly higher satisfaction than female passengers
- Business travelers have significantly higher satisfaction (58.4%) compared to personal travelers (10.1%)
- Economy and Economy Plus classes show much lower satisfaction compared to Business class
- Short-distance flights experience more departure and arrival delays
- Most passenger trips are domestic (< 1500 miles)
- Decision Tree: For identifying key satisfaction drivers (88.46% accuracy)
- Logistic Regression: For flight-related factor analysis (79.9% accuracy)
- Naive Bayes Classifier: For comparison with logistic regression (78.23% accuracy)
- Exploratory Data Analysis (EDA) with various visualizations
- Feature importance analysis
- Correlation analysis
- Predictive modeling
- Model performance evaluation using confusion matrices and ROC curves
The project includes comprehensive visualizations:
- Histograms: Age distribution, flight distance patterns
- Bar Charts: Gender, travel type, and class satisfaction percentages
- Scatter Plots: Flight distance vs. delays analysis
- Box Plots: Service ratings across different travel classes
- Decision Trees: Feature importance visualization
- ROC Curves: Model performance evaluation
- Optimize Online Boarding: Streamline check-in processes and provide clear instructions
- Enhance In-Flight Wi-Fi: Invest in reliable, high-speed connectivity
- Improve Economy Class Services: Focus on cleanliness, comfort, and basic amenities
- Address Short-Flight Delays: Implement measures to reduce delays on domestic routes
- Class-Based Service Differentiation: Tailor services based on travel class expectations
- Travel Type Customization: Develop specific service packages for business vs. personal travelers
- Predictive Satisfaction Management: Use models to proactively identify and address potential dissatisfaction
# Required R packages
install.packages(c("ggplot2", "dplyr", "caret", "rpart", "randomForest",
"ROCR", "e1071", "corrplot", "gridExtra"))
- Clone this repository
- Ensure you have R and RStudio installed
- Install required packages
- Load the dataset from the provided Kaggle link
- Run the R script to reproduce the analysis
Model | Accuracy | Use Case |
---|---|---|
Decision Tree | 88.46% | Overall satisfaction prediction |
Logistic Regression | 79.9% | Flight-related factor analysis |
Naive Bayes | 78.23% | Baseline comparison |
- Real-time Prediction Dashboard: Implement a web dashboard for real-time satisfaction prediction
- Advanced Feature Engineering: Create new features from existing data
- Ensemble Methods: Combine multiple models for improved accuracy
- Sentiment Analysis: Incorporate text feedback analysis
- Time Series Analysis: Study satisfaction trends over time
This project is created for academic purposes. Dataset credit goes to the original Kaggle contributor.
This is an academic project, but suggestions and feedback are welcome! Please feel free to:
- Open issues for bugs or suggestions
- Submit pull requests for improvements
- Share your insights and findings
For questions or collaborations, please reach out to me via linkedin.
This analysis demonstrates the power of data-driven decision making in the airline industry and provides actionable insights for improving passenger satisfaction.