This project aims to predict airline profitability using Machine Learning techniques. By analyzing key operational and financial metrics such as Revenue, Cost Efficiency, Flight Delays, Load Factor, and Fuel Consumption, we provide actionable insights to optimize airline operations.
Key Features
- Predicts Airline Profitability: Helps airlines maximize revenue and reduce costs.
- Data-Driven Insights: Identifies factors affecting profit (Revenue, Delays, Costs, etc.).
- Machine Learning Model: Stacked ensemble model for high accuracy.
- SHAP Interpretability: Explains the impact of each feature on profitability.
- Business Recommendations: Optimize flight schedules, fuel efficiency, and pricing strategies.
Dataset Description
The dataset includes historical flight performance data with the following key attributes:
- Flight Number: Unique identifier for each flight.
- Delay (Minutes): Flight delays impacting operational efficiency.
- Aircraft Utilization (Hours/Day): How efficiently aircraft are used.
- Revenue (USD): Total revenue generated per flight.
- Operating Cost (USD): Total cost incurred for each flight.
- Profit (USD): Revenue - Operating Cost.
- Load Factor (%): Seat occupancy percentage.
- Fuel Efficiency (ASK): Cost of fuel per Available Seat Kilometer.
- Profit Margin (%): Ratio of profit to revenue.
- Is_Holiday_Season: 1 if the flight occurs during a peak season.
Target Variable: Profit (USD)
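The two derived columns follow directly from the definitions above. Here is a minimal sketch, assuming each flight is held as a plain dictionary. The keys `revenue_usd` and `operating_cost_usd`, the function name `add_profit_columns`, and the sample figures are hypothetical and not taken from the project's dataset.

```python
def add_profit_columns(flight):
    """Return a copy of `flight` with Profit (USD) and Profit Margin (%) added."""
    revenue = flight["revenue_usd"]
    cost = flight["operating_cost_usd"]
    profit = revenue - cost  # Profit (USD) = Revenue - Operating Cost
    # Profit Margin (%) = profit / revenue * 100; left undefined when revenue is zero.
    margin = profit / revenue * 100 if revenue else None
    return {**flight, "profit_usd": profit, "profit_margin_pct": margin}


# Hypothetical example row (illustrative numbers only).
example = add_profit_columns(
    {"flight_number": "FL000", "revenue_usd": 20000.0, "operating_cost_usd": 15000.0}
)
print(example["profit_usd"], example["profit_margin_pct"])
```

Because Profit is computed from Revenue and Operating Cost, a model given both as inputs can reconstruct the target almost exactly, which is worth keeping in mind when reading the scores.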
Machine Learning Approach
Model Selection
We implemented the following models and chose the best-performing one:
- Linear Regression
- Random Forest Regressor
- XGBoost Regressor
- LightGBM Regressor
- Stacked Model (Best Performance)
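One way to build a stacked ensemble over base learners like these is scikit-learn's `StackingRegressor`. The sketch below is an illustration under assumptions, not the project's actual pipeline. It uses synthetic data, substitutes `GradientBoostingRegressor` for XGBoost and LightGBM so that only scikit-learn is needed, and picks `RidgeCV` as the meta-learner, which this README does not specify.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: three columns loosely playing the role of
# revenue, operating cost and delay (not the project's dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 3))
y = 1000.0 * X[:, 0] - 800.0 * X[:, 1] - 50.0 * X[:, 2] + rng.normal(0.0, 5.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Base learners; GradientBoostingRegressor is a stand-in for XGBoost/LightGBM.
base_learners = [
    ("linear", LinearRegression()),
    ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("boosting", GradientBoostingRegressor(random_state=0)),
]

# The meta-learner (assumed here to be RidgeCV) is fitted on
# cross-validated predictions of the base learners.
stacked = StackingRegressor(estimators=base_learners, final_estimator=RidgeCV(), cv=5)
stacked.fit(X_train, y_train)

r2 = stacked.score(X_test, y_test)  # coefficient of determination on the held-out split
print(f"held-out R^2: {r2:.4f}")
```

Fitting the meta-learner on cross-validated predictions, instead of in-sample ones, keeps it from simply trusting whichever base learner overfits the training data most.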
Model Performance
Model | MAE (Lower Better) | RMSE (Lower Better) | R² Score (Higher Better)
---|---|---|---
LightGBM (Before Tuning) | 94.27 | 120.18 | 0.999956
Stacked Model (Final) | 25.33 | 34.89 | 0.999996
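For reference, the three reported metrics can be computed from true and predicted profits as in the sketch below. The function name `regression_metrics` and the sample values are hypothetical. The formulas are the standard definitions of MAE, RMSE, and the coefficient of determination.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length sequences of numbers.

    Assumes y_true is not constant, otherwise R^2 is undefined.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n            # mean absolute error
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    rmse = math.sqrt(ss_res / n)                     # root mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical profits in USD (illustrative numbers only).
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(mae, rmse, r2)
```

MAE and RMSE are in the units of the target (USD here), while R² is unitless. The table's error figures are therefore in USD of profit per flight.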
Feature Importance (SHAP Analysis): Revenue, Cost Efficiency, Delays, and Fuel Cost are the top contributors.
Visualizations & Insights
Exploratory Data Analysis (EDA):
- Histogram & Box Plots: Check data distribution and outliers.
- Scatter Plots: Analyze relationships (e.g., Revenue vs. Profit).
- SHAP Feature Importance: Explainability of ML predictions.
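A box plot conventionally flags points lying more than 1.5 × IQR beyond the quartiles. The same check can be done numerically, as in the minimal sketch below. The function name `iqr_outliers` and the delay values are hypothetical, and the README does not say which outlier rule the project applied.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (box-plot whisker rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical delays in minutes (illustrative numbers only).
delays = [5, 8, 10, 12, 12, 15, 18, 20, 22, 240]
outliers = iqr_outliers(delays)
print(outliers)
```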
Top 5 Analysis:
Category | Top 5 Flight Numbers | Key Metric
---|---|---
Most Profitable | FL089, FL011, FL422, FL892, FL629 | Highest Profit
Least Profitable | FL694, FL592, FL065, FL107, FL108 | Loss-Making Flights
Most Delayed | FL812, FL839, FL829, FL083, FL818 | Longest Delays
Most Efficient Flights | FL422, FL841, FL011, FL576, FL147 | Highest Profit Margin
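Rankings like these come from sorting flights on one metric and taking the first few. The sketch below assumes flights are stored as dictionaries. The keys, the function name `top_n`, and the sample rows are hypothetical and do not reproduce the project's data.

```python
def top_n(flights, key, n=5, largest=True):
    """Return the flight numbers of the n flights with the largest (or smallest) `key`."""
    ranked = sorted(flights, key=lambda f: f[key], reverse=largest)
    return [f["flight_number"] for f in ranked[:n]]


# Hypothetical rows (illustrative numbers only).
flights = [
    {"flight_number": "FL001", "profit_usd": 1200.0, "delay_min": 5},
    {"flight_number": "FL002", "profit_usd": -300.0, "delay_min": 45},
    {"flight_number": "FL003", "profit_usd": 800.0, "delay_min": 90},
    {"flight_number": "FL004", "profit_usd": 2500.0, "delay_min": 0},
]

most_profitable = top_n(flights, "profit_usd", n=2)
least_profitable = top_n(flights, "profit_usd", n=2, largest=False)
most_delayed = top_n(flights, "delay_min", n=2)
print(most_profitable, least_profitable, most_delayed)
```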
Check the Business Report & Presentation in reports/
- CSV file link: (https://drive.google.com/file/d/1vnko_xocyvYu1lmZT4LiSZyojX2UjATk/view?usp=drive_link)
- Video presentation link: (https://drive.google.com/file/d/1zPAUJJ1j8azaLeoDpz3m_nYfH8Z-XlYt/view?usp=drive_link)
Name | Role | GitHub
---|---|---
Girish Kumar | Data Analyst | GitHub
Komal Yadav | Python Developer | GitHub
Rohith Katkuri | Data Engineer | GitHub
Feel free to connect with us!