This project predicts flight prices using machine learning techniques in Python. The workflow is implemented in a Jupyter Notebook and uses a cleaned dataset (Clean_Dataset.csv).

Features:
- Data exploration and visualization
- Data preprocessing and feature engineering
- One-hot encoding for categorical variables
- Regression model training (Random Forest)
- Model evaluation (R², MAE, MSE, RMSE)
- Feature importance analysis
- Visualization of results
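The evaluation metrics listed above can be computed with scikit-learn's metrics module. The sketch below assumes the notebook compares held-out prices with model predictions. The helper name regression_metrics and the toy price values are hypothetical and are not results from this project.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred):
    """Return R², MAE, MSE and RMSE for a set of price predictions (hypothetical helper)."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "R2": r2_score(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        # RMSE is the square root of MSE, expressed in the same units as the price
        "RMSE": float(np.sqrt(mse)),
    }


# Toy values for illustration only, not results from the project
y_true = [5000, 7200, 3100, 12000]
y_pred = [5400, 7000, 2900, 11500]
print(regression_metrics(y_true, y_pred))
```

RMSE is reported alongside MSE because it is in the same units as the price, which makes it easier to interpret.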
Requirements:
- Python 3.11+
- Jupyter Notebook
- pandas
- numpy
- matplotlib
- scikit-learn
A virtual environment (env/) is included. Activate it before running the notebook:
source env/bin/activate
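After activating the environment, a short Python check can confirm that the interpreter meets the Python 3.11+ requirement and that the listed packages are importable. This is a sketch, not part of the repository; the function name missing_requirements is hypothetical, and Jupyter itself is not checked.

```python
import importlib.util
import sys

# Import names of the packages listed in the requirements (scikit-learn imports as sklearn)
REQUIRED_MODULES = ("pandas", "numpy", "matplotlib", "sklearn")


def missing_requirements(modules=REQUIRED_MODULES, min_python=(3, 11)):
    """Return a list of unmet requirements for the active interpreter."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python >= %d.%d" % min_python)
    for name in modules:
        # find_spec returns None when the top-level module cannot be found
        if importlib.util.find_spec(name) is None:
            problems.append(name)
    return problems


if __name__ == "__main__":
    missing = missing_requirements()
    print("All requirements found." if not missing else f"Missing: {', '.join(missing)}")
```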
To run the project:
- Activate the virtual environment:
source env/bin/activate
- Start Jupyter Notebook:
jupyter notebook
- Open main.ipynb and run the cells sequentially.
The notebook is organized into four stages:
- Load Data: Read and explore the cleaned flight dataset.
- Preprocessing: Drop unnecessary columns, encode categorical features, and transform data for modeling.
- Model Training: Split data, train a Random Forest regressor, and evaluate performance.
- Analysis: Visualize actual vs. predicted prices and analyze feature importances.
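The four stages above can be sketched end to end as follows. Because the column layout of Clean_Dataset.csv is not described here, the example builds a small synthetic dataset instead. The column names (flight_id, airline, seat_class, stops, days_left, price), the pricing rule, the helper names and the Random Forest settings are all assumptions, not the notebook's actual code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def make_toy_flights(n=500, seed=0):
    """Build a small synthetic stand-in for Clean_Dataset.csv (hypothetical columns)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "flight_id": [f"F{i:04d}" for i in range(n)],  # identifier, dropped below
        "airline": rng.choice(["A", "B", "C"], size=n),
        "seat_class": rng.choice(["Economy", "Business"], size=n),
        "stops": rng.integers(0, 3, size=n),
        "days_left": rng.integers(1, 50, size=n),
    })
    # Invented pricing rule so the model has a signal to learn
    df["price"] = (
        3000
        + 25000 * (df["seat_class"] == "Business")
        + 1500 * df["stops"]
        - 40 * df["days_left"]
        + rng.normal(0, 500, size=n)
    )
    return df


def train_price_model(df, target="price", drop_cols=("flight_id",), seed=42):
    # Preprocessing: drop the target and columns assumed to carry no predictive signal
    X = df.drop(columns=[target, *drop_cols])
    y = df[target]
    # One-hot encode the categorical (string) columns; numeric columns pass through
    X = pd.get_dummies(X, dtype=int)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores = {"R2": r2_score(y_test, y_pred), "MAE": mean_absolute_error(y_test, y_pred)}
    # Impurity-based importances, one value per encoded column, sorted descending
    importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(
        ascending=False
    )
    return model, scores, importances, (y_test, y_pred)


if __name__ == "__main__":
    model, scores, importances, (y_test, y_pred) = train_price_model(make_toy_flights())
    print(scores)
    print(importances.head())
```

To try it on the real data, replace make_toy_flights() with pd.read_csv("Clean_Dataset.csv") and set target and drop_cols to the dataset's actual columns. The returned (y_test, y_pred) pair can be passed to matplotlib's plt.scatter for an actual-vs-predicted plot. Impurity-based importances are reported per one-hot column, so a categorical feature's importance is spread across its dummy columns.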
Project files:
- main.ipynb: Main notebook with all code and analysis steps.
- Clean_Dataset.csv: Cleaned flight data for modeling.
- env/: Python virtual environment with required packages.
This project is for educational purposes.