This project analyzes and predicts avocado prices in the United States using historical sales data. We applied data science and machine learning techniques to identify trends and patterns and to forecast future prices.
- Source: Hass Avocado Board via Kaggle
- File: `Avocado.csv`
- Features Include:
  - `Date`: The date of the observation
  - `AveragePrice`: Average price per avocado
  - `Total Volume`: Total number of avocados sold
  - `4046`, `4225`, `4770`: PLU (Price Look-Up) codes representing different avocado types
  - `Total Bags`, `Small Bags`, `Large Bags`, `XLarge Bags`
  - `type`: Conventional or organic
  - `year`: Year of observation
  - `region`: Geographic market region in the U.S.
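As a minimal sketch of how a file with the columns listed above might be loaded and sanity-checked with Pandas: the `SAMPLE_CSV` rows and the `load_avocado` helper are hypothetical and exist only for illustration; the values are made up and are not taken from `Avocado.csv`.

```python
import io

import pandas as pd

# Hypothetical three-row sample mirroring the column layout described above.
# The numbers are invented for illustration, not real observations.
SAMPLE_CSV = """Date,AveragePrice,Total Volume,4046,4225,4770,Total Bags,Small Bags,Large Bags,XLarge Bags,type,year,region
2015-01-11,1.20,900.0,350.0,280.0,40.0,230.0,190.0,35.0,5.0,conventional,2015,Albany
2015-01-04,1.10,1000.0,400.0,300.0,50.0,250.0,200.0,45.0,5.0,conventional,2015,Albany
2015-01-04,1.80,120.0,30.0,40.0,0.0,50.0,45.0,5.0,0.0,organic,2015,Albany
"""


def load_avocado(source):
    """Read the CSV, parse the Date column, and sort rows chronologically."""
    df = pd.read_csv(source, parse_dates=["Date"])
    # Assumption: some exports carry a leftover unnamed index column; drop it if present.
    df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
    return df.sort_values("Date", kind="mergesort").reset_index(drop=True)


df = load_avocado(io.StringIO(SAMPLE_CSV))
print(df.dtypes)        # column types after parsing
print(df.isna().sum())  # missing values per column
```

With the real dataset, the same helper could be called as `load_avocado("Avocado.csv")`, assuming the file sits in the working directory.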
- Explore and clean avocado price data.
- Perform exploratory data analysis (EDA).
- Build regression models to predict avocado prices.
- Visualize results and insights.
- Evaluate and compare model performance.
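The EDA step can be illustrated with a couple of group-by aggregations. This is only a sketch on made-up rows that reuse the dataset's column names (`AveragePrice`, `type`, `year`, `region`); the notebook's actual analysis may differ.

```python
import pandas as pd

# Made-up rows for illustration only; not real observations from the dataset.
df = pd.DataFrame({
    "AveragePrice": [1.00, 1.20, 1.30, 1.50, 1.60, 2.00],
    "type": ["conventional"] * 4 + ["organic"] * 2,
    "year": [2015, 2015, 2016, 2016, 2015, 2016],
    "region": ["Albany", "Albany", "Albany", "Albany", "Boston", "Boston"],
})

# Mean price per year, with one column per avocado type.
yearly = df.groupby(["year", "type"])["AveragePrice"].mean().unstack("type")

# Relative year-over-year change in the mean price (first year is NaN).
yoy = yearly.diff() / yearly.shift()

# Mean price per region, most expensive first.
by_region = df.groupby("region")["AveragePrice"].mean().sort_values(ascending=False)

print(yearly)
print(yoy)
print(by_region)
```

Tables like `yearly` and `by_region` can be passed straight to Matplotlib or Seaborn for plotting.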
- Environment: Jupyter Notebook
- Libraries Used:
  - Pandas & NumPy (data manipulation)
  - Matplotlib & Seaborn (visualization)
  - Scikit-learn (machine learning)
  - Statsmodels (statistical analysis)
- Detailed EDA including region-wise trends and yearly price changes.
- Applied multiple regression techniques including:
  - Linear Regression
  - Random Forest Regressor
  - Decision Tree
- Included data preprocessing steps like encoding, feature scaling, and train-test split.
- Visualized prediction results to evaluate model accuracy.
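The preprocessing and modeling steps above can be sketched as follows. This is not the notebook's code: the data is synthetic, the feature selection is an assumption, and the metrics (MAE and R²) are chosen here for illustration because the source does not name the ones it used.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data that reuses some of the dataset's column names.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "Total Volume": rng.uniform(1e3, 1e6, n),
    "Total Bags": rng.uniform(1e2, 1e5, n),
    "type": rng.choice(["conventional", "organic"], n),
    "region": rng.choice(["Albany", "Boston", "Chicago"], n),
})
# Invented price rule: organic costs more, higher volume slightly lowers price.
df["AveragePrice"] = (
    1.0
    + 0.5 * (df["type"] == "organic")
    - 2e-7 * df["Total Volume"]
    + rng.normal(0, 0.05, n)
)

numeric = ["Total Volume", "Total Bags"]
categorical = ["type", "region"]
X = df[numeric + categorical]
y = df["AveragePrice"]

# Train-test split: hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def make_preprocessor():
    """Scale numeric features and one-hot encode categorical ones."""
    return ColumnTransformer([
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])


models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    # Fit preprocessing on the training split only, then score on held-out rows.
    pipe = Pipeline([("prep", make_preprocessor()), ("model", model)])
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    results[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "R2": r2_score(y_test, pred),
    }

print(pd.DataFrame(results).T.round(3))
```

Wrapping the preprocessing in a `Pipeline` keeps the scaler and encoder from seeing the test rows during fitting, which is one common way to avoid leakage when comparing models.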
- `AvocadoGroupProject.ipynb`: Main notebook containing code, analysis, and model building.
- `Avocado.csv`: Dataset used for training and evaluation.
- Clone the repository:
    git clone https://github.com/yourusername/avocado-price-prediction.git
    cd avocado-price-prediction
- Install the required packages:
pip install -r requirements.txt
- Launch the notebook:
jupyter notebook AvocadoGroupProject.ipynb