A machine learning project to predict the price per kg of vegetables based on environmental conditions, seasonality, and product condition using various regression techniques including Linear Regression, Support Vector Machine (SVM), and Random Forest Regressor.
The dataset contains information about vegetables collected from a local market, with the following features:
- Vegetable: Type of vegetable (e.g., tomato, potato, cucumber)
- Season: Seasonal category (e.g., summer, winter)
- Month: Month of observation
- Temp: Average temperature
- Deasaster Happen in last 3month: Whether a natural disaster occurred recently
- Vegetable condition: Quality (e.g., fresh, average, scrap)
- Price per kg: Target variable
- Data Cleaning: Fixed typos (e.g., "scarp" → "scrap")
- Handling Missing Values: Replaced blank or missing months with the mode
- Encoding: Used one-hot encoding for categorical variables and ordinal encoding for months
- Train-Test Split: 70% training, 30% testing
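The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the sample rows are made up, the three-letter month spelling and the `preprocess` helper are assumptions, and `random_state=42` is an arbitrary choice. Only the column names and the 70/30 split come from the description above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up sample rows using the column names listed above (not the real data).
raw = pd.DataFrame({
    "Vegetable": ["tomato", "potato", "cucumber", "tomato", "potato", "cucumber"],
    "Season": ["summer", "winter", "summer", "winter", "summer", "winter"],
    "Month": ["jan", "feb", None, "jan", " ", "mar"],
    "Temp": [30, 18, 29, 17, 31, 20],
    "Deasaster Happen in last 3month": ["no", "yes", "no", "no", "yes", "no"],
    "Vegetable condition": ["fresh", "scarp", "average", "fresh", "scrap", "average"],
    "Price per kg": [40, 25, 30, 55, 20, 35],
})

# Assumption: months are stored as three-letter abbreviations.
MONTH_ORDER = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]
MONTH_TO_ORDINAL = {name: i for i, name in enumerate(MONTH_ORDER, start=1)}
CATEGORICAL = ["Vegetable", "Season",
               "Deasaster Happen in last 3month", "Vegetable condition"]
TARGET = "Price per kg"


def preprocess(df):
    """Hypothetical helper: clean, impute, and encode the raw frame."""
    df = df.copy()
    # Data cleaning: fix the known typo in the condition column.
    df["Vegetable condition"] = df["Vegetable condition"].replace({"scarp": "scrap"})
    # Missing values: treat blank strings as missing, then fill with the mode.
    month = df["Month"].str.strip().str.lower()
    month = month.where(month.notna() & (month != ""))
    month = month.fillna(month.mode().iloc[0])
    # Ordinal encoding for months (jan=1 ... dec=12).
    df["Month"] = month.map(MONTH_TO_ORDINAL)
    # One-hot encoding for the remaining categorical columns.
    return pd.get_dummies(df, columns=CATEGORICAL, dtype=int)


encoded = preprocess(raw)
X = encoded.drop(columns=[TARGET])
y = encoded[TARGET]

# 70% training, 30% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```

Fixing the typo before one-hot encoding matters: otherwise "scarp" and "scrap" would become two separate indicator columns.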
Model | R² Score | Mean Squared Error |
---|---|---|
Linear Regression | 0.81 | 582.61 |
Support Vector Machine (SVM) | 0.19 | 3430.13 |
Random Forest Regressor | 0.91 | 271.64 |
Random Forest performs best on both metrics, with the highest R² and the lowest MSE.
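A comparison like the one above could be produced along these lines. This is a sketch under stated assumptions: the data here is synthetic (generated with NumPy), every model uses scikit-learn defaults apart from an arbitrary `random_state`, and the hyperparameters used in the project are not known, so the printed scores will not match the table.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for the encoded features and the price target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 50 + 20 * X[:, 0] - 10 * X[:, 1] + rng.normal(scale=5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Assumption: default hyperparameters for all three regressors.
models = {
    "Linear Regression": LinearRegression(),
    "Support Vector Machine (SVM)": SVR(),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, predictions),
        "mse": mean_squared_error(y_test, predictions),
    }

# Print one row per model in the same layout as the results table.
for name, scores in results.items():
    print(f"{name} | {scores['r2']:.2f} | {scores['mse']:.2f}")
```

Both metrics are computed on the held-out 30% only. One possible reason for a low SVM score in general is that `SVR` is sensitive to feature and target scale, so standardizing the inputs may be worth trying before drawing conclusions.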
- Clone this repo:
  `git clone https://github.com/your-username/vegetable-price-prediction.git`
  `cd vegetable-price-prediction`
- Install dependencies:
  `pip install -r requirements.txt`
- Run the notebook:
  `jupyter notebook`
- Python
- Pandas, NumPy
- Scikit-learn
- Matplotlib, Seaborn
- Jupyter Notebook
Vegetable pricing is influenced by seasonal and environmental factors.
Machine learning can effectively forecast prices with relatively small datasets.
Feature encoding plays a crucial role in model accuracy.
Feature correlation analysis to better understand the relationships between inputs.
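A feature correlation analysis of this kind might start from something like the sketch below. The numbers are made up, and the column `Season_summer` is an assumed name for a one-hot encoded season; the only calls used are pandas' `DataFrame.corr` and `Series.sort_values`.

```python
import pandas as pd

# Made-up, already-encoded sample (numeric columns only).
encoded = pd.DataFrame({
    "Temp": [30, 18, 29, 17, 31, 20],
    "Month": [6, 1, 7, 12, 5, 2],
    "Season_summer": [1, 0, 1, 0, 1, 0],
    "Price per kg": [40, 25, 30, 55, 20, 35],
})

# Pairwise Pearson correlation between all columns.
corr = encoded.corr()

# Rank the input features by absolute correlation with the target.
target_corr = (
    corr["Price per kg"]
    .drop("Price per kg")
    .sort_values(key=abs, ascending=False)
)
print(target_corr)
```

Since Seaborn is already among the project's tools, the matrix could then be drawn with `seaborn.heatmap(corr, annot=True)`. Pearson correlation only captures linear relationships, so a weak value does not rule out a feature that a tree-based model finds useful.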