Wine Quality Prediction

Overview

This project predicts wine quality from the chemical features recorded in two datasets, one for red wine and one for white. It also addresses the class imbalance in the quality labels, where some quality classes have far fewer samples than others.

Problem Statement

The main goal is to predict wine quality based on various chemical properties. The dataset consists of features such as acidity, sugar content, and alcohol level, among others.

Libraries Used

Pandas: For data manipulation and analysis.
NumPy: For numerical operations.
Scikit-learn: For machine learning models and metrics.
Imbalanced-learn: For handling class imbalance using SMOTE.
Seaborn & Matplotlib: For data visualization.

Steps in the Project

Data Loading:

Load the datasets for red and white wine.

import pandas as pd
white_wine = pd.read_csv('winequality-white.csv', sep=';')
red_wine = pd.read_csv('winequality-red.csv', sep=';')

Data Preparation:
- Add a feature indicating the type of wine (red or white).
- Merge the two datasets and shuffle the observations.
- Create a quality label based on the quality score.
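
The preparation steps above could be sketched as follows. This is a minimal illustration rather than the project's exact code: the `wine_type` column name, the helper names `label_quality` and `prepare_wines`, and the score thresholds (5 and below is low, 6 to 7 is medium, above 7 is high) are all assumptions, since the README does not state them. The tiny demo frames stand in for `red_wine` and `white_wine`.

```python
import pandas as pd


def label_quality(score: int) -> str:
    """Map a numeric quality score to a coarse label (thresholds are assumed)."""
    if score <= 5:
        return "low"
    if score <= 7:
        return "medium"
    return "high"


def prepare_wines(red: pd.DataFrame, white: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    # Tag each row with its wine type before merging (column name is hypothetical)
    red = red.assign(wine_type="red")
    white = white.assign(wine_type="white")
    # Stack the two datasets, then shuffle the rows reproducibly
    wines = pd.concat([red, white], ignore_index=True)
    wines = wines.sample(frac=1, random_state=seed).reset_index(drop=True)
    # Derive the categorical target from the numeric score
    wines["quality_label"] = wines["quality"].apply(label_quality)
    return wines


# Tiny stand-in frames; in the project these would be red_wine and white_wine
red_demo = pd.DataFrame({"alcohol": [9.4, 10.5], "quality": [5, 8]})
white_demo = pd.DataFrame({"alcohol": [8.8, 11.0, 12.1], "quality": [6, 4, 7]})
wines_demo = prepare_wines(red_demo, white_demo)
print(wines_demo["quality_label"].value_counts())
```

Shuffling with a fixed `random_state` keeps the row order reproducible between runs.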

Data Exploration:

Visualize the distribution of wine quality labels to identify class imbalance.

import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(x=wines['quality_label'])
plt.show()

Data Splitting:
- Split the data into training and test sets.

Data Scaling:
- Scale the features using StandardScaler.
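
A possible sketch of the splitting and scaling steps is shown below. The README does not give the split ratio or say whether the split is stratified, so `test_size=0.3`, `stratify=y`, and `random_state=42` are assumptions, and the synthetic `X` and `y` only stand in for the wine features and labels. The scaler is fitted on the training set alone so that no test-set statistics leak into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine features and labels (imbalanced on purpose)
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 4))
y = np.array(["medium"] * 120 + ["low"] * 60 + ["high"] * 20)

# Stratify so each split keeps roughly the same class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Fit the scaler on the training set only, then reuse it on the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

print(X_train.shape, X_test.shape)
```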

Model Training:

Train a Logistic Regression model on the imbalanced dataset.

from sklearn.linear_model import LogisticRegression
lg = LogisticRegression()
lg.fit(X_train, y_train)

Handling Class Imbalance:

Apply SMOTE to balance the classes in the training set.

from imblearn.over_sampling import SMOTE
smote = SMOTE()
X_train_smote, y_train_smote = smote.fit_resample(X_train, y_train)

Model Evaluation:

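Retrain the model on the SMOTE-resampled training set, predict on the untouched test set, and evaluate using the confusion matrix and classification report.

from sklearn.metrics import confusion_matrix, classification_report

# Refit on the resampled data; reusing lg here would evaluate the imbalanced model
lg_smote = LogisticRegression()
lg_smote.fit(X_train_smote, y_train_smote)
y_pred_smote = lg_smote.predict(X_test)
results = confusion_matrix(y_test, y_pred_smote)
print("Confusion Matrix:\n", results)
print("Classification Report:\n", classification_report(y_test, y_pred_smote))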

Results

The confusion matrix and classification report will provide insights into the model's performance, particularly in predicting the minority class.
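
As a small illustration of how to read these outputs, per-class recall can be computed directly from the confusion matrix. The labels below are toy values made up for this sketch, not results from this project. The class names `low`, `medium`, and `high` are likewise assumed.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only; these are not results from this project
y_true = ["low", "low", "low", "medium", "medium", "medium", "medium", "high", "high"]
y_hat = ["low", "low", "medium", "medium", "medium", "medium", "low", "high", "medium"]

labels = ["low", "medium", "high"]
cm = confusion_matrix(y_true, y_hat, labels=labels)

# Rows are true classes and columns are predictions,
# so recall for a class is its diagonal entry divided by its row sum
recall = cm.diagonal() / cm.sum(axis=1)
for name, r in zip(labels, recall):
    print(f"{name}: recall = {r:.2f}")
```

Comparing the minority-class recall of the model trained on the original data with that of the model trained on the resampled data is one direct way to check whether SMOTE helped.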

Conclusion

This project demonstrates the importance of addressing class imbalance in predictive modeling. Oversampling the training set with SMOTE can improve the model's ability to predict underrepresented classes, and the per-class recall in the classification report shows whether it did.

Future Work

Experiment with other machine learning algorithms to further improve prediction accuracy.
Implement hyperparameter tuning for better model performance.
Explore additional feature engineering techniques to enhance the dataset.

Acknowledgments

Wine Quality Dataset from the UCI Machine Learning Repository.
Various libraries and frameworks that facilitate data science and machine learning tasks.

License

This project is licensed under the MIT License.
