DajuBhai MiniMart Analysis
This repository contains the Jupyter Notebook for the DajuBhai MiniMart analysis. The notebook demonstrates a full data science workflow, from data cleaning to model building, using the dataset `DajuBhaiMinimart.csv`.
- `ITS69304_SupremKhatri_Individual_Assignment.ipynb` — Main notebook with code and analysis.
- `DajuBhaiMinimart.csv` — Dataset (assumed provided externally).
The following libraries are used:
```python
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

import warnings
warnings.filterwarnings("ignore")

import plotly.io as pio
pio.renderers.default = 'notebook'
```
```python
df = pd.read_csv('DajuBhaiMinimart.csv')
df.head()
```
- Reads and displays initial data.
- Checks dataframe info (column types, non-null counts):
```python
df.info()
```
- Checks for missing values:
```python
df.isnull().sum()
```
- Drops unnecessary columns:
```python
df.drop('CustomerID', axis=1, inplace=True)
```
- Renames columns where necessary.
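  The exact renames depend on the CSV's original headers; a hypothetical example:

```python
# Hypothetical renames — the actual headers in DajuBhaiMinimart.csv may differ
df.rename(columns={'Prod_Cat': 'ProductCategory', 'Sales_Amt': 'Sales'}, inplace=True)
```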
- Uses Seaborn and Plotly to visualize relationships and distributions.
- Example plots:
```python
sns.countplot(x='ProductCategory', data=df)
px.histogram(df, x='Sales', color='ProductCategory')
```
- Creates or modifies features to improve model input.
- Handles categorical encoding, scaling, etc.
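  A minimal sketch of this step, assuming `Sales` is the prediction target (the notebook's actual target, encoder, and scaler choices may differ); it also produces the `X_train`/`y_train` split used below:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: 'Sales' is the target; everything else is a feature
X = pd.get_dummies(df.drop('Sales', axis=1))  # one-hot encode categorical columns
y = df['Sales']

# Split before scaling so the scaler never sees the test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```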
- Trains models such as:
- Gradient Boosting Regressor
- Example code:
```python
from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor()
model.fit(X_train, y_train)
```
- Performs grid search / cross-validation:
```python
from sklearn.model_selection import GridSearchCV

params = {'n_estimators': [100, 200], 'learning_rate': [0.1, 0.05]}
grid = GridSearchCV(GradientBoostingRegressor(), params, cv=5)
grid.fit(X_train, y_train)
```
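  After fitting, the tuned model can be inspected and evaluated; a short usage sketch (the metric here is an assumed choice, not taken from the notebook):

```python
from sklearn.metrics import mean_squared_error

print(grid.best_params_)            # best hyperparameter combination found
best_model = grid.best_estimator_   # refit on the full training set by default
preds = best_model.predict(X_test)
print(mean_squared_error(y_test, preds))  # assumed evaluation metric
```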
✅ Clone or download this repository.
✅ Install requirements:
```bash
pip install pandas matplotlib seaborn plotly scikit-learn
```
✅ Run the notebook:
```bash
jupyter notebook ITS69304_SupremKhatri_Individual_Assignment.ipynb
```
- Ensure `DajuBhaiMinimart.csv` is present in the notebook directory.
- All visualizations are rendered using Plotly, Seaborn, and Matplotlib.
- The notebook suppresses warnings for cleaner output.
- Extend analysis with additional models (e.g. XGBoost, Random Forest).
- Improve feature engineering with domain knowledge.
- Deploy the model via Flask / Streamlit for interactive use (a hypothetical Streamlit sketch follows).
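For the Streamlit option, a minimal hypothetical sketch — the saved-model filename, input fields, and category values are all assumptions, and it presumes the preprocessing and model were saved together as a scikit-learn `Pipeline`:

```python
# app.py — hypothetical Streamlit front end; launch with: streamlit run app.py
import joblib
import pandas as pd
import streamlit as st

# Assumed artifact, e.g. saved from the notebook with joblib.dump(pipeline, 'minimart_model.joblib')
pipeline = joblib.load('minimart_model.joblib')

st.title('DajuBhai MiniMart Sales Predictor')
category = st.selectbox('Product category', ['Grocery', 'Beverage', 'Snacks'])  # assumed values
quantity = st.number_input('Quantity', min_value=1, value=1)                    # assumed feature

if st.button('Predict'):
    row = pd.DataFrame([{'ProductCategory': category, 'Quantity': quantity}])
    st.write(f"Predicted sales: {pipeline.predict(row)[0]:.2f}")
```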