Superstore Sales Analysis and Forecasting
This project performs an exploratory data analysis (EDA) on the Superstore sales dataset and uses a linear regression model to forecast future sales.
- `MDC_task.ipynb`: Jupyter Notebook containing the complete code for data loading, cleaning, visualization, and forecasting.
- `Sample - Superstore.csv`: The dataset used for the analysis.
The project follows a standard data science workflow:
- Data Loading and Inspection: The notebook first unzips `archive.zip` to get `Sample - Superstore.csv`, then loads the file into a pandas DataFrame using `pandas.read_csv` with the `latin1` encoding. Initial checks are performed to understand the data types and identify any missing values.
- Data Preprocessing: The 'Order Date' and 'Ship Date' columns are converted from object type to datetime objects to enable time-series analysis.
- Exploratory Data Analysis (EDA): The notebook aggregates sales by month and visualizes the monthly sales trend as an interactive line plot using `plotly.express`, to show sales patterns over time.
- Forecasting: A simple linear regression model from scikit-learn is trained on the historical monthly sales data to predict the sales for the next month.
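
The loading, preprocessing, and aggregation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names (`load_superstore`, `monthly_sales`, `plot_monthly_sales`) are hypothetical, the `Sales` column name is an assumption, and the tiny inline sample stands in for the real CSV.

```python
import io

import pandas as pd


def load_superstore(source) -> pd.DataFrame:
    """Read the CSV with latin1 encoding and parse the two date columns."""
    df = pd.read_csv(source, encoding="latin1")
    for col in ("Order Date", "Ship Date"):
        df[col] = pd.to_datetime(df[col])  # object -> datetime64
    return df


def monthly_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Sum the (assumed) 'Sales' column per calendar month of 'Order Date'."""
    monthly = (
        df.groupby(df["Order Date"].dt.to_period("M"))["Sales"]
        .sum()
        .reset_index()
        .rename(columns={"Order Date": "Month"})
    )
    # Convert periods to month-start timestamps so they plot on a date axis.
    monthly["Month"] = monthly["Month"].dt.to_timestamp()
    return monthly


def plot_monthly_sales(monthly: pd.DataFrame):
    """Build an interactive line plot; plotly is imported only when needed."""
    import plotly.express as px

    return px.line(monthly, x="Month", y="Sales", title="Monthly Sales Trend")


# Made-up rows for illustration; pass a file path to use the real dataset.
sample = io.StringIO(
    "Order Date,Ship Date,Sales\n"
    "1/5/2016,1/9/2016,100.0\n"
    "1/20/2016,1/25/2016,50.0\n"
    "2/3/2016,2/7/2016,75.0\n"
)
df = load_superstore(sample)
print(df.dtypes)          # inspect data types
print(df.isnull().sum())  # count missing values per column
monthly = monthly_sales(df)
print(monthly)
```

With the real data, `load_superstore("Sample - Superstore.csv")` followed by `plot_monthly_sales(monthly).show()` would display the trend in a notebook.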
- Environment Setup: Ensure you have Python and the necessary libraries installed. You can install the required libraries using pip:
pip install pandas plotly scikit-learn numpy
- Download the Data: Download the `archive.zip` file containing the `Sample - Superstore.csv` dataset and place it in the same directory as the notebook. The notebook is set to unzip this file.
- Run the Notebook: Open and run the `MDC_task.ipynb` notebook in a Jupyter environment (like JupyterLab, Google Colab, or VS Code with the Jupyter extension).
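
To check the download before running the notebook, the unzip step can be reproduced with Python's standard `zipfile` module. This is only a sketch: `extract_dataset` is a hypothetical helper, and it assumes the CSV sits at the root of the archive, which the notebook may or may not rely on.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, member="Sample - Superstore.csv", dest_dir="."):
    """Extract one file from the archive and return the path it was written to."""
    with zipfile.ZipFile(archive_path) as zf:
        # Assumption: the CSV is stored at the archive root under this exact name.
        if member not in zf.namelist():
            raise FileNotFoundError(f"{member!r} not found in {archive_path}")
        return Path(zf.extract(member, path=dest_dir))


# Only runs when archive.zip is present next to this script.
if __name__ == "__main__" and Path("archive.zip").exists():
    print(extract_dataset("archive.zip"))
```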
The notebook successfully loads and processes the dataset, converting the date columns to the correct format. It then generates a line plot showing the monthly sales trend over several years. Finally, a linear regression model is used to predict the sales for the next month.
The predicted sales for the next month are displayed at the end of the notebook.
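
For reference, a next-month prediction of this kind might look like the sketch below. It assumes the model's only feature is the month's position in the series (0, 1, 2, ...); the notebook's actual feature choice is not documented here. `forecast_next_month` is a hypothetical helper and the `history` values are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def forecast_next_month(monthly_totals) -> float:
    """Fit sales against month index and extrapolate one step ahead."""
    y = np.asarray(monthly_totals, dtype=float)
    # Assumed feature: the month's position in the series, as a column vector.
    X = np.arange(len(y)).reshape(-1, 1)
    model = LinearRegression().fit(X, y)
    next_index = np.array([[len(y)]])  # the first month after the history
    return float(model.predict(next_index)[0])


# Made-up totals that lie on a straight line, for illustration only.
history = [100.0, 120.0, 140.0, 160.0]
print(forecast_next_month(history))  # approximately 180
```

A straight-line fit captures only the overall trend and ignores seasonality, so the result is best read as a rough baseline.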