rama-2402/py-superstore-analysis

py-superstore-analysis

Superstore Sales Data Analysis is a practice project that performs exploratory data analysis (EDA) on sample Superstore sales data. Its purpose is to learn and apply different data analysis techniques to gain insight from raw sales data.

Packages Used

This project uses the following Python libraries for data analysis:

  • NumPy
  • Pandas
  • Matplotlib

Clone the project

The packages used in the project are installed in a Python virtual environment to avoid conflicts with existing packages. Follow these steps to clone the project and run it in Jupyter Notebook.

git clone https://github.com/rama-2402/py-superstore-analysis.git
cd py-superstore-analysis/

After navigating to the project directory, create a virtual environment and install the packages using the following commands.

python3 -m venv venv/

# If you are using the Bash shell
source venv/bin/activate

# If you are using the Fish shell
source venv/bin/activate.fish

# To install the necessary dependencies
pip install -r requirements.txt

# To run Jupyter Notebook
jupyter notebook

Troubleshooting

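If the environment breaks or its packages conflict, delete it and create a fresh one.

# Nuke the environment
rm -r venv/

# Create a new environment, activate it, and install the dependencies
python3 -m venv venv/
# Use venv/bin/activate.fish instead if you are using the Fish shell
source venv/bin/activate
pip install -r requirements.txt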
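
If packages still appear to be missing after recreating the environment, it may help to confirm that the interpreter in use is the one inside venv/. The following is a minimal sketch of such a check, assuming a standard venv-style environment. The helper name in_virtualenv is illustrative and not part of this project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
```

Running this from a notebook cell or from the shell shows which Python is active. If it reports False, the environment was most likely not activated before running pip or jupyter.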

Analysis

The following analyses were performed on the sales data:

  • Data cleaning
  • Finding the cities with the highest recorded sales
  • Finding the top 20 profit- and loss-generating cities and states
  • Finding the most-sold items in inventory and how different shipping methods impact purchases
  • Finding the impact of different categories on the Superstore's sales and profit
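
The first two steps above can be sketched with pandas roughly as follows. This is an illustrative example and not the notebook's actual code. The column names City, Sales, and Profit are assumed from the usual Superstore dataset layout, the helper names clean and top_by_sales are hypothetical, and the rows are made-up toy data.

```python
import pandas as pd

# Toy rows for illustration only; these are not the project's data.
raw = pd.DataFrame(
    {
        "City": ["A", "A", "B", "B", "C", "C"],
        "Sales": [100.0, 100.0, 250.0, None, 40.0, 70.0],
        "Profit": [20.0, 20.0, -30.0, 5.0, 8.0, 12.0],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and rows missing Sales or Profit."""
    deduped = df.drop_duplicates()
    complete = deduped.dropna(subset=["Sales", "Profit"])
    return complete.reset_index(drop=True)


def top_by_sales(df: pd.DataFrame, key: str, n: int) -> pd.Series:
    """Total Sales per group, keeping the n largest totals."""
    return df.groupby(key)["Sales"].sum().nlargest(n)


cleaned = clean(raw)
top_cities = top_by_sales(cleaned, "City", 2)
print(top_cities)
```

The same top_by_sales helper could be pointed at a State column, and replacing Sales with Profit (plus nsmallest for losses) would give a top-20 style ranking of profit- and loss-generating areas.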

The observations and insights drawn from the analysis are listed below.

Observations

  1. New York records the highest number of sales.
  2. 78.15% of the cities in the distribution chain made a profit, whereas 21.85% of the cities returned losses.
  3. 79.59% of the states in the distribution chain made a profit, whereas 20.41% of the states returned losses.
  4. There is an inverse correlation between the number of products sold and the profit they bring in, and between segment sales and profit.
  5. Products in the Technology category, although sold in very small numbers, contributed the highest profit.
  6. There is a positive correlation between a region's sales and its profit.
  7. In terms of products, Binders has the most sales by count, whereas Copiers brings in more profit while having the fewest sales.
  8. "Standard Class" is the most preferred shipping mode, but the shipping method has no significant impact on profit.
  9. Orders with a basket size of 2 or 3 have the most sales, indicating consumer interest in buying pairs.
  10. Products with no discount have higher sales and the second-highest profit.
  11. Products with a 10% discount have the highest profit margin, with fewer sales.
  12. In general, products with discounts over 20% generate losses.
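
Figures such as the profit/loss percentages in observations 2 and 3 and the correlation in observation 6 could be derived along the following lines. This is a hedged sketch and not the notebook's code. The column names are assumed, the helper names are hypothetical, the rows are toy data, and treating a group with exactly zero total profit as a loss is an assumption made here.

```python
import pandas as pd

# Toy rows for illustration only; these are not the project's data.
toy = pd.DataFrame(
    {
        "City": ["A", "B", "B", "C", "D"],
        "Sales": [100.0, 200.0, 50.0, 300.0, 150.0],
        "Profit": [20.0, -30.0, 5.0, 60.0, 10.0],
    }
)


def profit_loss_share(df: pd.DataFrame, key: str) -> tuple:
    """Percentage of groups with positive total profit, and the remainder."""
    totals = df.groupby(key)["Profit"].sum()
    # Assumption: a group whose total profit is exactly zero counts as a loss.
    profitable_pct = float((totals > 0).mean() * 100)
    return round(profitable_pct, 2), round(100 - profitable_pct, 2)


def sales_profit_correlation(df: pd.DataFrame, key: str) -> float:
    """Pearson correlation between per-group total Sales and total Profit."""
    totals = df.groupby(key)[["Sales", "Profit"]].sum()
    return float(totals["Sales"].corr(totals["Profit"]))


profit_pct, loss_pct = profit_loss_share(toy, "City")
r = sales_profit_correlation(toy, "City")
print(f"profitable: {profit_pct}%  loss-making: {loss_pct}%  correlation: {r:.3f}")
```

Passing "State" or "Region" as the key (assuming those columns exist) would produce the state-level and region-level versions of the same figures.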

Insights from the observations above.

  1. Product discounts should be limited to the 0% to 10% range.
  2. More focus should be placed on increasing sales in the Technology category.
  3. More focus should be placed on increasing sales in the Copiers sub-category.
  4. More focus should be placed on increasing sales of Home Office segment products for higher profits.
  5. Increasing the quantity per purchase increases the profit generated.
  6. 116 cities and 10 states generated losses, so sales operations there could be reduced and refocused on profit-generating areas.
  7. The Central region has lower profit than the other regions.
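
The discount recommendation in insight 1 rests on comparing sales, profit, and margin across discount levels. A minimal sketch of that comparison is given below. It assumes a Discount column stored as a fraction (0.1 for 10%), as in the usual Superstore dataset. The helper name profit_by_discount is hypothetical, the rows are toy data, and none of this is taken from the notebook.

```python
import pandas as pd

# Toy rows for illustration only; these are not the project's data.
orders = pd.DataFrame(
    {
        "Discount": [0.0, 0.0, 0.1, 0.3],
        "Sales": [100.0, 100.0, 50.0, 80.0],
        "Profit": [20.0, 10.0, 15.0, -16.0],
    }
)


def profit_by_discount(df: pd.DataFrame) -> pd.DataFrame:
    """Total sales, total profit, and profit margin (%) per discount level."""
    summary = df.groupby("Discount").agg(
        sales=("Sales", "sum"),
        profit=("Profit", "sum"),
    )
    summary["margin_pct"] = summary["profit"] / summary["sales"] * 100
    return summary


summary = profit_by_discount(orders)
best_margin_discount = summary["margin_pct"].idxmax()
print(summary)
```

On real data, the rows of this table with a negative profit value would mark the discount levels that lose money overall.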

CREDITS

The sales data for this exploratory analysis was taken from the materials provided in the YouTube video. The analysis itself reflects my own point of view and uses a different approach.
