This repository contains notebook files concerned with data cleaning and preparation, exploration, analysis and visualization.
The main projects are the exploratory_data_analysis_and_visualization_project and data_cleaning_and_merges_challenge notebooks; the rest are exercises.
The data describes customer orders: the quantity ordered, the product ordered, the order date, the product price, the promo price, and the product category, covering 2017 through the first quarter of 2018.
The cleaned data from the latter project is used for EDA in the former project.
The exploration of the data results in the following observations:
1. The total quantity ordered is 68705 products.
2. The total quantity ordered is 34613 for in_stock products and 26709 for out_of_stock products, as shown in percentages in the figure below.
3. Total sales revenue is 6235951.70, while the total value of discounts is 935243.73,
which is 15% of the sales revenue.
4. The total revenue per year is 4520441.67 in 2017, and 1715510.03 in 2018, and varies across the months as shown below:
5. The percentages of the product categories in the orders are as shown in the pie chart below:
6. The quantity of ordered products compared to the available stock is as shown in the figure below:
Demand for products in the accessories, devices, and other categories is not well served, so sales are not maximized.
7. Demand is highest at the beginning or end of a year. This may be due to the sale/holiday season.
8. Cheap products are the most demanded goods.
9. Product demand volume varies across the months of a year.
10. The weighted discounts across categories are highest among the cheapest products,
while the revenue is highest for the expensive and affordable products. It might be better to minimize the
discount on the very-cheap, cheap and affordable products while keeping, or slightly increasing,
the discount on the expensive and moderately expensive products.
11. More revenue is generated per month in the year 2018 compared to the year 2017.
12. Most revenue is generated from sales of accessories and devices.
13. Most products are discounted. Very high discounts are few compared to low discounts. Most discounts are in the range 0 to 50.
14. In the figure below, which is based on the subcategorization of products, the size of each product name corresponds to its sales volume.
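
The arithmetic behind observations 3 and 4 can be checked with a short sketch. The first part uses only the totals reported above. The second part is a hypothetical illustration of a per-year aggregation with pandas: the toy table, the column names (`date`, `quantity`, `price`, `promo_price`), the helper `summarize_by_year`, and the definitions of revenue and discount are assumptions made for this example, not necessarily what the notebooks use.

```python
import pandas as pd

# Totals reported in observation 3, rounded to two decimals.
total_revenue = 6235951.70
total_discount = 935243.73

# Discount as a share of sales revenue.
discount_share = total_discount / total_revenue
print(f"Discount share of revenue: {discount_share:.1%}")  # 15.0%

# The yearly revenues from observation 4 should add up to the total.
revenue_2017 = 4520441.67
revenue_2018 = 1715510.03
yearly_matches_total = abs(revenue_2017 + revenue_2018 - total_revenue) < 0.01
print("Yearly revenues sum to total:", yearly_matches_total)

# Hypothetical toy orders table; the column names are assumptions.
orders = pd.DataFrame({
    "date": pd.to_datetime(
        ["2017-01-15", "2017-11-24", "2018-01-05", "2018-03-20"]
    ),
    "quantity": [2, 1, 3, 1],
    "price": [10.0, 200.0, 15.0, 50.0],
    "promo_price": [8.0, 180.0, 15.0, 40.0],
})


def summarize_by_year(df):
    """Sum revenue and discount per calendar year.

    Assumed definitions: revenue is quantity * promo_price, and
    discount is quantity * (price - promo_price).
    """
    out = df.assign(
        revenue=df["quantity"] * df["promo_price"],
        discount=df["quantity"] * (df["price"] - df["promo_price"]),
    )
    return out.groupby(out["date"].dt.year)[["revenue", "discount"]].sum()


summary = summarize_by_year(orders)
print(summary)
```

Swapping the toy table for the cleaned orders data, with the column names adjusted to match, should give per-year totals comparable to those in observation 4, provided the notebooks define revenue the same way.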