This project focuses on cleaning and merging multiple retail CSV files, loading them into SQL, extracting business insights, and performing sales trend analysis using Pandas, Seaborn, and Matplotlib.
-
Data Preparation: Collected multiple retail sales data files from Kaggle. Cleaned, merged, and converted the data into
'cleaned_build_week_data.csv'
. -
SQL Insights: Loaded the dataset into MySQL Workbench and generated insights using SQL queries.
-
Exploratory Data Analysis (EDA) & Visualization: Analyzed data trends and visualized them using Python libraries.
'cleaned_build_week_data.csv'
: Contains retail sales data with the following columns:Order_Date
Customer_ID
Country
UnitPrice
Quantity
TotalPrice
Order_ID
Process | Filename | Description |
---|---|---|
1. Data Collection & Preparation | BW_Data_Preparation.ipynb |
Data collection, cleaning, and preparation. |
2. Exploratory Data Analysis (EDA) | BW_EDA_Visualization.ipynb |
Visualizations and exploratory data analysis (EDA). |
3. SQL Queries | BW_SQL_Queries.sql |
SQL queries in MySQL Workbench. |
4. Cleaned Data | cleaned_build_week_data.csv |
Cleaned data in CSV format. |
5. Insights & Presentation | BW_Insights |
Presentation images summarizing insights. |
1. Data Preparation
Run BW_Data_Preparation.ipynb
to clean the data and save it as cleaned_build_week_data.csv
.
2. Load Data into SQL
Create a database and table using BW_SQL_Queries.sql
. Import cleaned_build_week_data.csv
into the database.
3. SQL Analysis
Execute queries in BW_SQL_Queries.sql
to generate insights:
- Top 10 customers by total spending.
- Total sales by country.
- Find the month with the highest sales.
- Average order value per customer.
4. Exploratory Data Analysis (EDA)
Run BW_EDA_Visualization.ipynb
to:
- Use Pandas for group-by aggregations and filtering.
- Visualizations:
- Time series line plot: Monthly sales trend.
- Bar chart: Top countries by revenue.
- Pie chart: Contribution of top 5 customers.
- Heatmap: Sales by month and country.
We observed a significant increase in sales starting from December 2009, peaking at approximately 70,000. However, the growth was followed by a gradual decline in the subsequent months and years. To boost future sales, we suggest:
- Price reductions,
- Offering more discounts, or
- Promoting high-rated products.
- United States was the top contributor to sales every month.
- United Kingdom saw a spike in December, emerging as the highest contributor for that month.
- France showed the lowest sales revenue.
- Customer ID 16393 contributed 27% of the total orders.
- Customer ID 15542 had the smallest contribution at 15%. This insight could help target top customers with personalized offers or loyalty programs.
- United States led in sales every month, while United Kingdom dominated in December.
- Identifying seasonal trends and focusing marketing efforts on high-performing countries during peak months could maximize revenue.
- GitHub Repository: SAL_BW_Project_3