This project performs data cleaning and transformation on the Superstore Sales dataset to make it ready for interactive dashboarding and analytics.
The Superstore dataset, often used for sales analytics, contains inconsistencies such as:
- Extra spaces in column names
- Missing postal codes
- Incorrect date formats
- Duplicate entries
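
Before cleaning, it can help to measure how often each of these issues occurs. The snippet below is a minimal sketch rather than part of the project's script: the `audit` helper, the sample rows, the column names (`Order Date`, `Postal Code`), and the day-first date format `%d/%m/%Y` are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical miniature of the raw data; the real column names may differ.
raw = pd.DataFrame({
    " Order ID ": ["A-1", "A-2", "A-2", "A-3"],
    "Order Date": ["08/11/2017", "12/06/2017", "12/06/2017", "not a date"],
    "Postal Code": [42420.0, 90036.0, 90036.0, None],
})


def audit(df: pd.DataFrame, date_col: str, postal_col: str) -> dict:
    """Count each kind of inconsistency without modifying the data."""
    # Assumed date format (day/month/year); values that do not match become NaT.
    parsed = pd.to_datetime(df[date_col], format="%d/%m/%Y", errors="coerce")
    return {
        # Column names with leading or trailing whitespace
        "padded_column_names": [c for c in df.columns if c != c.strip()],
        "missing_postal_codes": int(df[postal_col].isna().sum()),
        "unparseable_dates": int(parsed.isna().sum()),
        # Rows that repeat an earlier row exactly
        "duplicate_rows": int(df.duplicated().sum()),
    }


report = audit(raw, date_col="Order Date", postal_col="Postal Code")
print(report)
```

Running the same audit on the cleaned output is a quick way to confirm that every count has dropped to zero.
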
This Python script processes the raw data into a clean, consistent, and analysis-ready format suitable for BI tools like Tableau or Power BI.
- Standardized column names (snake_case)
- Converted date columns into datetime format
- Filled missing postal codes
- Removed duplicate records
- Created a new `month_year` column for time-based analysis
- Exported final cleaned dataset for visualization
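
The steps above could be sketched in pandas roughly as follows. This is an illustrative outline under stated assumptions, not the contents of `clean_superstore.py`: the function name `clean_superstore`, the sample rows, the column names (`Order Date`, `Ship Date`, `Postal Code`), the day-first date format, and the placeholder value `0` for missing postal codes are all hypothetical choices.

```python
import pandas as pd


def clean_superstore(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps to a copy of the raw frame."""
    out = df.copy()

    # 1. Standardize column names to snake_case, e.g. " Order ID " -> "order_id".
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"[^0-9a-z]+", "_", regex=True)
        .str.strip("_")
    )

    # 2. Convert date columns to datetime (assumed day/month/year format).
    for col in ("order_date", "ship_date"):
        out[col] = pd.to_datetime(out[col], format="%d/%m/%Y", errors="coerce")

    # 3. Fill missing postal codes (0 is an assumed placeholder value).
    out["postal_code"] = out["postal_code"].fillna(0).astype(int)

    # 4. Remove exact duplicate records.
    out = out.drop_duplicates().reset_index(drop=True)

    # 5. Derive month_year (e.g. "2017-11") for time-based grouping.
    out["month_year"] = out["order_date"].dt.strftime("%Y-%m")
    return out


# Hypothetical sample rows standing in for the raw CSV.
raw = pd.DataFrame({
    " Order ID ": ["A-1", "A-2", "A-2"],
    "Order Date": ["08/11/2017", "12/06/2017", "12/06/2017"],
    "Ship Date": ["11/11/2017", "16/06/2017", "16/06/2017"],
    "Postal Code": [42420.0, None, None],
    "Sales": [261.96, 14.62, 14.62],
})

clean = clean_superstore(raw)
print(clean)
```

To cover the export step, the result could then be written with `clean.to_csv("superstore_sales_cleaned_final.csv", index=False)`. Note that if the raw file carries a unique row identifier column, it would need to be excluded from the duplicate check for step 4 to find anything.
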
| Language | Libraries | Concepts Applied |
|---|---|---|
| Python | pandas | Data Cleaning & Preprocessing |
| | datetime | Time Series Handling |
| | | Missing Value Imputation |
| | | Feature Engineering (`month_year`) |
Superstore-Sales-Cleaning
├── superstore_sales_dataset.csv        # Raw dataset (input)
├── superstore_sales_cleaned_final.csv  # Cleaned, dashboard-ready output
├── clean_superstore.py                 # Python script for cleaning
└── README.md                           # Project documentation