This is a complete, real-world data analyst portfolio project based on an e-commerce inventory dataset scraped from Zepto — one of India’s fastest-growing quick-commerce startups. This project simulates real analyst workflows, from raw data exploration to business-focused data analysis.
This project is perfect for:
- 📊 Data Analyst aspirants who want to build a strong Portfolio Project for interviews and LinkedIn
- 📚 Anyone learning SQL hands-on
- 💼 Preparing for interviews in retail, e-commerce, or product analytics
🎥 Watch this YouTube video to implement the full project from scratch:
🔗 Link to Video: Watch on Youtube
The goal is to simulate how actual data analysts in the e-commerce or retail industries work behind the scenes to use SQL to:
✅ Set up a messy, real-world e-commerce inventory database
✅ Perform Exploratory Data Analysis (EDA) to explore product categories, availability, and pricing inconsistencies
✅ Implement Data Cleaning to handle null values, remove invalid entries, and convert pricing from paise to rupees
✅ Write business-driven SQL queries to derive insights around pricing, inventory, stock availability, revenue and more
The dataset was sourced from Kaggle and was originally scraped from Zepto’s official product listings. It mimics what you’d typically encounter in a real-world e-commerce inventory system.
Each row represents a unique SKU (Stock Keeping Unit) for a product. Duplicate product names exist because the same product may appear multiple times in different package sizes, weights, discounts, or categories to improve visibility – exactly how real catalog data looks.
🧾 Columns:
-
sku_id: Unique identifier for each product entry (Synthetic Primary Key)
-
name: Product name as it appears on the app
-
category: Product category like Fruits, Snacks, Beverages, etc.
-
mrp: Maximum Retail Price (originally in paise, converted to ₹)
-
discountPercent: Discount applied on MRP
-
discountedSellingPrice: Final price after discount (also converted to ₹)
-
availableQuantity: Units available in inventory
-
weightInGms: Product weight in grams
-
outOfStock: Boolean flag indicating stock availability
-
quantity: Number of units per package (mixed with grams for loose produce)
Here’s a step-by-step breakdown of what we do in this project:
We start by creating a SQL table with appropriate data types:
CREATE TABLE zepto (
sku_id SERIAL PRIMARY KEY,
category VARCHAR(120),
name VARCHAR(150) NOT NULL,
mrp NUMERIC(8,2),
discountPercent NUMERIC(5,2),
availableQuantity INTEGER,
discountedSellingPrice NUMERIC(8,2),
weightInGms INTEGER,
outOfStock BOOLEAN,
quantity INTEGER
);
-
Loaded CSV using pgAdmin's import feature.
-
If you're not able to use the import feature, write this code instead:
\copy zepto(category,name,mrp,discountPercent,availableQuantity,
discountedSellingPrice,weightInGms,outOfStock,quantity)
FROM 'data/zepto_v2.csv' WITH (FORMAT csv, HEADER true, DELIMITER ',', QUOTE '"', ENCODING 'UTF8');
- Faced encoding issues (UTF-8 error), which were fixed by saving the CSV file using CSV UTF-8 format.
-
Counted the total number of records in the dataset
-
Viewed a sample of the dataset to understand structure and content
-
Checked for null values across all columns
-
Identified distinct product categories available in the dataset
-
Compared in-stock vs out-of-stock product counts
-
Detected products present multiple times, representing different SKUs
-
Identified and removed rows where MRP or discounted selling price was zero
-
Converted mrp and discountedSellingPrice from paise to rupees for consistency and readability
-
Found top 10 best-value products based on discount percentage
-
Identified high-MRP products that are currently out of stock
-
Estimated potential revenue for each product category
-
Filtered expensive products (MRP > ₹500) with minimal discount
-
Ranked top 5 categories offering highest average discounts
-
Calculated price per gram to identify value-for-money products
-
Grouped products based on weight into Low, Medium, and Bulk categories
-
Measured total inventory weight per product category
-
Clone the repository
git clone https://github.com/amlanmohanty/zepto-SQL-data-analysis-project.git cd zepto-SQL-data-analysis-project
-
Open zepto_SQL_data_analysis.sql
This file contains:
-
Table creation
-
Data exploration
-
Data cleaning
-
SQL Business analysis
-
-
Load the dataset into pgAdmin or any other PostgreSQL client
-
Create a database and run the SQL file
-
Import the dataset (convert to UTF-8 if necessary)
-
-
Follow along with the YouTube video for full walkthrough. 👨💼
MIT — feel free to fork, star, and use in your portfolio.
Hey, I’m Amlan Mohanty — a Data Analyst & Content Creator. I break down complex data topics into simple, practical content that actually helps you land a job.
If you enjoyed this project and want to keep learning and growing as a data analyst, let’s stay in touch! I regularly share content around SQL, data analytics, portfolio projects, job tips, and more.
🎥 YouTube: Data Drool
- Beginner-friendly tutorials, real-world projects, job and career advice
📺 Instagram: data.drool
- Quick SQL tips, data memes, and behind-the-scenes content
💼 LinkedIn: Amlan Mohanty
- Let’s connect professionally and grow your data career