abrahamkoloboe27/E-Commerce-Data-Pipeline-And-Dashboard-With-Apache-Superset

E-commerce Data Pipeline & Analytics Dashboard 🚀📊

🌟 Project Overview

A comprehensive data engineering platform for e-commerce including:

  • 🧩 Synthetic data generation with Python Faker
  • 🏗️ ETL pipeline orchestrated by Apache Airflow
  • 📦 Structured Data Lake (Bronze/Silver/Gold)
  • 📊 Interactive dashboards with Apache Superset
  • 🔍 Real-time monitoring via Grafana

🏗️ Global Architecture

Global Architecture

📊 Key Components

| Component | Technologies | Emoji |
|---|---|---|
| Orchestration | Apache Airflow, Docker | ⚙️ |
| Storage | MinIO, PostgreSQL, Parquet | 💾 |
| Transformation | Python, Polars | 🔄 |
| Visualization | Apache Superset | 📈 |
| Monitoring | Prometheus, Grafana, cAdvisor | 📊 |

🗃️ Database Schemas

🛒 Production Database (OLTP)

OLTP Schema

Relational structure optimized for transactions

| Characteristic | Details |
|---|---|
| Type | Relational (PostgreSQL) |
| Tables | users, addresses, categories, products, orders, order_items, payments, shipments, reviews, product_views |
| Indexes | idx_orders_user_id, idx_orders_billing_address_id, idx_orders_shipping_address_id, idx_addresses_user_id, idx_order_items_order_id, idx_order_items_product_id, idx_payments_order_id, idx_shipments_order_id, idx_reviews_user_id, idx_reviews_product_id, idx_product_views_user_id, idx_product_views_product_id |
| Optimization | Normalization, referential integrity constraints |
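
As a rough illustration of what "normalization plus referential integrity" means for two of the tables listed above, here is a minimal sketch. It uses Python's built-in `sqlite3` in place of PostgreSQL so that it runs without a server. The table names and the index name `idx_orders_user_id` come from the summary; every column name and type is an assumption, not the repository's actual DDL.

```python
import sqlite3

# Column names and types are assumptions for illustration; only the
# table names and the index name are taken from the schema summary.
DDL = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    email   TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    total    REAL NOT NULL
);
CREATE INDEX idx_orders_user_id ON orders(user_id);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(DDL)

conn.execute("INSERT INTO users (user_id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO orders (order_id, user_id, total) VALUES (10, 1, 59.90)")

# Referential integrity: an order for a user that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders (order_id, user_id, total) VALUES (11, 999, 5.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# List the explicitly created indexes (names starting with "idx").
index_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'idx%'")]
```

The index on `orders.user_id` is the kind that speeds up "all orders for this user" lookups, which is presumably why the foreign-key columns in the summary each have one.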

📈 Analytics Database (OLAP)

OLAP Schema
Star schema for business analysis

| Characteristic | Details |
|---|---|
| Type | Data Warehouse (PostgreSQL) |
| Schema | Star Schema |
| Tables | Fact_Sales, Fact_User_Activity, Fact_Product_Performance, Fact_Payment_Analytics, Dim_Products, Dim_Time, Dim_Geography, Dim_User, Dim_Payment_Method |
| Indexes | idx_fact_sales_time, idx_fact_sales_product, idx_fact_user_geo, idx_fact_payment_method, idx_geography_country, idx_geography_city, idx_product_category, idx_user_registration |
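
To show how a star schema is typically queried, the sketch below joins a fact table to two dimensions and aggregates revenue. It again uses `sqlite3` as a stand-in for the PostgreSQL warehouse. The table names and the two `idx_fact_sales_*` index names come from the summary above; the columns (`product_key`, `time_key`, `revenue`, and so on) and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; only table and index names come from the summary.
conn.executescript("""
CREATE TABLE Dim_Products (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE Dim_Time     (time_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE Fact_Sales (
    product_key INTEGER REFERENCES Dim_Products(product_key),
    time_key    INTEGER REFERENCES Dim_Time(time_key),
    quantity    INTEGER,
    revenue     REAL
);
CREATE INDEX idx_fact_sales_time    ON Fact_Sales(time_key);
CREATE INDEX idx_fact_sales_product ON Fact_Sales(product_key);
""")

# Made-up sample rows, purely for illustration.
conn.executemany("INSERT INTO Dim_Products VALUES (?, ?)",
                 [(1, "Books"), (2, "Toys")])
conn.executemany("INSERT INTO Dim_Time VALUES (?, ?, ?)",
                 [(202401, 2024, 1), (202402, 2024, 2)])
conn.executemany("INSERT INTO Fact_Sales VALUES (?, ?, ?, ?)", [
    (1, 202401, 2, 30.0),
    (1, 202402, 1, 15.0),
    (2, 202401, 3, 60.0),
])

# Typical star-schema query: join the fact table to its dimensions,
# then aggregate a measure by dimension attributes.
rows = conn.execute("""
    SELECT p.category, t.year, t.month, SUM(f.revenue) AS revenue
    FROM Fact_Sales f
    JOIN Dim_Products p ON p.product_key = f.product_key
    JOIN Dim_Time t     ON t.time_key = f.time_key
    GROUP BY p.category, t.year, t.month
    ORDER BY p.category, t.year, t.month
""").fetchall()
```

Dashboards in a BI tool usually issue queries of this shape, which is why the fact table's foreign-key columns are the ones indexed.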

🛠️ Pipeline Components

📦 Data Generation

Data Generator Architecture

Synthetic data generation workflow with Python Faker
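
One detail a generator for a relational schema has to get right is ordering: parent rows (users) must exist before child rows (orders) can reference them. The sketch below illustrates only that idea. It deliberately swaps Faker for the standard library's `random` so it has no dependencies; in the real project Faker would supply realistic names, emails and addresses. The function names and field names are hypothetical.

```python
import random

def generate_users(n):
    """Return n synthetic user rows with sequential ids."""
    return [{"user_id": i, "email": f"user{i}@example.com"} for i in range(1, n + 1)]

def generate_orders(rng, users, n):
    """Return n synthetic orders, each pointing at an existing user."""
    user_ids = [u["user_id"] for u in users]
    return [
        {
            "order_id": i,
            "user_id": rng.choice(user_ids),        # keeps the foreign key valid
            "total": round(rng.uniform(5, 500), 2),  # arbitrary price range
        }
        for i in range(1, n + 1)
    ]

rng = random.Random(42)  # fixed seed so repeated runs produce the same data
users = generate_users(5)
orders = generate_orders(rng, users, 20)
```

Seeding the generator keeps test runs reproducible, which helps when comparing pipeline output between runs.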

🔄 Orchestration Workflow

🔄 Data Flow Overview

Complete data flow from source to dashboards

Airflow DAG

ETL task management with Apache Airflow

| Step | Tools | Output |
|---|---|---|
| Extraction | Faker, PostgreSQL | 🗃️ Bronze Layer (MinIO) |
| Transformation | Polars, Python | 🧹 Silver Layer (Parquet) |
| Loading | SQL, dbt | 🏆 Gold Layer (PostgreSQL) |
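
The three layers above can be read as successive refinements of the same data: raw, cleaned, then aggregated. The following sketch shows that progression in plain Python, with in-memory lists standing in for MinIO objects, Parquet files and PostgreSQL tables. The record fields, the cleaning rules and the function names are assumptions for illustration; the actual pipeline uses Polars for this step.

```python
from collections import defaultdict

# Bronze: raw records exactly as extracted, including messy values.
bronze = [
    {"order_id": "1", "category": "books",  "amount": "19.90"},
    {"order_id": "2", "category": "Books ", "amount": "5.10"},
    {"order_id": "3", "category": "toys",   "amount": None},     # missing amount
    {"order_id": "4", "category": "toys",   "amount": "12.00"},
]

def to_silver(rows):
    """Clean and type the raw rows; drop rows that cannot be repaired."""
    silver = []
    for row in rows:
        if row["amount"] is None:
            continue
        silver.append({
            "order_id": int(row["order_id"]),
            "category": row["category"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return silver

def to_gold(rows):
    """Aggregate cleaned rows into a reporting table: revenue per category."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["category"]] += row["amount"]
    return {category: round(total, 2) for category, total in totals.items()}

silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping the bronze data untouched means the silver and gold layers can always be rebuilt if a cleaning rule changes.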

📊 Monitoring & Visualization

🖥️ Operational Dashboarding

Superset Dashboard

Real-time business KPIs with Apache Superset

| Metric | Tool | Emoji |
|---|---|---|
| Sales | Superset | 📈 |
| Performance | Grafana | 📉 |
| Logs | Prometheus | 📋 |

🔍 Monitoring Stack

Monitoring Stack

Container and metrics monitoring

| Component | Function | Dashboard |
|---|---|---|
| cAdvisor | Docker Monitoring | Grafana cAdvisor |
| Postgres-Exporter | PostgreSQL Metrics | Grafana Postgres |
| StatsD-Exporter | Airflow Metrics | Grafana Airflow |
| MinIO Server | MinIO Metrics | Grafana MinIO |
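
For context on the StatsD row: StatsD metrics are short text lines of the form `<name>:<value>|<type>` sent over UDP, which an exporter then translates for Prometheus. The sketch below formats and sends one such line by hand. The metric name `etl.rows_loaded` is hypothetical, and the host and port (`localhost:9125`) are assumptions that should be checked against the exporter's configuration in this repository.

```python
import socket

def format_metric(name, value, metric_type):
    """Encode one metric in the StatsD line format: <name>:<value>|<type>.

    Common types are "c" (counter), "g" (gauge) and "ms" (timer).
    """
    return f"{name}:{value}|{metric_type}".encode("ascii")

def send_metric(payload, host="localhost", port=9125):
    """Send one metric over UDP. Host and port are assumed, not taken from the repo."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

# Hypothetical metric name, for illustration only.
payload = format_metric("etl.rows_loaded", 42, "c")

if __name__ == "__main__":
    send_metric(payload)
```

Because UDP is connectionless, emitting metrics this way does not block or fail the pipeline when the exporter is down, which is the usual reason StatsD is used for instrumentation.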

🚀 Quick Start

📋 Prerequisites

| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8 cores |
| RAM | 8 GB | 16 GB |
| Storage | 50 GB SSD | 100 GB NVMe |

git clone https://github.com/abrahamkoloboe27/E-Commerce-Data-Pipeline-And-Dashboard-With-Apache-Superset
cd E-Commerce-Data-Pipeline-And-Dashboard-With-Apache-Superset
make build  # Build Docker images
make up  # Start containers

# Other Makefile targets, run as needed rather than in sequence:
make build-up  # Build and start containers
make down  # Stop and remove containers
make down-volumes  # Remove containers and volumes
make down-volumes-build-up  # Remove containers and volumes, then rebuild and start

🔗 Service Access

| Service | URL | Credentials | Port |
|---|---|---|---|
| Airflow | http://localhost:8080 | admin/admin | 8080 |
| MinIO | http://localhost:9001 | minioadmin/minioadmin | 9001 |
| Superset | http://localhost:8088 | admin/admin | 8088 |
| Grafana | http://localhost:3000 | grafana/grafana | 3000 |
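
After `make up`, it can take a while before every UI responds. A small script like the sketch below can confirm which services are reachable. It uses only the standard library and the URLs from the table above; the helper names `is_up` and `check_all` are hypothetical and not part of the repository.

```python
import http.client
import urllib.error
import urllib.request

# URLs copied from the service table above.
SERVICES = {
    "Airflow": "http://localhost:8080",
    "MinIO": "http://localhost:9001",
    "Superset": "http://localhost:8088",
    "Grafana": "http://localhost:3000",
}

def is_up(url, timeout=2.0):
    """Return True if the service answers HTTP at all, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # an HTTP error status still means the server is running
    except (OSError, http.client.HTTPException):
        return False  # connection refused, timeout, DNS failure, bad response

def check_all(services=SERVICES):
    """Map each service name to its reachability."""
    return {name: is_up(url) for name, url in services.items()}

if __name__ == "__main__":
    for name, ok in check_all().items():
        print(f"{name:<10} {'up' if ok else 'DOWN'}")
```

An HTTP error response counts as "up" here because a login redirect or a 401 still proves the container is serving requests.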

📌 Feature Highlights

| Feature | Technology | Benefit |
|---|---|---|
| Hierarchical Data Lake | MinIO + Parquet | 🏷️ Raw/transformed data structuring |
| Modular ETL | Airflow + Python | 🔄 Workflow reproducibility |
| Unified Monitoring | Grafana + Prometheus | 📊 360° performance view |

📜 License & Contact

📄 License: MIT
📧 Contact: abklb27@gmail.com
👨‍💻 Author: Abraham Koloboe

✨ Made with passion for data engineering!