A comprehensive data engineering platform for e-commerce including:
- Synthetic data generation with Python Faker
- ETL pipeline orchestrated by Apache Airflow
- Structured data lake (Bronze/Silver/Gold)
- Interactive dashboards with Apache Superset
- Real-time monitoring via Grafana
Component | Technologies |
---|---|
Orchestration | Apache Airflow, Docker |
Storage | MinIO, PostgreSQL, Parquet |
Transformation | Python, Polars |
Visualization | Apache Superset |
Monitoring | Prometheus, Grafana, cAdvisor |
Relational structure optimized for transactions
Characteristic | Details |
---|---|
Type | Relational (PostgreSQL) |
Tables | `users`, `addresses`, `categories`, `products`, `orders`, `order_items`, `payments`, `shipments`, `reviews`, `product_views` |
Indexes | `idx_orders_user_id`, `idx_orders_billing_address_id`, `idx_orders_shipping_address_id`, `idx_addresses_user_id`, `idx_order_items_order_id`, `idx_order_items_product_id`, `idx_payments_order_id`, `idx_shipments_order_id`, `idx_reviews_user_id`, `idx_reviews_product_id`, `idx_product_views_user_id`, `idx_product_views_product_id` |
Optimization | Normalization, Referential integrity constraints |
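To make the role of the referential integrity constraints and indexes concrete, here is a minimal sketch of two of the tables listed above. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL, and the column names (`user_id`, `email`, `total`) are assumptions for illustration, not the project's actual DDL.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    email   TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    total    REAL NOT NULL
);
-- Index name taken from the table above; it speeds up lookups by user.
CREATE INDEX idx_orders_user_id ON orders(user_id);
""")

conn.execute("INSERT INTO users (user_id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO orders (order_id, user_id, total) VALUES (10, 1, 42.5)")


def insert_orphan_order(connection):
    """Try to insert an order for a missing user; return True if rejected."""
    try:
        connection.execute(
            "INSERT INTO orders (order_id, user_id, total) VALUES (11, 999, 5.0)"
        )
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = insert_orphan_order(conn)
index_names = [row[1] for row in conn.execute("PRAGMA index_list('orders')")]
```

The foreign key rejects an order that points at a non-existent user, which is the kind of guarantee the normalized transactional schema relies on.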
Star schema for business analysis
Characteristic | Details |
---|---|
Type | Data Warehouse (PostgreSQL) |
Schema | Star Schema |
Tables | Fact_Sales, Fact_User_Activity, Fact_Product_Performance, Fact_Payment_Analytics, Dim_Products, Dim_Time, Dim_Geography, Dim_User, Dim_Payment_Method |
Indexes | `idx_fact_sales_time`, `idx_fact_sales_product`, `idx_fact_user_geo`, `idx_fact_payment_method`, `idx_geography_country`, `idx_geography_city`, `idx_product_category`, `idx_user_registration` |
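A typical star-schema query joins a fact table to one or more dimensions and aggregates a measure. The sketch below illustrates this with `Fact_Sales`, `Dim_Products` and `Dim_Time`, again using in-memory SQLite in place of PostgreSQL. The column names (`product_key`, `time_key`, `revenue`, and so on) and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL warehouse; columns are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Products (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE Dim_Time (time_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE Fact_Sales (
    time_key    INTEGER REFERENCES Dim_Time(time_key),
    product_key INTEGER REFERENCES Dim_Products(product_key),
    quantity    INTEGER,
    revenue     REAL
);
CREATE INDEX idx_fact_sales_time ON Fact_Sales(time_key);
CREATE INDEX idx_fact_sales_product ON Fact_Sales(product_key);
""")

conn.executemany(
    "INSERT INTO Dim_Products VALUES (?, ?)", [(1, "Books"), (2, "Toys")]
)
conn.executemany(
    "INSERT INTO Dim_Time VALUES (?, ?, ?)", [(202401, 2024, 1), (202402, 2024, 2)]
)
conn.executemany(
    "INSERT INTO Fact_Sales VALUES (?, ?, ?, ?)",
    [(202401, 1, 2, 30.0), (202401, 2, 1, 15.0), (202402, 1, 1, 15.0)],
)

# Fact-to-dimension join: total revenue per product category.
revenue_by_category = conn.execute("""
SELECT p.category, SUM(f.revenue)
FROM Fact_Sales AS f
JOIN Dim_Products AS p ON p.product_key = f.product_key
GROUP BY p.category
ORDER BY p.category
""").fetchall()
```

Because every fact row carries only keys and measures, adding another breakdown (for example by month) is one more join against `Dim_Time` rather than a change to the fact table.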
Synthetic data generation workflow with Python Faker
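As a rough illustration of that workflow, the sketch below generates related `users` and `orders` rows. To stay dependency-free it uses the standard library's `random` module in place of Faker. In the real generator, fields such as names and emails would presumably come from a Faker instance. The field names and value ranges here are assumptions.

```python
import random
from datetime import date, timedelta


def generate_users(count, seed=0):
    """Build `count` fake users; a fixed seed keeps the output reproducible."""
    rng = random.Random(seed)
    start = date(2023, 1, 1)
    return [
        {
            "user_id": user_id,
            "email": f"user{user_id}@example.com",
            "registered_on": start + timedelta(days=rng.randrange(365)),
        }
        for user_id in range(1, count + 1)
    ]


def generate_orders(users, count, seed=0):
    """Build `count` fake orders, each referencing an existing user."""
    rng = random.Random(seed)
    return [
        {
            "order_id": order_id,
            "user_id": rng.choice(users)["user_id"],
            "total": round(rng.uniform(5, 500), 2),
        }
        for order_id in range(1, count + 1)
    ]


users = generate_users(5)
orders = generate_orders(users, 20)
```

Generating parent rows first and drawing foreign keys from them keeps the synthetic data consistent with the referential integrity constraints of the transactional schema.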
Complete data flow from source to dashboards
ETL task management with Apache Airflow
Step | Tools | Output |
---|---|---|
Extraction | Faker, PostgreSQL | Bronze Layer (MinIO) |
Transformation | Polars, Python | Silver Layer (Parquet) |
Loading | SQL, dbt | Gold Layer (PostgreSQL) |
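The three layers can be pictured as successive refinements of the same records. The following is a minimal, standard-library sketch of that idea, not the project's Polars or Airflow code. The sample rows, the cleaning rules and the function names `to_silver` and `to_gold` are all hypothetical.

```python
from collections import defaultdict

# Bronze: raw records as extracted, including duplicates and bad values.
bronze = [
    {"order_id": "1", "category": "Books", "total": "30.00"},
    {"order_id": "1", "category": "Books", "total": "30.00"},  # duplicate
    {"order_id": "2", "category": "toys ", "total": "15.50"},
    {"order_id": "3", "category": "Books", "total": None},  # missing amount
]


def to_silver(rows):
    """Drop duplicate and incomplete rows, then normalize types and labels."""
    seen, cleaned = set(), []
    for row in rows:
        if row["total"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        cleaned.append(
            {
                "order_id": int(row["order_id"]),
                "category": row["category"].strip().title(),
                "total": float(row["total"]),
            }
        )
    return cleaned


def to_gold(rows):
    """Aggregate cleaned rows into revenue per category for reporting."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["category"]] += row["total"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

In the pipeline itself, each function would correspond to an Airflow task, with the intermediate results written to MinIO, Parquet and PostgreSQL rather than held in memory.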
Real-time business KPIs with Apache Superset
Metric | Tool |
---|---|
Sales | Superset |
Performance | Grafana |
Logs | Prometheus |
Container and metrics monitoring
Component | Function |
---|---|
cAdvisor | Docker Monitoring |
Postgres-Exporter | PostgreSQL Metrics |
StatsD-Exporter | Airflow Metrics |
MinIO Server | MinIO Metrics |
Component | Minimum | Recommended |
---|---|---|
CPU | 4 cores | 8 cores |
RAM | 8GB | 16GB |
Storage | 50GB SSD | 100GB NVMe |
git clone https://github.com/abrahamkoloboe27/E-Commerce-Data-Pipeline-And-Dashboard-With-Apache-Superset
cd E-Commerce-Data-Pipeline-And-Dashboard-With-Apache-Superset
make build # Build Docker images
make up # Start containers
make build-up # Build and start containers
make down # Stop and remove containers
make down-volumes # Remove containers and volumes
make down-volumes-build-up # Remove containers, volumes, and build new images
Service | URL | Credentials | Port |
---|---|---|---|
Airflow | http://localhost:8080 | admin/admin | 8080 |
MinIO | http://localhost:9001 | minioadmin/minioadmin | 9001 |
Superset | http://localhost:8088 | admin/admin | 8088 |
Grafana | http://localhost:3000 | grafana/grafana | 3000 |
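After starting the containers, it can be handy to confirm that each web interface answers. The sketch below requests the root URL of every service in the table using only the standard library. It is an illustrative helper, not part of the repository, and it checks reachability only, without logging in or calling any service-specific health endpoint.

```python
import urllib.error
import urllib.request

# URLs copied from the table above.
SERVICES = {
    "Airflow": "http://localhost:8080",
    "MinIO": "http://localhost:9001",
    "Superset": "http://localhost:8088",
    "Grafana": "http://localhost:3000",
}


def is_reachable(url, timeout=2.0):
    """Return True if an HTTP server answers at `url`, whatever the status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status (e.g. 401 or 403).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False


def check_services(services=SERVICES):
    """Map each service name to whether its URL is currently reachable."""
    return {name: is_reachable(url) for name, url in services.items()}
```

Calling `check_services()` once the stack is up returns a dictionary such as `{"Airflow": True, ...}`. A `False` entry usually means the container is still starting or the port mapping differs from the table.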
Feature | Technology | Benefit |
---|---|---|
Hierarchical Data Lake | MinIO + Parquet | Raw/transformed data structuring |
Modular ETL | Airflow + Python | Workflow reproducibility |
Unified Monitoring | Grafana + Prometheus | 360° performance view |
License: MIT
Contact: abklb27@gmail.com
Author: Abraham Koloboe
Made with passion for data engineering!