Welcome to the Data Engineering & Analytics Project repository!
This project showcases an end-to-end modern data pipeline, from raw data ingestion to structured data modeling and analytics, using Snowflake as the cloud data warehouse and dbt (Data Build Tool) for transformation and modeling. It is designed to demonstrate industry-grade practices in data engineering, warehouse architecture, and analytics modeling.
The data warehouse is designed using the Star Schema modeling technique and follows a layered architecture inspired by dbt best practices:
📦 Staging Layer → 🧹 Transformation Layer → 📊 Mart Layer (Star Schema)
- Staging Layer: Raw ingested tables that mirror the original datasets and serve as the foundation for all transformations.
- Transformation Layer: Cleansed, standardized, and enriched data prepared for analysis.
- Mart Layer: Final analytical models—fact and dimension tables—optimized for reporting and business intelligence.
This architecture ensures modularity, scalability, and clear lineage from raw to analytics-ready data.
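As a hedged illustration of how the layers chain together in dbt (all model and column names below are placeholders, not this project's actual models), each layer selects from the previous one via `ref()`:

```sql
-- models/staging/stg_orders.sql (staging: mirror the raw table, rename, cast)
select
    order_id,
    customer_id,
    cast(order_date as date)      as order_date,
    cast(amount as number(12, 2)) as order_amount
from {{ source('raw', 'orders') }}

-- models/transform/trf_orders.sql (transform: apply business logic)
select *
from {{ ref('stg_orders') }}
where order_amount > 0

-- models/mart/fct_sales.sql (mart: fact table for reporting)
select
    order_id,
    customer_id,
    order_date,
    order_amount as sales_amount
from {{ ref('trf_orders') }}
```

Because dbt resolves each `ref()` into a dependency, `dbt run` builds the layers in order and the lineage graph falls out automatically.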
- Data Ingestion: Raw datasets are loaded into Snowflake via stage-based ingestion or manual upload (see the sketch after this list).
- Data Modeling: Using dbt, models are organized into:
- staging: Raw tables cleaned and aliased.
- transform: Business logic applied (joins, filters, type conversions).
- mart: Star schema with fact and dimension tables.
- Packages Used: dbt packages are declared in `packages.yml` and installed with `dbt deps`.
- Analytics: SQL-based reports built on top of fact tables for insights such as customer behavior, sales performance, and product analytics.
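For the ingestion step above, a minimal Snowflake sketch of stage-based loading (the stage, file, and table names are placeholders):

```sql
-- Create an internal stage for CSV uploads (names are placeholders).
create or replace stage raw_stage
    file_format = (type = 'CSV' skip_header = 1);

-- From a client such as SnowSQL, upload the local file to the stage:
--   PUT file://data/orders.csv @raw_stage;

-- Load the staged file into the raw table (PUT compresses to .gz by default).
copy into raw.orders
    from @raw_stage/orders.csv.gz
    file_format = (type = 'CSV' skip_header = 1);
```

`PUT` runs from a client tool; in the Snowflake UI, the equivalent is a manual file upload into the table.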
Tech stack:

| Category | Tools / Tech |
|---|---|
| Data Warehouse | Snowflake |
| Modeling & ETL | dbt |
| Orchestration | Manual / dbt Cloud (optional) |
| Version Control | Git |
| Diagramming | Draw.io |
Project structure:

```
data-pipeline-snowflake/
│
├── models/
│   ├── staging/        # Raw data cleaned and renamed
│   ├── transform/      # Business logic transformations
│   └── mart/           # Final star schema (fact + dimension)
│
├── data/               # Raw data (CSV files)
│
├── docs/               # Project diagrams and metadata
│   ├── schema_design.drawio
│   ├── dbt_data_flow.png
│   └── data_catalog.md
│
├── snapshots/          # Optional snapshots for slowly changing dimensions
│
├── tests/              # dbt tests (e.g., uniqueness, not_null)
│
├── dbt_project.yml     # Main dbt project config
├── packages.yml        # dbt packages config
├── README.md           # Project overview
└── .gitignore
```
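The `snapshots/` directory is where slowly changing dimensions can be captured. A minimal sketch of a dbt snapshot, assuming a raw `customers` source (the source, key, and tracked columns are illustrative):

```sql
-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='check',
        check_cols=['customer_name', 'region']
    )
}}

-- dbt manages dbt_valid_from / dbt_valid_to columns, giving SCD type 2 history.
select * from {{ source('raw', 'customers') }}

{% endsnapshot %}
```

Snapshots are built with `dbt snapshot`, separately from `dbt run`.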
Some example business questions answered with the transformed data:
- Which products contribute the most to total sales?
- What is the average purchase frequency for each customer segment?
- Which regions show declining trends in sales over time?
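As a hedged example, the first question could be answered with a query like the following against the mart layer (the `fct_sales` / `dim_product` names and columns are assumptions based on the star schema, not the project's exact models):

```sql
-- Top 10 products by share of total sales.
select
    p.product_name,
    sum(f.sales_amount)                                    as total_sales,
    sum(f.sales_amount) / sum(sum(f.sales_amount)) over () as pct_of_total
from fct_sales f
join dim_product p
    on f.product_id = p.product_id
group by p.product_name
order by total_sales desc
limit 10;
```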
Getting started:

1. Clone the repository:

   ```bash
   git clone https://github.com/davshegs/dbt_analytics.git
   ```

2. Install dbt dependencies:

   ```bash
   dbt deps
   ```

3. Configure your `profiles.yml` to connect dbt to your Snowflake instance.

4. Run the pipeline:

   ```bash
   dbt run
   ```

5. (Optional) Generate and serve the documentation:

   ```bash
   dbt docs generate && dbt docs serve
   ```
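You can also run the dbt tests under `tests/` with `dbt test`. Beyond built-in checks like `unique` and `not_null`, a singular test is just a SQL file that returns failing rows; a sketch with assumed names:

```sql
-- tests/assert_no_negative_sales.sql
-- A singular dbt test: passes when this query returns zero rows.
select *
from {{ ref('fct_sales') }}
where sales_amount < 0
```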
This project is ideal for those aiming to build hands-on expertise in:
- Cloud Data Warehousing (Snowflake)
- Modern Data Stack (dbt, modular SQL)
- Data Modeling (Star Schema, Fact-Dimension Tables)
- Scalable ETL Pipelines
- Analytical Thinking & SQL Optimization
Hi, we're Shegun David and Immaculate Okoro, a passionate duo of data professionals with a shared interest in building reliable, scalable, and insightful data solutions.
We collaborated on this project to showcase the practical application of data engineering principles using modern tools like Snowflake and dbt. Our goal is to demonstrate how clean architecture, thoughtful modeling, and modular transformation can deliver real business value.
Let’s connect:
- Shegun David – Lead Data Engineer, dbt Implementation, Architecture Design
- Immaculate Okoro – Data Pipeline Collaborator, Testing, and Validation
This was a joint learning and development effort and reflects a shared commitment to clean data and modern engineering practices.
This project is licensed under the MIT License. Free to use, share, and adapt with proper credit.