This capstone project simulates a real-world scenario in which I served as a Junior Data Engineer tasked with designing and implementing a scalable, end-to-end data analytics platform. The work spans multiple tools and technologies across the data engineering lifecycle.
It is the final course in the IBM Data Engineering Professional Certificate, combining all prior learning into one practical project.
- Design and build data platforms using OLTP & OLAP architectures
- Implement data pipelines with ETL processes using Python and Apache Airflow
- Query structured and unstructured data using MySQL, PostgreSQL, and MongoDB
- Perform big data analytics and ML predictions using Apache Spark
- Visualize insights via dashboards in Google Looker Studio and IBM Cognos Analytics
- Python & SQL
- PostgreSQL | MySQL | MongoDB
- Apache Airflow
- Apache Spark (MLlib)
- IBM Cognos Analytics | Google Looker Studio
- OLTP & Data Warehousing
- ETL & Data Pipelines
- Linux Shell Scripting
- JSON, CSV, .tar.gz, and data transformations
Module | Description |
---|---|
1. Data Platform Architecture & OLTP | Designed OLTP schemas & created MySQL databases |
2. NoSQL with MongoDB | Queried JSON documents and used MongoDB indexes |
3. Data Warehouse | Built dimensional models & populated warehouse tables |
4. Data Analytics & Reporting | Wrote complex SQL queries with `ROLLUP`, `CUBE`, and aggregations |
5. ETL & Pipelines | Built ETL flows with Python scripts and Apache Airflow DAGs |
6. Big Data Analytics with Spark | Trained and deployed ML models using Spark MLlib |
7. Final Submission | Delivered final reports, dashboards, and peer-reviewed projects |
Dashboard previews: Google Looker Studio and IBM Cognos Analytics (images not available).
- OLTP Database Design
- NoSQL Queries & Exports
- Data Warehouse Scripts & CSVs
- Airflow DAGs & Python Scripts
- SparkML Model & Predictions
- Dashboards (Google Looker, Cognos)
- Relational & NoSQL Database Design (MySQL, MongoDB)
- Data Warehouse Modeling and Querying (PostgreSQL, IBM Db2)
- ETL Pipeline Development (Python, Shell, Apache Airflow)
- Big Data Analytics with Apache Spark
- Data Visualization (Google Looker Studio, IBM Cognos Analytics)
- Linux Shell Scripting
- SQL queries using `ROLLUP`, `CUBE`, `GROUPING SETS`, and Materialized Query Tables (MQTs)
- Designed an OLTP schema and created MySQL tables.
- Imported and exported data using SQL and shell scripts.
- Defined primary keys and indexes for optimized access.
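The OLTP steps above (schema, primary key, index, data import) can be sketched in a few lines. This is only an illustration: the table name `sales_data`, its columns, and the index name are hypothetical, and Python's built-in `sqlite3` stands in for MySQL so the example runs without a server.

```python
import sqlite3

# Hypothetical schema: table, column, and index names are assumptions.
# SQLite stands in for MySQL; the DDL would need minor dialect changes there.
DDL = """
CREATE TABLE sales_data (
    row_id      INTEGER PRIMARY KEY,   -- primary key for row lookups
    product_id  INTEGER NOT NULL,
    customer_id INTEGER NOT NULL,
    price       REAL    NOT NULL,
    quantity    INTEGER NOT NULL,
    sold_at     TEXT    NOT NULL
);
-- Secondary index to speed up queries that filter on the sale timestamp.
CREATE INDEX idx_sales_sold_at ON sales_data (sold_at);
"""


def build_oltp_db(rows):
    """Create an in-memory database, apply the schema, and load the rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def index_names(conn, table):
    """Return the names of the indexes defined on a table."""
    # PRAGMA index_list yields (seq, name, unique, origin, partial) per index.
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]


if __name__ == "__main__":
    sample = [(1, 101, 7, 9.99, 2, "2020-09-05 16:20:03")]
    db = build_oltp_db(sample)
    print(index_names(db, "sales_data"))
```

Keeping the DDL in one script makes it easy to replay the same schema from a shell script during import and export runs.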
- Loaded product catalog data into MongoDB.
- Performed filter queries and aggregation pipelines.
- Exported collections using `mongoexport`.
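As a rough sketch of the MongoDB steps above, the snippet below writes a filter document, an aggregation pipeline, and a `mongoexport` command line as plain Python data. The database, collection, and field names (`catalog`, `electronics`, `type`, `screen_size`, `model`) are assumptions for illustration. With a driver such as PyMongo, the filter would be passed to `collection.find(...)` and the pipeline to `collection.aggregate(...)`.

```python
import shlex

# Hypothetical field names; adjust to the actual catalog documents.
# Filter: laptops with a screen of at least 14 inches.
FILTER_LAPTOPS = {"type": "laptop", "screen_size": {"$gte": 14}}

# Aggregation pipeline: average screen size per product type, largest first.
AVG_SCREEN_BY_TYPE = [
    {"$match": {"type": {"$in": ["laptop", "smart phone"]}}},
    {"$group": {"_id": "$type", "avg_screen": {"$avg": "$screen_size"}}},
    {"$sort": {"avg_screen": -1}},
]


def mongoexport_cmd(db, collection, fields, out_path):
    """Build the argument list for exporting selected fields to CSV."""
    return [
        "mongoexport",
        "--db", db,
        "--collection", collection,
        "--type=csv",                   # CSV export requires --fields
        "--fields", ",".join(fields),
        "--out", out_path,
    ]


if __name__ == "__main__":
    cmd = mongoexport_cmd(
        "catalog", "electronics", ["_id", "type", "model"], "electronics.csv"
    )
    # Print a shell-safe command line; subprocess.run(cmd) could execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems if it is later run through `subprocess`.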
- Created a star schema with dimension and fact tables in PostgreSQL.
- Imported e-commerce sales data.
- Performed OLAP queries with `CUBE`, `ROLLUP`, and `GROUPING SETS`.
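To make the `ROLLUP` operator above concrete: `GROUP BY ROLLUP(a, b)` is equivalent to the grouping sets `(a, b)`, `(a)`, and `()`. The following is a small pure-Python emulation of that behaviour, not the SQL used in the project. The column names and sample rows are made up for illustration.

```python
from collections import defaultdict


def rollup(rows, dims, measure):
    """Emulate SUM(measure) ... GROUP BY ROLLUP(dims) on a list of dicts.

    Produces one grouping set per prefix of `dims`, down to the grand total.
    None marks a rolled-up column, like the NULLs in SQL ROLLUP output.
    """
    totals = defaultdict(float)
    for row in rows:
        for depth in range(len(dims), -1, -1):
            # Keep the first `depth` dimensions, blank out the rest.
            key = tuple(
                row[dim] if i < depth else None for i, dim in enumerate(dims)
            )
            totals[key] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample data.
    sales = [
        {"country": "US", "category": "Books", "amount": 10.0},
        {"country": "US", "category": "Toys", "amount": 5.0},
        {"country": "DE", "category": "Books", "amount": 7.0},
    ]
    for key, total in rollup(sales, ["country", "category"], "amount").items():
        print(key, total)
```

`CUBE` differs only in that it aggregates over every subset of the dimensions rather than every prefix. Note that this sketch cannot distinguish a rolled-up column from a genuine missing value in the data, which is what SQL's `GROUPING()` function is for.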
- Wrote analytical SQL queries to uncover trends in sales data.
- Used Materialized Query Tables to improve performance.
- Wrote Python scripts for extract, transform, and load processes.
- Automated the pipeline using Apache Airflow DAGs.
- Processed and cleaned web logs into a structured format.
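The extract, transform, and load steps above might look roughly like this for web logs. The sketch assumes the logs are in Common Log Format and that the target is a CSV file. The function names, the regular expression, and the output columns are illustrative choices, not the project's actual scripts.

```python
import csv
import io
import re
from datetime import datetime

# Assumed input: Common Log Format, e.g.
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /x.gif HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)
FIELDS = ["ip", "timestamp", "method", "path", "status", "bytes"]


def extract(lines):
    """Yield raw field dicts for lines that match the log pattern."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:  # silently skip malformed lines
            yield match.groupdict()


def transform(record):
    """Convert raw string fields into typed, cleaned values."""
    ts = datetime.strptime(record["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "ip": record["ip"],
        "timestamp": ts.isoformat(),
        "method": record["method"],
        "path": record["path"],
        "status": int(record["status"]),
        # A size of "-" means no response body was sent.
        "bytes": 0 if record["size"] == "-" else int(record["size"]),
    }


def load(records, out):
    """Write records as CSV to a file-like object; return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(record)
        count += 1
    return count


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326',
    ]
    buffer = io.StringIO()
    load((transform(r) for r in extract(sample)), buffer)
    print(buffer.getvalue())
```

Keeping the three stages as separate functions means each one could back its own task in an Airflow DAG, with the scheduler handling ordering and retries.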
- Used Spark to load and transform product review data.
- Built a machine learning model using Spark MLlib.
- Saved and reloaded the trained model for prediction tasks.
- Built sales dashboards using:
  - Google Looker Studio: interactive charts, filters, KPIs.
  - IBM Cognos Analytics: custom visualizations and report generation.
- Submitted final project artifacts for peer review.
This project helped solidify my knowledge of:
- Building data infrastructure from the ground up
- Managing both structured and semi-structured data
- Automating and scaling data workflows
- Communicating data insights through visual tools
- Proficiency in end-to-end data engineering workflows
- Prepared for real-world junior-level data engineering roles
This project was the culmination of weeks of learning and hands-on practice. I strengthened my data engineering foundations and became confident in building real-world data solutions end-to-end.
- Hiring managers evaluating full-stack data engineers
- Recruiters seeking professionals skilled in data architecture, pipelines, and analytics
- Anyone interested in practical data engineering workflows
- ETL Pipeline Automation using Airflow
- MongoDB NoSQL Data Exploration
- Apache Spark ML Prediction Models
If you're interested in my other data projects or collaborations:
My Portfolio | LinkedIn | GitHub Projects