End-to-end Data Engineering Capstone Project using MySQL, 🍃MongoDB, 🐘PostgreSQL, 💨Apache Airflow, ⚡️Apache Spark, and BI dashboards 📊🚀

Willie-Conway/IBM-Data-Engineering-Capstone-Project

🏗️ IBM Data Engineering Capstone Project

This capstone project showcases the practical application of key data engineering skills by simulating a real-world scenario in which I served as a Junior Data Engineer. I designed and implemented a scalable data analytics platform by working across various technologies in the data engineering lifecycle.


🚀 Project Overview

This capstone is the final course in the IBM Data Engineering Professional Certificate. It simulates the role of a Junior Data Engineer tasked with designing and implementing an end-to-end data analytics platform, and it combines the tools and techniques from the earlier courses into one practical project.


🧠 What I Learned

✅ Design and build data platforms using OLTP & OLAP architectures
✅ Implement data pipelines with ETL processes using Python and Apache Airflow
✅ Query structured and semi-structured data using MySQL, PostgreSQL, and MongoDB
✅ Perform big data analytics and ML predictions using Apache Spark
✅ Visualize insights via dashboards in Google Looker Studio and IBM Cognos Analytics


🧰 Skills & Tools Used

  • 🐍 Python & SQL
  • 🐘 PostgreSQL | 🐬 MySQL | 🍃 MongoDB
  • 🛠️ Apache Airflow
  • Apache Spark (MLlib)
  • 📊 IBM Cognos Analytics | Google Looker Studio
  • 🗃️ OLTP & Data Warehousing
  • 🧱 ETL & Data Pipelines
  • 🐧 Linux Shell Scripting
  • 📂 JSON, CSV, .tar.gz, and data transformations

📦 Modules Breakdown

Module | Description
1. Data Platform Architecture & OLTP | Designed OLTP schemas & created MySQL databases
🍃 2. NoSQL with MongoDB | Queried JSON documents and used MongoDB indexes
🗄️ 3. Data Warehouse | Built dimensional models & populated warehouse tables
📈 4. Data Analytics & Reporting | Wrote complex SQL queries with ROLLUP, CUBE, and aggregations
5. ETL & Pipelines | Built ETL flows with Python scripts and Apache Airflow DAGs
⚡ 6. Big Data Analytics with Spark | Trained, saved, and reloaded an ML model using Spark MLlib
✅ 7. Final Submission | Delivered final reports, dashboards, and peer-reviewed projects

📊 Dashboard Samples

Tool | Preview
Google Looker Studio | Looker Dashboard (image)
IBM Cognos Analytics | Cognos Dashboard (image)

📂 Project Assets

  • OLTP Database Design
  • NoSQL Queries & Exports
  • Data Warehouse Scripts & CSVs
  • Airflow DAGs & Python Scripts
  • SparkML Model & Predictions
  • Dashboards (Google Looker, Cognos)

📌 Key Skills Demonstrated

  • 🗃️ Relational & NoSQL Database Design (MySQL, MongoDB)
  • 🏗️ Data Warehouse Modeling and Querying (PostgreSQL, IBM Db2)
  • 🔄 ETL Pipeline Development (Python, Shell, Apache Airflow)
  • 🔥 Big Data Analytics with Apache Spark
  • 📊 Data Visualization (Google Looker Studio, IBM Cognos Analytics)
  • 🐧 Linux Shell Scripting
  • 🧪 SQL queries using ROLLUP, CUBE, GROUPING SETS, and Materialized Query Tables (MQTs)

🧪 Capstone Modules & Labs Overview

Module 1: Data Platform Architecture & OLTP

  • Designed an OLTP schema and created MySQL tables.
  • Imported and exported data using SQL and shell scripts.
  • Defined primary keys and indexes for optimized access.
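
A minimal sketch of this kind of schema work, not the project's actual schema. It assumes a hypothetical `sales_data` table and swaps in Python's built-in SQLite for MySQL so that it runs without a database server. The table, column, and index names are all illustrative.

```python
import sqlite3


def build_oltp_db() -> sqlite3.Connection:
    """Create a small in-memory OLTP table with a primary key and an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE sales_data (
            row_id      INTEGER PRIMARY KEY,  -- one row per order line
            product_id  INTEGER NOT NULL,
            customer_id INTEGER NOT NULL,
            price       REAL    NOT NULL,
            quantity    INTEGER NOT NULL,
            sold_at     TEXT    NOT NULL      -- ISO-8601 timestamp
        )
        """
    )
    # Secondary index to support lookups by time range.
    conn.execute("CREATE INDEX idx_sales_sold_at ON sales_data (sold_at)")
    rows = [
        (1, 101, 7, 19.99, 2, "2024-01-05T10:00:00"),
        (2, 102, 8, 5.50, 1, "2024-01-06T11:30:00"),
        (3, 101, 9, 19.99, 1, "2024-02-01T09:15:00"),
    ]
    conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def january_order_count(conn: sqlite3.Connection) -> int:
    """Count order lines sold in January 2024 (a query the index can serve)."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM sales_data WHERE sold_at >= ? AND sold_at < ?",
        ("2024-01-01", "2024-02-01"),
    )
    return cur.fetchone()[0]


def index_names(conn: sqlite3.Connection) -> list:
    """List the explicitly created indexes on sales_data."""
    return [row[1] for row in conn.execute("PRAGMA index_list('sales_data')")]
```

The same `CREATE TABLE` and `CREATE INDEX` statements carry over to MySQL with only type-name adjustments. The `PRAGMA` call is SQLite-specific.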

🍃 Module 2: Querying Data in NoSQL (MongoDB)

  • Loaded product catalog data into MongoDB.
  • Performed filter queries and aggregation pipelines.
  • Exported collections using mongoexport.
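
To illustrate what a filter query and a small aggregation pipeline compute, the sketch below uses plain Python over a few hypothetical catalog documents. It does not talk to MongoDB. The filter and pipeline dictionaries have the shape one would hand to pymongo's `collection.find` and `collection.aggregate`, and the helper functions are simplified stand-ins that handle only equality matches and a single `$avg`. The field names and values are assumptions.

```python
import json
from collections import defaultdict

# Hypothetical catalog documents, in the JSON shape MongoDB would store.
CATALOG_JSON = """
[
  {"type": "laptop", "model": "A1", "screen_size": 14},
  {"type": "laptop", "model": "B2", "screen_size": 16},
  {"type": "smart phone", "model": "C3", "screen_size": 6},
  {"type": "smart phone", "model": "D4", "screen_size": 7}
]
"""

# With pymongo these would go to collection.find(...) and
# collection.aggregate(...); here they are only data.
LAPTOP_FILTER = {"type": "laptop"}
AVG_SCREEN_PIPELINE = [
    {"$match": {"type": "smart phone"}},
    {"$group": {"_id": "$type", "avg_screen": {"$avg": "$screen_size"}}},
]


def match(docs, criteria):
    """Equality-only stand-in for a find() filter or a $match stage."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def group_avg(docs, key, field):
    """Stand-in for a $group stage with one $avg accumulator."""
    buckets = defaultdict(list)
    for d in docs:
        buckets[d[key]].append(d[field])
    return {k: sum(v) / len(v) for k, v in buckets.items()}


def run_example():
    """Return the laptop count and the average phone screen size."""
    docs = json.loads(CATALOG_JSON)
    laptops = match(docs, LAPTOP_FILTER)
    phones = match(docs, AVG_SCREEN_PIPELINE[0]["$match"])
    return len(laptops), group_avg(phones, "type", "screen_size")
```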

🏗️ Module 3: Building a Data Warehouse

  • Created a star schema with dimension and fact tables in PostgreSQL.
  • Imported e-commerce sales data.
  • Performed OLAP queries with CUBE, ROLLUP, and GROUPING SETS.
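
A rough sketch of a star schema and of what a ROLLUP query returns. It uses hypothetical `dim_country`, `dim_category`, and `fact_sales` tables, and SQLite stands in for PostgreSQL so that it runs anywhere. SQLite has no `ROLLUP`, so the query spells out the equivalent grouping levels with `UNION ALL`. In PostgreSQL the same result would come from `GROUP BY ROLLUP (country, category)`.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    """Create two dimension tables and one fact table, then load sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_country  (country_id  INTEGER PRIMARY KEY, country  TEXT NOT NULL);
        CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT NOT NULL);
        CREATE TABLE fact_sales (
            sale_id     INTEGER PRIMARY KEY,
            country_id  INTEGER NOT NULL REFERENCES dim_country (country_id),
            category_id INTEGER NOT NULL REFERENCES dim_category (category_id),
            amount      INTEGER NOT NULL
        );
        INSERT INTO dim_country  VALUES (1, 'US'), (2, 'DE');
        INSERT INTO dim_category VALUES (1, 'Books'), (2, 'Toys');
        INSERT INTO fact_sales VALUES
            (1, 1, 1, 100), (2, 1, 2, 50), (3, 2, 1, 70), (4, 2, 1, 30);
        """
    )
    return conn


# Three grouping levels: (country, category), (country), and the grand total.
ROLLUP_EQUIVALENT = """
    WITH joined AS (
        SELECT c.country, g.category, f.amount
        FROM fact_sales f
        JOIN dim_country  c ON c.country_id  = f.country_id
        JOIN dim_category g ON g.category_id = f.category_id
    )
    SELECT country, category, SUM(amount) FROM joined GROUP BY country, category
    UNION ALL
    SELECT country, NULL, SUM(amount) FROM joined GROUP BY country
    UNION ALL
    SELECT NULL, NULL, SUM(amount) FROM joined
"""


def rollup_rows(conn: sqlite3.Connection) -> set:
    """Return the detail rows, per-country subtotals, and grand total."""
    return set(conn.execute(ROLLUP_EQUIVALENT).fetchall())
```

`CUBE` would add the per-category subtotals as a fourth grouping level, and `GROUPING SETS` lets the query name exactly which levels to compute.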

📈 Module 4: Data Analytics

  • Wrote analytical SQL queries to uncover trends in sales data.
  • Used Materialized Query Tables (MQTs) in IBM Db2 to precompute aggregates and speed up reporting queries.

Module 5: ETL & Data Pipelines

  • Wrote Python scripts for extract, transform, and load processes.
  • Automated the pipeline using Apache Airflow DAGs.
  • Processed and cleaned web logs into a structured format.
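
A minimal sketch of an extract, transform, and load flow for web logs. It assumes the logs use the Common Log Format, and the sample lines are made up. The function names and the CSV column set are illustrative choices, not the project's actual script.

```python
import csv
import io
import re

# Common Log Format: ip, two ignored fields, [timestamp], "request", status, size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)
FIELDS = ["ip", "timestamp", "method", "path", "status", "size"]

SAMPLE_LOG = """\
198.51.100.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
this line is malformed
203.0.113.9 - - [10/Oct/2023:13:56:01 +0000] "POST /cart HTTP/1.1" 404 -
"""


def extract(raw_text):
    """Split raw log text into non-empty lines."""
    return [line for line in raw_text.splitlines() if line.strip()]


def transform(lines):
    """Parse each line into a typed record and drop lines that do not match."""
    records = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # malformed line: skip it
        rec = m.groupdict()
        rec["status"] = int(rec["status"])
        rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
        records.append(rec)
    return records


def load(records):
    """Serialize the records as CSV text (a stand-in for a file or table load)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Keeping the three steps as separate functions makes each one a natural candidate for its own Airflow task, for example through `PythonOperator`, with the scheduler handling ordering and retries.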

⚡ Module 6: Big Data Analytics with Apache Spark

  • Used Spark to load and transform product review data.
  • Built a machine learning model using Spark MLlib.
  • Saved and reloaded the trained model for prediction tasks.
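
The train, save, reload, and predict cycle can be sketched without a Spark cluster. The example below is a plain-Python stand-in and does not use the Spark MLlib API. It fits a one-variable least-squares line to made-up data, persists the parameters as JSON, and predicts from the reloaded copy. In Spark the corresponding steps are a model's `save` method and the model class's `load` method. The source does not say which model type the project used, so linear regression here is only an assumption for illustration.

```python
import json
import os
import tempfile


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def save_model(model, path):
    """Persist the fitted parameters to disk."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(model, fh)


def load_model(path):
    """Read previously saved parameters back from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def predict(model, x):
    """Apply the fitted line to a new input."""
    return model["slope"] * x + model["intercept"]


def train_save_reload_predict(x_new):
    """Run the full cycle on made-up data that follows y = 2x + 1."""
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [3.0, 5.0, 7.0, 9.0]
    model = fit_line(xs, ys)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.json")
        save_model(model, path)
        reloaded = load_model(path)
    return predict(reloaded, x_new)
```

Separating training from prediction this way means the prediction job only needs the saved artifact, not the training data.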

📊 Module 7: Dashboards & Final Submission

  • Built sales dashboards using:
    • Google Looker Studio: Interactive charts, filters, KPIs.
    • IBM Cognos Analytics: Custom visualizations and report generation.
  • Submitted final project artifacts for peer review.

🧠 Summary

This project helped solidify my knowledge of:

  • Building data infrastructure from the ground up
  • Managing both structured and semi-structured data
  • Automating and scaling data workflows
  • Communicating data insights through visual tools

🏁 Outcome

✅ Proficiency in end-to-end data engineering workflows
✅ Prepared for real-world junior-level data engineering roles


🧠 Reflections

This project was a culmination of weeks of learning and hands-on practice. I strengthened my data engineering foundations and became confident in building real-world data solutions end-to-end. 🧩💡


💼 Ideal For

  • Hiring managers evaluating full-stack data engineers
  • Recruiters seeking professionals skilled in data architecture, pipelines, and analytics
  • Anyone interested in practical data engineering workflows

🔗 Related Projects


Let's Connect!

If you're interested in my other data projects or collaborations:
My Portfolio | 💼 LinkedIn | 📂 GitHub Projects
