Marvel ELT Data Pipeline

Table of Contents

  1. Overview
  2. Data Visualization
  3. Data Architecture
  4. Prerequisites
  5. How to Run This Project
  6. Lessons Learned
  7. Contact

Overview

This project is an ELT data pipeline built on a modern data stack. A Python script retrieves data from the Marvel API and ingests it into AWS RDS PostgreSQL, with Prefect orchestrating and automating the workflow; dbt then transforms the data inside PostgreSQL to build a dimensional model for analytics. A GitHub Actions CI/CD workflow rebuilds the models each time changes are pushed to the main branch of the Git repository.
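
As a minimal sketch of that orchestration (the task, flow, table, and environment variable names here are illustrative assumptions, not the repository's actual code; the authentication parameters are shown in the Data Architecture section below):

```python
import os

import pandas as pd
import requests
from prefect import flow, task
from sqlalchemy import create_engine


@task(retries=3, retry_delay_seconds=30)
def extract_characters(params: dict) -> pd.DataFrame:
    """Pull one page of character records from the Marvel API."""
    resp = requests.get(
        "https://gateway.marvel.com/v1/public/characters",
        params=params,
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json()["data"]["results"]
    # Keep a few scalar fields; the full payload is heavily nested.
    return pd.DataFrame(
        [{"id": r["id"], "name": r["name"], "modified": r["modified"]} for r in results]
    )


@task
def load_to_postgres(df: pd.DataFrame) -> None:
    """Append the raw records to a staging table in RDS PostgreSQL."""
    engine = create_engine(os.environ["DATABASE_URL"])  # assumed connection-string variable
    df.to_sql("raw_characters", engine, if_exists="append", index=False)


@flow
def marvel_el_flow(params: dict) -> None:
    load_to_postgres(extract_characters(params))
```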

Data Visualization

Diagrams: conceptual model, logical model, and the dbt DAG.

Data Architecture

I chose the following tools because they are a good combination for learning ELT, workflow orchestration, and CI/CD, and because they collectively cover the extraction, automation, transformation, modeling, and deployment needs of a data pipeline project.

Architecture and Tools Summary:

  • Data Source: the Marvel API (an authentication sketch follows this list).
  • Data Ingestion: Python script for extracting and loading data into AWS RDS PostgreSQL.
  • Database: AWS RDS PostgreSQL for structured data storage.
  • Workflow Orchestration: Prefect for automation, scheduling, and error handling.
  • Data Transformation: dbt (Data Build Tool) for transforming and modeling data.
  • Data Modeling: Creation of a dimensional model for analytics.
  • CI/CD: GitHub Actions to automate testing and deployment of data models on changes to the main branch.
  • Data Analytics and Visualization: Qlik Sense for creating interactive dashboards, data exploration, and reporting.
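
The Marvel API authenticates server-side calls with three query parameters: a timestamp `ts`, the public key as `apikey`, and `hash`, the MD5 digest of `ts + privateKey + publicKey`. A minimal sketch of building those parameters (the environment variable names are assumptions about this project's `.env` layout):

```python
import hashlib
import os
import time


def marvel_auth_params() -> dict:
    """Build the ts / apikey / hash query parameters the Marvel API requires."""
    public_key = os.environ["MARVEL_PUBLIC_KEY"]    # assumed .env variable name
    private_key = os.environ["MARVEL_PRIVATE_KEY"]  # assumed .env variable name
    ts = str(int(time.time()))
    digest = hashlib.md5((ts + private_key + public_key).encode()).hexdigest()
    return {"ts": ts, "apikey": public_key, "hash": digest}
```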

Prerequisites

  • Marvel Developer API keys
  • Python
  • Prefect CLI
  • AWS RDS Account
  • dbt Core
  • GitHub Actions
  • Docker
  • Qlik Sense Account

How to Run This Project

To run this project, work through the following steps:

  1. Run `pip install pipenv` to install pipenv, which manages the virtual environment
  2. Install the dependencies by running `pipenv install -r requirements.txt`
  3. Enter your Marvel API credentials in a `.env` file (see the sketch after this list)
  4. Run the `extract_load.py` file to start the EL (extract and load) phase with Prefect. Refer to this tutorial on how to configure Prefect.
  5. Install dbt Core on your local machine. Refer to this tutorial to install and run dbt Core.
  6. Set up GitHub Actions by creating a `.github/workflows` directory on the main branch and adding the `workflow.yml` file to it
  7. Connect Qlik Sense to the AWS RDS instance and create your visualizations.
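
For steps 3 and 4, one way the `.env` credentials might be wired into the flow; `python-dotenv`, the module names, and the `limit` value are assumptions for illustration, reusing the hypothetical helpers sketched above:

```python
from dotenv import load_dotenv

# Hypothetical imports: the flow and auth helper sketched in the sections above.
from extract_load import marvel_el_flow
from marvel_auth import marvel_auth_params

load_dotenv()  # read MARVEL_PUBLIC_KEY, MARVEL_PRIVATE_KEY, DATABASE_URL from .env

if __name__ == "__main__":
    # The characters endpoint accepts a `limit` of up to 100 records per call.
    marvel_el_flow(params={**marvel_auth_params(), "limit": 100})
```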

Lessons Learned

One key takeaway from this project is the importance of scalability and flexibility when designing data pipelines. Cloud-based storage such as AWS RDS PostgreSQL lets the pipeline scale with varying data volumes, and the choice of modular tools makes it straightforward to modify and extend the pipeline as requirements, data sources, and business needs change.

Contact

You can reach me on LinkedIn to learn more about this project, and I'm open to collaboration.
