All materials/coursework related to Data Engineering Zoomcamp. The course covers various DE tools including Docker, GCP, Terraform, Airflow, dbt, Spark, Kafka, etc.

vinzalfaro/data-engineering-zoomcamp

Data Engineering Zoomcamp

GitHub Repo: Data Engineering Zoomcamp (GitHub)
YouTube channel: Data Engineering Zoomcamp (YouTube)
Streamlit: DE Zoomcamp UI
Dataset: NYC TLC Data

Content

Week 1 - Intro & Prerequisites

1.1.1 - Introduction to Google Cloud Platform
1.2.1 - Introduction to Docker
1.2.2 - Ingesting NY Taxi Data to Postgres
1.2.3 - Connecting pgAdmin and Postgres
1.2.4 - Putting the ingestion script into Docker
1.2.5 - Running Postgres and pgAdmin with Docker-Compose
1.2.6 - SQL Refresher
1.3.1 - Introduction to Terraform Concepts & GCP Prerequisites
1.3.2 - Creating GCP Infrastructure with Terraform

Week 2 - Workflow Orchestration

2.1.1 - Data Lake (Google Cloud Storage)
2.2.1 - Intro to Workflow Orchestration
2.3.1 - Setup Airflow Environment with Docker Compose
2.3.2 - Ingesting Data to GCP with Airflow
2.3.3 - Ingesting Data to Local Postgres with Airflow

Week 3 - Data Warehouse

3.1.1 - Data Warehouse and BigQuery
3.1.2 - Partitioning and Clustering
3.2.1 - BigQuery Best Practices
3.2.2 - Internals of BigQuery
3.3.1 - BigQuery Machine Learning
3.3.2 - BigQuery Machine Learning Deployment

Week 4 - Analytics Engineering

4.1.1 - Analytics Engineering Basics
4.1.2 - What is dbt
4.2.1 - Start your dbt Project - BigQuery and dbt Cloud (Alternative A)
4.2.2 - Start your dbt Project - Postgres and dbt Core Locally (Alternative B)
4.3.1 - Build the First dbt Models
4.3.2 - Testing and Documenting the Project
4.4.1 - Deployment Using dbt Cloud (Alternative A)
4.4.2 - Deployment Using dbt Locally (Alternative B)
4.5.1 - Visualising the data with Google Data Studio (Alternative A)
4.5.2 - Visualising the data with Metabase (Alternative B)

Week 5 - Batch Processing

5.1.1 - Introduction to Batch Processing
5.1.2 - Introduction to Spark
5.2.1 - (Optional) Installing Spark on Linux
5.3.1 - First Look at Spark/PySpark
5.3.2 - Spark DataFrames
5.3.3 - (Optional) Preparing Yellow and Green Taxi Data
5.3.4 - SQL with Spark
5.4.1 - Anatomy of a Spark Cluster
5.4.2 - GroupBy in Spark
5.4.3 - Joins in Spark
5.5.1 - (Optional) Operations on Spark RDDs
5.5.2 - (Optional) Spark RDD mapPartition
5.6.1 - Connecting to Google Cloud Storage
5.6.2 - Creating a Local Spark Cluster
5.6.3 - Setting up a Dataproc Cluster
5.6.4 - Connecting Spark to BigQuery

Week 6 - Stream Processing

6.2 - What is Stream Processing
6.3 - What is Kafka?
6.4 - Confluent Cloud
6.5 - Kafka Producer and Consumer
6.6 - Kafka Configuration
6.7 - Kafka Streams Basics
6.8 - Kafka Streams Join
6.9 - Kafka Streams Testing
6.10 - Kafka Streams Windowing
6.11 - Kafka ksqlDB & Connect
6.12 - Kafka Schema Registry
