Data Engineering Zoomcamp

GitHub Repo: Data Engineering Zoomcamp (GitHub)
YouTube channel: Data Engineering Zoomcamp (YouTube)
Streamlit: DE Zoomcamp UI
Dataset: NYC TLC Data

Content

Week 1 - Intro & Prerequisites

1.1.1 - Introduction to Google Cloud Platform
1.2.1 - Introduction to Docker
1.2.2 - Ingesting NY Taxi Data to Postgres
1.2.3 - Connecting pgAdmin and Postgres
1.2.4 - Putting the ingestion script into Docker
1.2.5 - Running Postgres and pgAdmin with Docker-Compose
1.2.6 - SQL Refresher
1.3.1 - Introduction to Terraform Concepts & GCP Prerequisites
1.3.2 - Creating GCP Infrastructure with Terraform

Week 2 - Workflow Orchestration

2.1.1 - Data Lake (Google Cloud Storage)
2.2.1 - Intro to Workflow Orchestration
2.3.1 - Setup Airflow Environment with Docker Compose
2.3.2 - Ingesting Data to GCP with Airflow
2.3.3 - Ingesting Data to Local Postgres with Airflow

Week 3 - Data Warehouse

3.1.1 - Data Warehouse and BigQuery
3.1.2 - Partitioning and Clustering
3.2.1 - BigQuery Best Practices
3.2.2 - Internals of BigQuery
3.3.1 - BigQuery Machine Learning
3.3.2 - BigQuery Machine Learning Deployment

Week 4 - Analytics Engineering

4.1.1 - Analytics Engineering Basics
4.1.2 - What is dbt
4.2.1 - Start your dbt Project - BigQuery and dbt Cloud (Alternative A)
4.2.2 - Start your dbt Project - Postgres and dbt Core Locally (Alternative B)
4.3.1 - Build the First dbt Models
4.3.2 - Testing and Documenting the Project
4.4.1 - Deployment Using dbt Cloud (Alternative A)
4.4.2 - Deployment Using dbt Locally (Alternative B)
4.5.1 - Visualising the data with Google Data Studio (Alternative A)
4.5.2 - Visualising the data with Metabase (Alternative B)

Week 5 - Batch Processing

5.1.1 - Introduction to Batch Processing
5.1.2 - Introduction to Spark
5.2.1 - (Optional) Installing Spark on Linux
5.3.1 - First Look at Spark/PySpark
5.3.2 - Spark DataFrames
5.3.3 - (Optional) Preparing Yellow and Green Taxi Data
5.3.4 - SQL with Spark
5.4.1 - Anatomy of a Spark Cluster
5.4.2 - GroupBy in Spark
5.4.3 - Joins in Spark
5.5.1 - (Optional) Operations on Spark RDDs
5.5.2 - (Optional) Spark RDD mapPartition
5.6.1 - Connecting to Google Cloud Storage
5.6.2 - Creating a Local Spark Cluster
5.6.3 - Setting up a Dataproc Cluster
5.6.4 - Connecting Spark to BigQuery

Week 6 - Stream Processing

6.2 - What is stream processing
6.3 - What is Kafka?
6.4 - Confluent Cloud
6.5 - Kafka producer consumer
6.6 - Kafka configuration
6.7 - Kafka streams basics
6.8 - Kafka stream join
6.9 - Kafka stream testing
6.10 - Kafka stream windowing
6.11 - Kafka ksqlDB & Connect
6.12 - Kafka Schema Registry

vinzalfaro/data-engineering-zoomcamp