GitHub Repo: Data Engineering Zoomcamp (GitHub)
YouTube channel: Data Engineering Zoomcamp (YouTube)
Streamlit: DE Zoomcamp UI
Dataset: NYC TLC Data
1.1.1 - Introduction to Google Cloud Platform
1.2.1 - Introduction to Docker
1.2.2 - Ingesting NY Taxi Data to Postgres
1.2.3 - Connecting pgAdmin and Postgres
1.2.4 - Putting the ingestion script into Docker
1.2.5 - Running Postgres and pgAdmin with Docker-Compose
1.2.6 - SQL Refresher
1.3.1 - Introduction to Terraform Concepts & GCP Prerequisites
1.3.2 - Creating GCP Infrastructure with Terraform
2.1.1 - Data Lake (Google Cloud Storage)
2.2.1 - Intro to Workflow Orchestration
2.3.1 - Setup Airflow Environment with Docker Compose
2.3.2 - Ingesting Data to GCP with Airflow
2.3.3 - Ingesting Data to Local Postgres with Airflow
3.1.1 - Data Warehouse and BigQuery
3.1.2 - Partitioning and Clustering
3.2.1 - BigQuery Best Practices
3.2.2 - Internals of BigQuery
3.3.1 - BigQuery Machine Learning
3.3.2 - BigQuery Machine Learning Deployment
4.1.1 - Analytics Engineering Basics
4.1.2 - What is dbt
4.2.1 - Start your dbt Project - BigQuery and dbt Cloud (Alternative A)
4.2.2 - Start your dbt Project - Postgres and dbt Core Locally (Alternative B)
4.3.1 - Build the First dbt Models
4.3.2 - Testing and Documenting the Project
4.4.1 - Deployment Using dbt Cloud (Alternative A)
4.4.2 - Deployment Using dbt Locally (Alternative B)
4.5.1 - Visualising the data with Google Data Studio (Alternative A)
4.5.2 - Visualising the data with Metabase (Alternative B)
5.1.1 - Introduction to Batch Processing
5.1.2 - Introduction to Spark
5.2.1 - (Optional) Installing Spark on Linux
5.3.1 - First Look at Spark/PySpark
5.3.2 - Spark DataFrames
5.3.3 - (Optional) Preparing Yellow and Green Taxi Data
5.3.4 - SQL with Spark
5.4.1 - Anatomy of a Spark Cluster
5.4.2 - GroupBy in Spark
5.4.3 - Joins in Spark
5.5.1 - (Optional) Operations on Spark RDDs
5.5.2 - (Optional) Spark RDD mapPartition
5.6.1 - Connecting to Google Cloud Storage
5.6.2 - Creating a Local Spark Cluster
5.6.3 - Setting up a Dataproc Cluster
5.6.4 - Connecting Spark to BigQuery
6.2 - What is stream processing
6.3 - What is Kafka?
6.4 - Confluent Cloud
6.5 - Kafka producer consumer
6.6 - Kafka configuration
6.7 - Kafka Streams basics
6.8 - Kafka stream join
6.9 - Kafka stream testing
6.10 - Kafka stream windowing
6.11 - Kafka ksqlDB & Connect
6.12 - Kafka Schema Registry