An end-to-end data engineering pipeline on Microsoft Azure leveraging the publicly available Netflix dataset. This project covers:
- Data Ingestion (Bronze)
- Data Processing & Cleaning (Silver)
- Data Quality & Delivery (Gold)
- Automation & Orchestration
Medallion Layers:
| Layer | Purpose |
|---|---|
| Bronze | Ingest raw data (Auto Loader & ADF) into Delta format |
| Silver | Clean, dedupe, enrich; enforce schemas with PySpark |
| Gold | Apply Delta Live Tables for quality checks & aggregations |
Bronze Layer:
- **Sources**
  - `Netflix_titles.csv` in ADLS Gen2 (`rawdata/Netflix_titles.csv`)
  - Lookup tables (directors, cast, categories, countries) from GitHub
- **Orchestration**
  - Azure Data Factory pipelines using Copy Data, ForEach, Validation, and If Condition activities
  - Parameterized datasets & pipelines for reusability
- **Auto Loader**
  - Incremental ingest of new CSV files into `bronze.netflix_titles_delta` using Databricks Auto Loader (see the sketch below)
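A minimal Auto Loader sketch of this ingestion, runnable in a Databricks notebook (where `spark` is predefined); the storage account placeholder, checkpoint path, and CSV options are assumptions, while the target table name comes from the bullet above:

```python
# Minimal Auto Loader sketch. Storage account, checkpoint path,
# and CSV options are assumptions based on the layout above.
raw_path = "abfss://rawdata@<storage-account>.dfs.core.windows.net/"
checkpoint = "abfss://bronze@<storage-account>.dfs.core.windows.net/_checkpoints/netflix_titles"

stream = (
    spark.readStream.format("cloudFiles")             # Auto Loader source
    .option("cloudFiles.format", "csv")               # incoming files are CSV
    .option("cloudFiles.schemaLocation", checkpoint)  # track the inferred schema
    .option("header", "true")
    .load(raw_path)
)

(
    stream.writeStream
    .option("checkpointLocation", checkpoint)  # enables incremental, exactly-once ingest
    .trigger(availableNow=True)                # pick up new files, then stop
    .toTable("bronze.netflix_titles_delta")    # Delta table in the bronze layer
)
```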
- **Storage**
  - All raw ingestions stored as Delta tables in the `bronze/` container
Silver Layer:
- **Compute**
  - Azure Databricks PySpark notebooks
- **Transformations** (sketched below)
  - Split multi-valued columns (e.g., rating)
  - Remove duplicates and filter invalid records
  - Fill null values
  - Cast data types for analytics readiness
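A PySpark sketch of these cleaning steps, assuming the column names of the public Netflix dataset (`show_id`, `rating`, `director`, `country`, `release_year`) and a hypothetical `silver.netflix_titles` output table; the exact split and fill rules are illustrative:

```python
from pyspark.sql import functions as F

# Read the raw Bronze table produced by the ingestion step above.
df = spark.read.table("bronze.netflix_titles_delta")

silver = (
    df.dropDuplicates(["show_id"])                            # remove duplicate titles
      .filter(F.col("show_id").isNotNull())                   # filter invalid records
      .withColumn("rating_parts", F.split("rating", "-"))     # split a multi-valued column (illustrative rule)
      .fillna({"director": "Unknown", "country": "Unknown"})  # fill nulls with defaults
      .withColumn("release_year", F.col("release_year").cast("int"))  # cast for analytics
)

# Hypothetical Silver table name; the cleaned data lands in the silver/ container.
silver.write.format("delta").mode("overwrite").saveAsTable("silver.netflix_titles")
```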
- **Orchestration**
  - Databricks Workflows chaining parameterized notebooks (parameter passing sketched below)
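Workflow tasks typically hand parameters to notebooks through widgets; a small sketch, where the `source_table` parameter name and its default value are assumptions:

```python
# Inside a notebook chained by a Databricks Workflow. The widget name and
# default value are assumptions; the job supplies real values per task.
dbutils.widgets.text("source_table", "bronze.netflix_titles_delta")
source_table = dbutils.widgets.get("source_table")

df = spark.read.table(source_table)
print(f"Read {df.count()} rows from {source_table}")
```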
- **Output**
  - Cleaned Delta tables in the `silver/` container
Gold Layer:
- **Framework**
  - Delta Live Tables (DLT) for declarative pipelines
- **Data Quality**
  - Define expectations (e.g., `NOT NULL`, `UNIQUE` constraints) as SQL conditions
  - Configure actions for violations (e.g., drop failing records), as sketched below
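A minimal DLT sketch, intended to run inside a Delta Live Tables pipeline; the input table name carries over from the Silver step, and the `titles_by_country` aggregation is an illustrative assumption:

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Netflix titles with enforced quality rules")
@dlt.expect_or_drop("valid_show_id", "show_id IS NOT NULL")  # drop rows that fail
@dlt.expect("has_title", "title IS NOT NULL")                # record violations, keep rows
def gold_netflix_titles():
    # Hypothetical Silver table name from the step above.
    return spark.read.table("silver.netflix_titles")

@dlt.table(comment="Illustrative Gold aggregation: title counts per country")
def titles_by_country():
    return (
        dlt.read("gold_netflix_titles")
        .groupBy("country")
        .agg(F.count("*").alias("title_count"))
    )
```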
Tech Stack:

| Component | Purpose |
|---|---|
| Azure Data Factory (ADF) | Data orchestration & ingestion |
| Azure Data Lake Storage | Scalable storage for Delta tables |
| Azure Databricks | Spark-based ETL & Delta Live Tables |
| Delta Lake | ACID-compliant, performant data format |
| Python / PySpark | Data transformation logic |