
This repo contains details about running Airflow on a GCP VM instance and building an end-to-end data engineering project using multiple GCP services.


ViinayKumaarMamidi/Airflow_GCP_GCS_to_BQ_Looker_Data_Engineering_Project


In this project, I used multiple GCP services to perform ELT on a Global Health data CSV file. I loaded the file into a GCS bucket, then used Airflow deployed on a VM instance to load the CSV file into a table in the staging dataset. The raw data was then split into multiple transformed tables, one per country.

I created a view for each country table by renaming columns and filtering the data as required.

In Looker, I connected to the BigQuery views, built an India health data report, enabled publishing and email notifications, and received the report by email in PDF format.

Resource: Vishal Bulbule's resources were used to understand and build this end-to-end data engineering project. Thanks, Vishal.

GCP Services used:

  1. GCS: Google Cloud Storage
  2. BQ: BigQuery
  3. VM Instance: To Install Airflow and perform ELT
  4. Looker: For Reporting and Scheduling

Data Flow Architecture image

GCS Bucket Details

Source_GCS_Bucket_1_Million_Records_CSV_File

BigQuery Datasets and Tables Information

BigQuery_Tables_Views_Information
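
The per-country tables and views described above could be produced with SQL statements along the following lines. This is only a minimal sketch, not the SQL used in this repo: the project ID, dataset names, table suffixes, the `country` column, and the renamed columns are all hypothetical placeholders, so check the linked DAG script for the real definitions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Names:
    """Hypothetical identifiers; replace with your own project and datasets."""
    project: str = "my-project"
    staging_table: str = "staging_dataset.global_data"
    transform_dataset: str = "transform_dataset"
    reporting_dataset: str = "reporting_dataset"


def slug(country: str) -> str:
    """Turn a country name into a lowercase identifier, e.g. 'South Africa' -> 'south_africa'."""
    return country.strip().lower().replace(" ", "_")


def country_table_sql(n: Names, country: str) -> str:
    """Build the statement that splits one country's rows out of the staging table."""
    table = f"{n.project}.{n.transform_dataset}.{slug(country)}_table"
    literal = country.replace("'", "\\'")  # escape quotes inside the SQL string literal
    return (
        f"CREATE OR REPLACE TABLE `{table}` AS\n"
        f"SELECT * FROM `{n.project}.{n.staging_table}`\n"
        f"WHERE country = '{literal}'"  # assumes a column named `country`
    )


def country_view_sql(
    n: Names, country: str, renames: dict[str, str], where: str | None = None
) -> str:
    """Build a view over a country table that renames columns and optionally filters rows."""
    source = f"{n.project}.{n.transform_dataset}.{slug(country)}_table"
    view = f"{n.project}.{n.reporting_dataset}.{slug(country)}_view"
    columns = ",\n".join(f"  `{src}` AS `{dst}`" for src, dst in renames.items())
    sql = f"CREATE OR REPLACE VIEW `{view}` AS\nSELECT\n{columns}\nFROM `{source}`"
    if where:
        sql += f"\nWHERE {where}"
    return sql


if __name__ == "__main__":
    names = Names()
    print(country_table_sql(names, "India"))
    # Column names below are illustrative only.
    print(country_view_sql(names, "India", {"year": "Year", "disease_name": "Disease Name"}))
```

Generating the statements from a list of countries keeps the table and view definitions consistent, so adding a country means adding one entry rather than copying SQL.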

Complete Airflow GCS to BQ Tables and View DAG:

Python Script URL: https://github.com/ViinayKumaarMamidi/Airflow_GCP_Data_Engineering_Project/blob/main/Airflow_GCS_to_BQ_Tranformation_DAG_Script.py

Final_GCS_Bucket_To_BigQuery_Tables_to_BigQuery_Views_Airflow_DAG_Flow
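
As a rough illustration of the flow in the DAG (GCS file, then staging table, then per-country tables, then views), the sketch below builds a BigQuery load-job configuration and the task ordering in plain Python. The bucket, object, project, dataset, and task names are hypothetical. In Airflow, a dictionary like this could be passed as the `configuration` argument of `BigQueryInsertJobOperator` from the Google provider package; the script linked above may use a different operator and is the source of truth.

```python
from __future__ import annotations


def gcs_to_bq_load_config(
    bucket: str, object_name: str, project: str, dataset: str, table: str
) -> dict:
    """Build a BigQuery load-job configuration for a CSV file stored in GCS."""
    return {
        "load": {
            "sourceUris": [f"gs://{bucket}/{object_name}"],
            "destinationTable": {
                "projectId": project,
                "datasetId": dataset,
                "tableId": table,
            },
            "sourceFormat": "CSV",
            "skipLeadingRows": 1,  # assumes the CSV has a single header row
            "autodetect": True,  # let BigQuery infer the schema
            "writeDisposition": "WRITE_TRUNCATE",  # replace the staging table on each run
        }
    }


def task_edges(countries: list[str]) -> list[tuple[str, str]]:
    """Return (upstream, downstream) task-name pairs for the pipeline.

    The staging load runs first; each country then gets a table task
    followed by a view task. Task names are illustrative.
    """
    edges = []
    for country in countries:
        key = country.strip().lower().replace(" ", "_")
        edges.append(("load_csv_to_staging", f"create_table_{key}"))
        edges.append((f"create_table_{key}", f"create_view_{key}"))
    return edges


if __name__ == "__main__":
    config = gcs_to_bq_load_config(
        "my-bucket", "global_health_data.csv", "my-project", "staging_dataset", "global_data"
    )
    print(config["load"]["sourceUris"])
    for upstream, downstream in task_edges(["India", "USA"]):
        print(f"{upstream} >> {downstream}")
```

Because each country's table and view tasks depend only on the staging load, Airflow can run the per-country branches in parallel once the load finishes.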

Looker Report

Looker_Reporting_Web_UI

Looker PDF Report URL: https://github.com/ViinayKumaarMamidi/Airflow_GCP_Data_Engineering_Project/blob/main/India_Health_Data_Report.pdf

Looker Report Subscriptions/Scheduling

The report is scheduled to be sent daily at 4 PM EST.

Looker_Report_Email_Attachment
