In this project, I used multiple GCP services to perform ELT on a Global Health data CSV file. I loaded the file into a GCS bucket, then used Airflow deployed on a VM instance to load the CSV file into a table in the staging dataset. Next, I transformed the data by splitting the raw data into multiple transformed tables, one per country. For each country table, I created a view that renames columns and filters the data as per the requirements.

I used Looker to connect to the BigQuery views and build an India health data report. I then enabled publishing and email notifications, and received an email with the report in PDF format.

Resource: Vishal Bulbule's resources were used to understand and perform this end-to-end data engineering project. Thanks, Vishal.
GCP Services used:
- GCS: Google Cloud Storage
- BQ: BigQuery
- VM Instance: To install Airflow and run the ELT pipeline
- Looker: For Reporting and Scheduling
GCS Bucket Details
BigQuery Datasets and Tables Information
Complete Airflow GCS to BQ Tables and View DAG:
Python Script URL: https://github.com/ViinayKumaarMamidi/Airflow_GCP_Data_Engineering_Project/blob/main/Airflow_GCS_to_BQ_Tranformation_DAG_Script.py
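The linked script is the authoritative version of the DAG. As a rough illustration of the transform step only (one table per country from the staging table, then one view per country table with renamed columns and a filter), the SQL could be generated along these lines. Every project, dataset, table, and column name below is a hypothetical placeholder and is not taken from the actual script.

```python
from typing import List

# Hypothetical names: the real project, dataset, table, and column names
# are defined in the linked DAG script and may differ.
PROJECT = "my-project"
STAGING_TABLE = f"{PROJECT}.staging_dataset.global_health_data"
TRANSFORM_DATASET = f"{PROJECT}.transform_dataset"
REPORTING_DATASET = f"{PROJECT}.reporting_dataset"


def table_suffix(country: str) -> str:
    """Turn a country name into a lowercase, underscore-separated suffix."""
    return country.strip().lower().replace(" ", "_")


def country_table_sql(country: str) -> str:
    """Build a statement that copies one country's rows out of staging."""
    target = f"{TRANSFORM_DATASET}.{table_suffix(country)}_table"
    return (
        f"CREATE OR REPLACE TABLE `{target}` AS\n"
        f"SELECT * FROM `{STAGING_TABLE}`\n"
        f"WHERE country = '{country}'"  # assumes a column named `country`
    )


def country_view_sql(country: str) -> str:
    """Build a view over the country table with renamed columns and a filter."""
    source = f"{TRANSFORM_DATASET}.{table_suffix(country)}_table"
    view = f"{REPORTING_DATASET}.{table_suffix(country)}_view"
    return (
        f"CREATE OR REPLACE VIEW `{view}` AS\n"
        f"SELECT\n"
        f"  year AS report_year,\n"          # hypothetical column names
        f"  disease_name AS disease,\n"
        f"  prevalence_rate AS prevalence_pct\n"
        f"FROM `{source}`\n"
        f"WHERE prevalence_rate IS NOT NULL"  # hypothetical filter
    )


def build_statements(countries: List[str]) -> List[str]:
    """Return the table statement followed by the view statement per country."""
    statements: List[str] = []
    for country in countries:
        statements.append(country_table_sql(country))
        statements.append(country_view_sql(country))
    return statements


if __name__ == "__main__":
    for sql in build_statements(["India", "United States"]):
        print(sql, end="\n\n")
```

In an Airflow DAG, each generated statement would typically be handed to a BigQuery task (for example `BigQueryInsertJobOperator` from the Google provider package), with the view task set to run after its table task. The country list is assumed to be a fixed, trusted list, so the sketch does no SQL escaping.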
Looker Report
Looker PDF Report URL: https://github.com/ViinayKumaarMamidi/Airflow_GCP_Data_Engineering_Project/blob/main/India_Health_Data_Report.pdf
Looker Report Subscriptions/Scheduling
Scheduled the report to be sent daily at 4 PM EST.
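When comparing the delivery time with Airflow or BigQuery job logs, which are commonly shown in UTC, it can help to convert the schedule. The small sketch below assumes "EST" means a fixed UTC-5 offset. If the schedule is actually configured in a US Eastern time zone that observes daylight saving, the UTC time shifts by one hour for part of the year. The helper name `to_utc` is illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EST is treated as a fixed UTC-5 offset (no daylight saving).
EST = timezone(timedelta(hours=-5), name="EST")


def to_utc(local_hour: int, tz: timezone = EST) -> str:
    """Return the UTC time (HH:MM) of a daily run at local_hour in tz."""
    # The date is arbitrary because the offset is fixed.
    local = datetime(2024, 1, 15, local_hour, 0, tzinfo=tz)
    return local.astimezone(timezone.utc).strftime("%H:%M")


if __name__ == "__main__":
    print(to_utc(16))  # 4 PM EST -> 21:00 UTC
```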