This project shows how ETL pipelines can be built with Airflow DAGs.
The project makes use of four custom plugins:
- stage_redshift: to stage data in Redshift,
- load_dimension: to load the dimension tables in Redshift,
- load_fact: to load the fact table in Redshift,
- data_quality: to validate the final tables for consistency.
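As a rough illustration of the kind of validation the data_quality plugin performs, the sketch below runs row-count checks against a set of tables. This is a plain-Python sketch, not the plugin itself: the real plugin is an Airflow operator that queries Redshift through a hook, while `get_records` here is a hypothetical stand-in for that query call.

```python
def run_quality_checks(get_records, tables):
    """Fail if any table is empty or a count query returns no result.

    `get_records` is any callable that takes a SQL string and returns
    rows as a list of tuples (hypothetical stand-in for a DB hook).
    """
    failures = []
    for table in tables:
        records = get_records(f"SELECT COUNT(*) FROM {table}")
        # A table fails the check if the query returned nothing
        # or the count is zero.
        if not records or not records[0] or records[0][0] < 1:
            failures.append(table)
    if failures:
        raise ValueError(f"Data quality check failed for: {failures}")
    return True

# Example with a stubbed query function and made-up table names:
fake_counts = {"songs": 7, "users": 0}

def fake_get_records(sql):
    table = sql.split()[-1]
    return [(fake_counts[table],)]

run_quality_checks(fake_get_records, ["songs"])  # passes; "users" would raise
```

In the real pipeline the same logic runs as a task at the end of the DAG, so a failed check marks the run as failed and triggers the configured retries.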
The pipelines are well parameterized for greater flexibility. The ETL pipeline is set up to backfill runs from the schedule's start date, and retries are configured so that transient failures are automatically retried.
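The backfill and retry behavior described above is typically driven by the DAG's `default_args`. The snippet below is a minimal sketch of such a configuration; the specific owner name, start date, retry count, and delay are assumptions, not values from this project.

```python
from datetime import datetime, timedelta

# Hypothetical scheduling parameters illustrating the backfill/retry setup.
default_args = {
    "owner": "etl",
    "start_date": datetime(2019, 1, 1),   # Airflow backfills from this date
    "retries": 3,                         # retry a failed task up to 3 times
    "retry_delay": timedelta(minutes=5),  # wait between retries
    "depends_on_past": False,             # runs do not wait on prior runs
}

# These args would then be passed to the DAG, e.g.:
# dag = DAG("etl_pipeline", default_args=default_args,
#           schedule_interval="@hourly", catchup=True)
```

With `catchup` enabled, Airflow creates a DAG run for every schedule interval between `start_date` and now, which is what produces the backfill behavior.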