
ELT Pipeline in Datalake

ELT (Extract, Load, Transform) pipeline that ingests data from a database into a single data lake location for transform processing.

Diagram

Introduction & Goals

  • Ingest data from an RDBMS into a data lake bucket (a minimal ingestion sketch follows this list).
  • Process transaction files in a single location
    • The data set is a salesperson's customer list
    • We use AWS Glue
    • Create a Data Catalog, then transform and load into the data lake
    • Data is transformed from CSV to Parquet format
    • Parquet stores the schema in the file metadata and is easy to work with because it is supported by so many different projects.
    • The transformed data is ready to be queried from a data warehouse
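
A minimal sketch of the ingestion step, assuming an RDS PostgreSQL source; the endpoint, credentials, bucket, and table names below are placeholders, not values from this repository:

```python
# Export a table from RDS PostgreSQL as CSV and upload it to the
# data lake bucket's input folder. All names here are placeholders.
import csv
import io

import boto3
import psycopg2

conn = psycopg2.connect(
    host="my-rds-instance.abc123.us-east-1.rds.amazonaws.com",  # placeholder
    dbname="salesdb",      # placeholder
    user="etl_user",       # placeholder
    password="change-me",  # placeholder
)

buf = io.StringIO()
writer = csv.writer(buf)
with conn, conn.cursor() as cur:
    cur.execute("SELECT * FROM customer")
    writer.writerow([desc[0] for desc in cur.description])  # header row
    writer.writerows(cur.fetchall())

# Land the CSV in the input folder for the Glue crawler to pick up.
boto3.client("s3").put_object(
    Bucket="my-datalake-bucket",  # placeholder
    Key="input/customer.csv",
    Body=buf.getvalue().encode("utf-8"),
)
```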

The Data Set

  • We use the transaction data set customer.csv.
  • The data set is a salesperson's customer list.
  • At roughly 197 KB, it is an appropriate size for a Glue ETL demo.

Used Tools

  • Amazon RDS PostgreSQL, Amazon S3, AWS Glue, Glue crawler, Amazon Athena

    • Load the .csv file from RDS into an input folder inside a bucket.
    • Create an output folder as the target inside the same bucket.
    • Add a Glue crawler to crawl the .csv file (in the input folder) into the Data Catalog.
    • Run a Glue job to transform the data to Parquet (a sketch follows this list).
    • Set the target to load into the output folder.
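
A minimal sketch of such a Glue job (PySpark), assuming the crawler registered the CSV under a catalog database; the database, table, and bucket names are placeholders:

```python
# Glue job sketch: read the crawled CSV table from the Data Catalog
# and write it to the bucket's output folder as Parquet.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: the table the crawler registered from input/customer.csv.
source = glue_context.create_dynamic_frame.from_catalog(
    database="datalake_db",  # placeholder catalog database
    table_name="input",      # placeholder crawled table
)

# Target: Parquet files in the output folder of the same bucket.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://my-datalake-bucket/output/"},  # placeholder
    format="parquet",
)

job.commit()
```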
  • Add a Glue crawler to crawl the .parquet files (in the output folder) into the Data Catalog.

  • In Athena, check the transformed data with a view or query (see the example below).
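
For example, the verification query can be run from Python with boto3; the database, table, and result location are placeholders:

```python
# Query the crawled Parquet table via Athena to verify the transform.
import boto3

athena = boto3.client("athena")
response = athena.start_query_execution(
    QueryString="SELECT * FROM output LIMIT 10",        # placeholder table
    QueryExecutionContext={"Database": "datalake_db"},  # placeholder database
    ResultConfiguration={
        "OutputLocation": "s3://my-datalake-bucket/athena-results/"  # placeholder
    },
)
print(response["QueryExecutionId"])
```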

Pipelines

  • Batch processing pipeline for bulk import.
  • The AWS Glue job source code (Python) drives the transform (a driver sketch follows this list).
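
A minimal sketch of driving a batch run with boto3; the crawler and job names are placeholders:

```python
# Batch driver sketch: crawl the input CSV, then run the Glue job.
import boto3

glue = boto3.client("glue")

# Register/refresh the CSV schema in the Data Catalog.
# start_crawler is asynchronous; in practice, wait for the crawler
# to finish before starting the job.
glue.start_crawler(Name="csv-input-crawler")  # placeholder name

# Kick off the CSV-to-Parquet transform.
run = glue.start_job_run(JobName="csv-to-parquet")  # placeholder name
print(run["JobRunId"])
```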

Conclusion

  • The data set has been transformed inside a single data lake bucket.
  • The files can be loaded later into a data warehouse for analysis.
  • Glue jobs can also be assigned to aggregate, join, and filter tables (see the sketch after this list).
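
As a sketch of that kind of work, assuming a second crawled table and placeholder table and field names, a Glue job could filter and join DynamicFrames like this:

```python
# Sketch of further Glue transforms: filter rows, then join two tables.
# All database, table, and field names are placeholders.
from awsglue.context import GlueContext
from awsglue.transforms import Filter, Join
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

customers = glue_context.create_dynamic_frame.from_catalog(
    database="datalake_db", table_name="customers",  # placeholders
)
orders = glue_context.create_dynamic_frame.from_catalog(
    database="datalake_db", table_name="orders",     # placeholders
)

# Keep only active customers, then join them to their orders.
active = Filter.apply(frame=customers, f=lambda row: row["active"] == "true")
joined = Join.apply(active, orders, "customer_id", "customer_id")
```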

Follow Me On

https://www.linkedin.com/in/jirasak-pakdeeto-900665214/
