This project serves as my learning documentation.
This project sets up a simple serverless data lake using AWS services, managed with Terraform. As of this writing, the project is limited to simple example data such as the provided `data.csv`.
- Raw Data (S3): Stores incoming raw data.
- Trigger (Lambda): Watches for new data and triggers the transformation process.
- Transform Data (Glue ETL): Cleans and processes the raw data, storing the results in another S3 bucket.
- Transformed Data (S3): Stores the transformed data.
- Discover Data (Glue Crawlers): Scans the raw and transformed data and creates metadata.
- Catalog Data (Glue Data Catalog): Stores metadata, making data available for querying.
- Explore Data (Athena): Allows querying of the processed data via SQL.
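For example, once the crawlers have run, the transformed data can be queried straight from the CLI. The database, table, and results-bucket names below are just placeholders, not names this project creates, so adjust them to your setup:

```sh
# Hypothetical example: query the transformed data with Athena from the CLI.
# All names in angle brackets are placeholders; use the ones from your deployment.
aws athena start-query-execution \
  --query-string "SELECT * FROM <transformed_table> LIMIT 10;" \
  --query-execution-context Database=<glue-database-name> \
  --result-configuration OutputLocation=s3://<athena-results-bucket>/
```

The command returns a query execution ID, which can then be passed to `aws athena get-query-results` to fetch the rows.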
Terraform automates the provisioning of the AWS resources. The project was built with the following tool versions:
- Terraform (1.10.5)
- AWS CLI (2.17.32)
- Python (3.11)
- GNU Make (3.81)
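A quick way to sanity-check the toolchain (these are the versions I used; nearby versions will likely work too):

```sh
# Verify the required tools are installed and on the PATH
terraform version
aws --version
python3 --version
make --version
```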
- Clone the repository to your local machine.
- Ensure you have AWS credentials configured on your machine.
- Navigate to the `serverless-data-lake` directory, then into the `terraform` directory.
- Create a `.tfvars` file in the `terraform` directory using the `.tfvars.template` file.
- Adjust the variable `aws_profile` if you have multiple profiles on your machine; otherwise, just fill in `default` as the value. Adjust the other variables as needed (see the sketch after this list).
- Run `terraform init` to initialize all Terraform resources.
- Run `terraform plan -var-file=.tfvars -out=plan.tfplan` to create an execution plan.
- Run `terraform apply "plan.tfplan"` to apply the execution plan.
- Run `terraform show` to inspect the current state.
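For reference, here is a rough sketch of those steps in one place. I'm assuming the template sits in the `terraform` directory; `aws_profile` is the only variable named in this README, so fill in the rest from the template:

```sh
# Create .tfvars from the template and fill in the values
cp .tfvars.template .tfvars
# Edit .tfvars: at minimum set aws_profile (use "default" for a single profile)
#   aws_profile = "default"
#   ...fill in the remaining variables from the template...

# Provision the data lake
terraform init
terraform plan -var-file=.tfvars -out=plan.tfplan
terraform apply "plan.tfplan"
terraform show   # inspect the resulting state
```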
- Upload `data.csv` to the `data/` folder in the source bucket using the AWS CLI or the AWS Console.
- Check your AWS Glue ETL job and your AWS Glue Data Catalog (example commands below).
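Here's a hedged example of that last step using the AWS CLI. The bucket, job, and database names are placeholders, so use the actual names from your Terraform variables or outputs:

```sh
# Upload the sample file to the data/ folder in the source bucket
aws s3 cp data.csv s3://<source-bucket-name>/data/ --profile default

# Inspect the Glue ETL job runs triggered by the upload
aws glue get-job-runs --job-name <glue-etl-job-name>

# List the tables the crawlers created in the Data Catalog
aws glue get-tables --database-name <glue-database-name>
```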
Ah, there you go!