This project implements an end-to-end ETL pipeline for processing a dataset of Belgian traffic accidents. The technologies used are Amazon S3 and AWS Glue.
This diagram outlines the ETL pipeline's architecture, showing how raw data flows from Amazon S3 through the AWS Glue Crawler and AWS Glue ETL, and back to S3 ready for analysis.
- Data Source: The Belgian Traffic Accidents dataset (a `.txt` file) is used.
- Raw Storage: Amazon S3 object storage holds the raw data (a boto3 upload sketch follows this list).
- Catalog with AWS Glue Crawler: a crawler scans the S3 data, infers its schema, and registers it in the AWS Glue Data Catalog (scripted as a sketch below).
- ETL Processing: an AWS Glue ETL job, built with the visual tool, cleans and transforms the raw data and converts it to Parquet with Snappy compression, optimizing it for performance and storage (a PySpark sketch follows the list).
- Clean Storage: Amazon S3 is used again to store the clean Parquet files, ready for analysis.
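
As a minimal sketch of the raw-storage step, the snippet below uploads the source `.txt` file to S3 with boto3. The bucket and key names are placeholders, not the project's actual values.

```python
import boto3

# Hypothetical bucket and key; substitute the project's actual values.
RAW_BUCKET = "be-traffic-accidents-raw"
RAW_KEY = "raw/accidents.txt"

s3 = boto3.client("s3")

# Push the raw .txt dataset into the raw zone of the bucket.
s3.upload_file("accidents.txt", RAW_BUCKET, RAW_KEY)
print(f"Uploaded to s3://{RAW_BUCKET}/{RAW_KEY}")
```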
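The crawler step can also be scripted instead of configured in the console. This sketch creates and starts a Glue Crawler with boto3; the crawler, role, database, and path names are all assumptions for illustration.

```python
import boto3

glue = boto3.client("glue")

# All names here are assumptions for illustration: the IAM role must
# already exist and grant Glue access to the raw bucket.
glue.create_crawler(
    Name="accidents-raw-crawler",
    Role="AWSGlueServiceRole-accidents",
    DatabaseName="traffic_accidents",
    Targets={"S3Targets": [{"Path": "s3://be-traffic-accidents-raw/raw/"}]},
)

# Schema inference runs asynchronously; the result lands in the Data Catalog.
glue.start_crawler(Name="accidents-raw-crawler")
```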
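The project builds the ETL job with the Glue visual tool; the script below is a hand-written PySpark sketch of the equivalent transform, with the database, table, column, and path names assumed for illustration.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the table the crawler registered (database/table names assumed).
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="traffic_accidents", table_name="raw"
)

# Illustrative cleaning step: drop rows missing a date
# ("accident_date" is a placeholder column name).
cleaned = dyf.filter(lambda row: row["accident_date"] is not None)

# Write Parquet with Snappy compression to the clean zone.
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://be-traffic-accidents-clean/parquet/"},
    format="parquet",
    format_options={"compression": "snappy"},
)
job.commit()
```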
The transformed Parquet data is now accessible for analysis with tools like Amazon Athena, Power BI, and Python, enabling efficient and scalable insights.
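
For example, the clean Parquet output can be loaded straight into Python for analysis; the S3 path below is a placeholder.

```python
import pandas as pd

# Requires pyarrow and s3fs; the S3 path is a placeholder.
df = pd.read_parquet("s3://be-traffic-accidents-clean/parquet/")
print(df.describe())
```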