vaishnavipaswan/Pandemic-Insights-ETL

COVID-19 Data Analysis using AWS Glue & Redshift Serverless

This project demonstrates an end-to-end ETL pipeline for analyzing the Our World in Data (OWID) COVID-19 dataset, using AWS Glue for transformation and Amazon Redshift Serverless for querying.

🔧 Technologies Used

  • AWS S3
  • AWS Glue Crawler & Glue ETL Job
  • AWS Glue Data Catalog
  • Amazon Redshift Serverless
  • SQL (Redshift)
  • Pandas (for initial testing)

📦 ETL Pipeline Steps

  1. Uploaded the raw owid-covid-data.csv file to S3.
  2. Created a Glue Crawler to scan the raw file and register its schema in the Glue Data Catalog.
  3. Built a Glue ETL job to clean the data and write it back to S3 in Parquet format.
  4. Linked the cleaned data to Redshift Serverless through an external schema.
  5. Ran the analysis queries in the Redshift SQL Editor.
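The cleaning rules for step 3 are documented in docs/ETL.pdf rather than here, so the following is only a sketch under stated assumptions. It uses Python's standard csv module instead of a Glue/PySpark job, keeps a hypothetical subset of OWID columns (KEEP), drops aggregate rows such as World on the assumption that their continent column is blank, and converts dates and numeric fields to typed values. Writing Parquet is left to the Glue job and is not shown.

```python
import csv
import io
from datetime import date

# Hypothetical subset of OWID columns kept by the cleaning step (an assumption).
KEEP = ("iso_code", "continent", "location", "date",
        "total_cases", "new_cases", "total_deaths", "population")
NUMERIC = ("total_cases", "new_cases", "total_deaths", "population")


def to_float(value):
    """Parse a numeric field; blank or missing values become None."""
    value = (value or "").strip()
    return float(value) if value else None


def clean_rows(reader):
    """Yield cleaned, country-level rows with typed dates and numbers."""
    for row in reader:
        # Assumed rule: aggregate rows (World, continents, income groups)
        # have an empty continent column, so they are skipped.
        if not (row.get("continent") or "").strip():
            continue
        cleaned = {col: row[col] for col in KEEP}
        cleaned["date"] = date.fromisoformat(row["date"])
        for col in NUMERIC:
            cleaned[col] = to_float(row[col])
        yield cleaned


# Illustrative input with made-up values (not real OWID figures).
sample = (
    "iso_code,continent,location,date,total_cases,new_cases,total_deaths,population\n"
    "AAA,Asia,Country A,2021-05-01,1000,,10,50000\n"
    "OWID_WRL,,World,2021-05-01,5000,200,50,1000000\n"
)
rows = list(clean_rows(csv.DictReader(io.StringIO(sample))))
# Only the Country A row survives; its blank new_cases becomes None.
print(rows)
```

Filtering on continent rather than on the iso_code prefix is deliberate in this sketch: some country-level rows in the OWID file also use OWID_ codes, so a prefix filter could drop real countries.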

📊 Key Queries

  • Top countries by total cases
  • Death-to-case ratio
  • Peak daily and 7-day average new cases
  • Cases per 100 people
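The metrics above can be sanity-checked locally before the Redshift SQL is run. This is a plain-Python sketch, not the queries in sql/SQL QUERIES.txt. The function names are hypothetical, and the sketch assumes rows shaped like the cleaned output, with location, date, total_cases, total_deaths, population and new_cases fields. Cases per 100 people is computed as total_cases × 100 / population, and the 7-day average is a trailing mean over consecutive daily values.

```python
def latest_by_location(rows):
    """Keep the most recent row per location (ISO strings or date objects)."""
    latest = {}
    for row in rows:
        loc = row["location"]
        if loc not in latest or row["date"] > latest[loc]["date"]:
            latest[loc] = row
    return latest


def top_by_total_cases(rows, n=10):
    """Rank locations by their latest cumulative case count."""
    ranked = sorted(
        (r for r in latest_by_location(rows).values() if r["total_cases"] is not None),
        key=lambda r: r["total_cases"],
        reverse=True,
    )
    return [(r["location"], r["total_cases"]) for r in ranked[:n]]


def death_to_case_ratio(total_deaths, total_cases):
    """Deaths divided by cases; None when an input is missing or cases is 0."""
    if not total_cases or total_deaths is None:
        return None
    return total_deaths / total_cases


def cases_per_100(total_cases, population):
    """Cumulative cases per 100 people: total_cases * 100 / population."""
    if not population or total_cases is None:
        return None
    return total_cases * 100 / population


def peak_daily_and_7day(new_cases):
    """Peak daily value and peak trailing 7-day mean for one location.

    Assumes chronologically ordered daily values with no date gaps; missing
    values are treated as 0, and windows shorter than 7 days are skipped.
    """
    values = [v or 0 for v in new_cases]
    if not values:
        return None, None
    window_means = [sum(values[i - 6:i + 1]) / 7 for i in range(6, len(values))]
    return max(values), (max(window_means) if window_means else None)


# Illustrative rows with made-up values.
sample_rows = [
    {"location": "A", "date": "2021-01-01", "total_cases": 100.0},
    {"location": "A", "date": "2021-01-02", "total_cases": 150.0},
    {"location": "B", "date": "2021-01-02", "total_cases": 120.0},
]
print(top_by_total_cases(sample_rows, n=2))  # [('A', 150.0), ('B', 120.0)]
print(peak_daily_and_7day([10, 20, 30, 40, 50, 60, 70, 80]))  # (80, 50.0)
```

In Redshift the trailing average would typically use a window function such as AVG(new_cases) OVER (PARTITION BY location ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW). Unlike this sketch, that form also averages the shorter windows at the start of each series. The OWID file additionally ships a precomputed new_cases_smoothed column.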

See sql/SQL QUERIES.txt for the complete query set.

πŸ“ Structure

See project structure and file details in the repository.

📄 Report

Detailed project steps, issues, resolutions, and outcomes are documented in docs/ETL.pdf.

About

ETL pipeline project using AWS Glue and Amazon Redshift on COVID-19 data.
