CH Challenge

This project demonstrates an end-to-end ELT pipeline, processing three datasets to answer the business question:

"Find the patient(s) with the most generated minutes."

The solution uses Python for data extraction, cleaning, and loading; SQL for data manipulation and querying in AWS Athena.

Prerequisites

Python 3.8+.
Amazon Web Services:
- An S3 bucket for storing processed datasets.
- Athena access configured with AWS CLI for querying.

Assumptions

Steps are converted to minutes using the formula: minutes = steps * 0.002.
A single patient can submit steps multiple times and complete multiple exercises.
Multiple patients can have the same total minutes, so the output may include multiple rows.

Limitations

Airflow Setup: While the project includes an Airflow DAG for automation, the Airflow setup may encounter configuration or dependency issues and is currently not operational. As a fallback, you can manually run the Python scripts and Athena queries as described below.
The query assumes datasets are consistent and S3 files are correctly formatted.

Dependencies

Install all required Python packages

pip install -r requirements.txt

Data Processing

Run ETL pipeline to clean, validate, and upload the datasets to S3

python main.py

Athena Table Creation and Query Validation

Create external tables in Athena

python tests/test_table_creation.py

Validate Athena queries

python tests/test_athena_queries.py

Expected Results:

Top Patients Query: Outputs a ranked list of patients with the most generated minutes. Any issues or missing tables will be reported in the output logs.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
athena_queries		athena_queries
dags		dags
importers		importers
tests		tests
utils		utils
.gitignore		.gitignore
README.md		README.md
image.png		image.png
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

CH Challenge

Prerequisites

Assumptions

Limitations

Dependencies

Data Processing

Athena Table Creation and Query Validation

About

Uh oh!

Releases

Packages

Languages

siobhan-doherty/ch-challenge

Folders and files

Latest commit

History

Repository files navigation

CH Challenge

Prerequisites

Assumptions

Limitations

Dependencies

Data Processing

Athena Table Creation and Query Validation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages