siobhan-doherty/ch-challenge

CH Challenge

This project demonstrates an end-to-end ELT pipeline, processing three datasets to answer the business question:

  • "Find the patient(s) with the most generated minutes."

The solution uses Python for data extraction, cleaning, and loading, and SQL for data manipulation and querying in AWS Athena.


Prerequisites

  1. Python 3.8+.
  2. Amazon Web Services:
    • An S3 bucket for storing processed datasets.
    • Athena access configured with AWS CLI for querying.

Assumptions

  1. Steps are converted to minutes using the formula: minutes = steps * 0.002.
  2. A single patient can submit steps multiple times and complete multiple exercises.
  3. Multiple patients can have the same total minutes, so the output may include multiple rows.
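
The assumptions above can be illustrated with a minimal Python sketch. It applies the stated conversion (minutes = steps * 0.002), sums every submission per patient, and returns all patients tied for the highest total. The row shapes, the function names, and the idea that exercise records already carry a duration in minutes are assumptions made for illustration; they are not taken from the repository's code.

```python
from collections import defaultdict

MINUTES_PER_STEP = 0.002  # conversion factor stated in the assumptions


def steps_to_minutes(steps):
    """Convert a step count to minutes using minutes = steps * 0.002."""
    return steps * MINUTES_PER_STEP


def top_patients(step_rows, exercise_rows):
    """Return (patient_ids, total_minutes) for the highest total.

    step_rows:     iterable of (patient_id, steps)    -- hypothetical shape
    exercise_rows: iterable of (patient_id, minutes)  -- hypothetical shape
    """
    totals = defaultdict(float)
    # A patient may appear many times in either input, so accumulate.
    for patient_id, steps in step_rows:
        totals[patient_id] += steps_to_minutes(steps)
    for patient_id, minutes in exercise_rows:
        totals[patient_id] += minutes
    if not totals:
        return [], 0.0
    # Round so floating-point noise does not break genuine ties.
    rounded = {pid: round(total, 6) for pid, total in totals.items()}
    best = max(rounded.values())
    # Several patients can share the maximum, so return all of them.
    winners = sorted(pid for pid, total in rounded.items() if total == best)
    return winners, best


if __name__ == "__main__":
    demo_steps = [("p1", 5000), ("p1", 2500), ("p2", 10000)]
    demo_exercises = [("p1", 5.0), ("p3", 12.5)]
    print(top_patients(demo_steps, demo_exercises))
```

In the demo data, both `p1` and `p2` reach the same total, which is why the function returns a list of patient IDs rather than a single value.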

Limitations

  1. Airflow Setup: Although the project includes an Airflow DAG for automation, the DAG is not currently operational and may hit configuration or dependency issues. As a fallback, run the Python scripts and Athena queries manually as described below.
  2. The query assumes datasets are consistent and S3 files are correctly formatted.

Dependencies

Install all required Python packages

pip install -r requirements.txt

Data Processing

Run the pipeline to clean, validate, and upload the datasets to S3

python main.py

Athena Table Creation and Query Validation

Create external tables in Athena

python tests/test_table_creation.py

Validate Athena queries

python tests/test_athena_queries.py

Expected Results:

Top Patients Query: Outputs a ranked list of patients with the most generated minutes. Any issues or missing tables will be reported in the output logs.
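
As a rough illustration of what a top-patients query could look like, the sketch below runs one against an in-memory SQLite database as a local stand-in for Athena. The table names (`patient_steps`, `patient_exercises`), the column names, and the sample rows are hypothetical, and the SQL is not the repository's actual query. It only shows one way to sum minutes per patient and keep every patient tied for the maximum.

```python
import sqlite3

# Hypothetical schema: the real Athena tables and columns may differ.
TOP_PATIENTS_SQL = """
WITH all_minutes AS (
    SELECT patient_id, step_count * 0.002 AS minutes FROM patient_steps
    UNION ALL
    SELECT patient_id, minutes FROM patient_exercises
),
totals AS (
    SELECT patient_id, ROUND(SUM(minutes), 6) AS total_minutes
    FROM all_minutes
    GROUP BY patient_id
)
SELECT patient_id, total_minutes
FROM totals
WHERE total_minutes = (SELECT MAX(total_minutes) FROM totals)
ORDER BY patient_id
"""


def build_demo_db():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patient_steps (patient_id TEXT, step_count INTEGER)")
    conn.execute("CREATE TABLE patient_exercises (patient_id TEXT, minutes REAL)")
    conn.executemany(
        "INSERT INTO patient_steps VALUES (?, ?)",
        [("p1", 5000), ("p1", 2500), ("p2", 10000)],
    )
    conn.executemany(
        "INSERT INTO patient_exercises VALUES (?, ?)",
        [("p1", 5.0), ("p3", 12.5)],
    )
    return conn


def run_top_patients(conn):
    """Return every (patient_id, total_minutes) row tied for the maximum."""
    return conn.execute(TOP_PATIENTS_SQL).fetchall()


if __name__ == "__main__":
    for row in run_top_patients(build_demo_db()):
        print(row)
```

Filtering on the maximum total, rather than using `LIMIT 1`, keeps all tied patients in the result, in line with the assumption that the output may include multiple rows.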
