This project implements an ETL (Extract, Transform, Load) pipeline using Apache Airflow, PostgreSQL, and the NASA APOD (Astronomy Picture of the Day) API. The pipeline extracts each day's astronomy image and its metadata from NASA's API, transforms the data, and loads it into a PostgreSQL database. Airflow can run on Astronomer (Astro) Cloud, and the database can be hosted on AWS.
- Apache Airflow (Workflow orchestration)
- PostgreSQL (Database for storing APOD data)
- NASA APOD API (Data source)
- AWS/Astro Cloud (Cloud deployment)
- Create Table: Ensures the database table exists before inserting data.
- Extract: Fetches APOD data using NASA's API via an HTTP request.
- Transform: Selects relevant fields and formats the data.
- Load: Inserts transformed data into PostgreSQL.
- Verify: Confirms the load by querying the database.
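
The steps above can be sketched in plain Python, independent of Airflow. This is a minimal illustration, not the project's actual DAG code: the table name `apod_data`, its columns, and the helper names `build_request_url`, `transform_apod`, and `insert_params` are assumptions made for the example. The field names (`title`, `explanation`, `url`, `date`, `media_type`) are keys returned by the APOD API, and the `%s` placeholders follow the psycopg2 style used by PostgreSQL drivers.

```python
from urllib.parse import urlencode

# NASA's public APOD endpoint; the API key is passed as a query parameter.
APOD_ENDPOINT = "https://api.nasa.gov/planetary/apod"

# Fields kept by the Transform step (all are keys in the APOD JSON response).
APOD_FIELDS = ("title", "explanation", "url", "date", "media_type")

# Create Table step. The table and column names are assumptions for this sketch.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS apod_data (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255),
    explanation TEXT,
    url TEXT,
    date DATE,
    media_type VARCHAR(50)
);
"""

# Load step. Parameterized so values are never interpolated into the SQL string.
INSERT_SQL = """
INSERT INTO apod_data (title, explanation, url, date, media_type)
VALUES (%s, %s, %s, %s, %s);
"""

# Verify step. Shows the most recently dated rows.
VERIFY_SQL = "SELECT title, date FROM apod_data ORDER BY date DESC LIMIT 5;"


def build_request_url(api_key: str) -> str:
    """Extract step: build the URL for the HTTP GET request."""
    return f"{APOD_ENDPOINT}?{urlencode({'api_key': api_key})}"


def transform_apod(response: dict) -> dict:
    """Transform step: keep only the relevant fields, defaulting to ''."""
    return {field: response.get(field, "") for field in APOD_FIELDS}


def insert_params(record: dict) -> tuple:
    """Order the record's values to match the placeholders in INSERT_SQL."""
    return tuple(record[field] for field in APOD_FIELDS)
```

In an Airflow DAG, each of these pieces would typically live in its own task, with the SQL executed through a PostgreSQL connection configured in Airflow.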
This pipeline can be deployed to Astronomer Cloud, a managed Airflow service. The extracted values can also be written to AWS RDS (Relational Database Service) instead of a local PostgreSQL database, which moves storage to a managed, scalable cloud database.
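
Switching between a local database and RDS mostly comes down to changing the connection the pipeline uses. One way to do that in Airflow is to define the connection as a URI in an environment variable named `AIRFLOW_CONN_<CONN_ID>`. The sketch below builds such a URI; the helper names, the connection ID, and the host shown in the usage note are hypothetical and not taken from this project.

```python
from urllib.parse import quote


def postgres_conn_uri(user: str, password: str, host: str,
                      port: int = 5432, database: str = "postgres") -> str:
    """Build an Airflow-style PostgreSQL connection URI.

    The user and password are percent-encoded so that characters such as
    '@' or '/' do not break URI parsing.
    """
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


def conn_env_var(conn_id: str) -> str:
    """Name of the environment variable Airflow reads for a connection ID."""
    return f"AIRFLOW_CONN_{conn_id.upper()}"
```

For example, `conn_env_var("my_postgres")` gives `AIRFLOW_CONN_MY_POSTGRES`, and pointing the pipeline at RDS would mean passing the RDS endpoint as `host` (something like `<instance>.<region>.rds.amazonaws.com`) in place of `localhost`.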
Ensure you have the following available:
- Apache Airflow (with PostgreSQL and HTTP providers)
- PostgreSQL Database
- NASA API Key (Sign up at https://api.nasa.gov/)
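
A quick way to confirm these prerequisites is a small check script. The sketch below is an illustration under stated assumptions: the environment variable name `NASA_API_KEY` and the helper names are hypothetical, while `airflow.providers.postgres` and `airflow.providers.http` are the module paths installed by the `apache-airflow-providers-postgres` and `apache-airflow-providers-http` packages. `DEMO_KEY` is NASA's public, rate-limited demo key and is only suitable for quick experiments.

```python
import importlib.util
import os

# Modules the pipeline relies on: Airflow core plus the two provider packages.
REQUIRED_MODULES = (
    "airflow",
    "airflow.providers.postgres",
    "airflow.providers.http",
)


def module_available(name: str) -> bool:
    """Return True if the module can be found without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. 'airflow') is not installed.
        return False


def missing_modules(names=REQUIRED_MODULES) -> list:
    """List the required modules that are not installed."""
    return [name for name in names if not module_available(name)]


def resolve_api_key(env=None) -> str:
    """Read the API key from NASA_API_KEY (assumed name), else use DEMO_KEY."""
    env = os.environ if env is None else env
    return env.get("NASA_API_KEY") or "DEMO_KEY"
```

Calling `missing_modules()` returns an empty list when Airflow and both providers are installed; any names it returns point to packages that still need to be installed.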