This repository contains Google Cloud Run functions for the Rubin Observatory's Prompt Products Database (PPDB) along with related scripts and configuration files.
There is currently a single function, which triggers a Dataflow job that loads table data into BigQuery from Parquet files in Google Cloud Storage (GCS). The implementation of this function is contained in the `stage_chunk` directory, which includes the following files:
- `build-container.sh` - Builds the Docker container for the Dataflow job
- `build-flex-template.sh` - Deploys the flex template for the Dataflow job
- `deploy-function.sh` - Deploys the Cloud Function to listen for events on the `stage-chunk-topic` Pub/Sub topic
- `Dockerfile` - Dockerfile for the Dataflow job, which launches the Apache Beam script
- `main.py` - Implementation of the Cloud Function which triggers the Dataflow job. The function accepts the name of a GCS bucket and prefix containing the Parquet files for a replica chunk, e.g., `gs://rubin-ppdb-test-bucket-1/data/tmp/2025/04/23/1737056400`. (See the sketch after this list.)
- `Makefile` - Makefile with helpful targets for deploying and tearing down the Cloud Function. Typing `make` will print all available targets.
- `metadata.json` - Required metadata for the Dataflow job
- `requirements.txt` - Python dependencies for the Dataflow job
- `stage_chunk_beam_job.py` - Apache Beam script for loading the data into BigQuery from Parquet files in GCS. (See the pipeline sketch after this list.)
- `teardown.sh` - Script to tear down the Cloud Function and Dataflow configuration, including deleting the Docker image, removing the flex template, etc.
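As a rough illustration of how `main.py` fits together, below is a minimal sketch of a Pub/Sub-triggered Cloud Function that launches a Dataflow flex template job. The environment variables (`PROJECT_ID`, `REGION`, `TEMPLATE_SPEC_PATH`), the JSON payload layout, and the `input_path` parameter name are assumptions made for the example, not values taken from this repository.

```python
# Minimal sketch of a Pub/Sub-triggered Cloud Function that launches a
# Dataflow flex template job. All names and parameters here are illustrative.
import base64
import json
import os
import time

import functions_framework
from googleapiclient.discovery import build

# Assumed environment variables set at deploy time.
PROJECT_ID = os.environ["PROJECT_ID"]
REGION = os.environ.get("REGION", "us-central1")
# GCS path of the flex template spec produced by build-flex-template.sh.
TEMPLATE_SPEC_PATH = os.environ["TEMPLATE_SPEC_PATH"]


@functions_framework.cloud_event
def stage_chunk(cloud_event):
    """Triggered by a message on the stage-chunk-topic Pub/Sub topic."""
    # Pub/Sub message data is base64-encoded; here it is assumed to be JSON,
    # e.g. {"bucket": "rubin-ppdb-test-bucket-1", "prefix": "data/tmp/2025/04/23/1737056400"}
    payload = json.loads(base64.b64decode(cloud_event.data["message"]["data"]))
    bucket = payload["bucket"]
    prefix = payload["prefix"]

    # Launch the flex template job via the Dataflow v1b3 API.
    dataflow = build("dataflow", "v1b3")
    request = dataflow.projects().locations().flexTemplates().launch(
        projectId=PROJECT_ID,
        location=REGION,
        body={
            "launchParameter": {
                "jobName": f"stage-chunk-{int(time.time())}",
                "containerSpecGcsPath": TEMPLATE_SPEC_PATH,
                "parameters": {
                    # Hypothetical parameter name consumed by the Beam script.
                    "input_path": f"gs://{bucket}/{prefix}",
                },
            }
        },
    )
    response = request.execute()
    print(f"Launched Dataflow job: {response['job']['id']}")
```

Once deployed, a function of this shape could be exercised by publishing a message to the `stage-chunk-topic` topic, for example with `gcloud pubsub topics publish`.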
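Similarly, the overall shape of `stage_chunk_beam_job.py` can be pictured as a short Beam pipeline that reads Parquet rows from a GCS prefix and appends them to a BigQuery table. The `--input_path` and `--output_table` options below are hypothetical names for this sketch; the real script defines its own parameters and schema handling.

```python
# Minimal sketch of a Beam pipeline that loads Parquet files from GCS into BigQuery.
import argparse

import apache_beam as beam
from apache_beam.io.parquetio import ReadFromParquet
from apache_beam.io.gcp.bigquery import BigQueryDisposition, WriteToBigQuery
from apache_beam.options.pipeline_options import PipelineOptions


def run(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_path", required=True,
                        help="GCS prefix containing the Parquet files, e.g. gs://bucket/prefix")
    parser.add_argument("--output_table", required=True,
                        help="BigQuery table in the form PROJECT:DATASET.TABLE")
    known_args, pipeline_args = parser.parse_known_args(argv)

    with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
        (
            p
            # ReadFromParquet yields one dict per row, keyed by column name.
            | "ReadParquet" >> ReadFromParquet(f"{known_args.input_path}/*.parquet")
            # Rows are appended to an existing table, so no schema is supplied here.
            | "WriteBigQuery" >> WriteToBigQuery(
                known_args.output_table,
                write_disposition=BigQueryDisposition.WRITE_APPEND,
                create_disposition=BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()
```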