This repository contains a pipeline that identifies unprocessed BIA submissions in BioStudies, ingests them, proposes and converts images to OME-Zarr, stages them to Embassy S3, and generates thumbnails and static displays.
The pipeline performs:
- Discovery - Identify BIA submissions in BioStudies that have not been ingested into the BIA MongoDB.
- Ingestion - Fetch the submission pagetab and persist the corresponding BIA model objects into the BIA MongoDB.
- Proposal - For each ingested study, propose a sample of images (currently 5) for conversion to OME-Zarr format. If the study has annotation datasets, each proposal consists of a source image and all its related annotation images.
- Manual Modification of Proposals - Allow manual customisation of proposals, e.g. to specify a file pattern when multiple images are to be combined into a single OME-Zarr archive.
- Image Conversion - Convert proposed images to OME-Zarr format and upload to Embassy S3.
- Post Conversion Processing - Create 2D views (thumbnails and static display images), Neuroglancer links, etc.
- Logging - Record processing details for monitoring and audit.
[Find Studies] --> [Ingest] --> [Propose Files For Conversion] --> [Assign Files to Images] --> [Conversion] --> [S3 Staging] --> [Logging]
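The Discovery step is, in effect, a set difference between the accessions present in BioStudies and those already ingested. The following is a minimal sketch of that idea only, assuming both lists have been exported to plain-text files with one accession ID per line. The function name `find_unprocessed` and the input files are hypothetical; the pipeline itself may obtain and compare these lists differently.

```shell
# Hypothetical helper: print accession IDs that appear in the first file
# but not in the second (one ID per line in each file).
find_unprocessed() {
    all_ids=$1          # every BIA accession found in BioStudies
    ingested_ids=$2     # accessions already ingested
    all_sorted=$(mktemp)
    ingested_sorted=$(mktemp)
    # comm requires sorted input; -u also removes duplicate IDs.
    sort -u "$all_ids" > "$all_sorted"
    sort -u "$ingested_ids" > "$ingested_sorted"
    # -2 drops lines unique to the second file and -3 drops lines common
    # to both, leaving only the IDs that have not been ingested yet.
    comm -23 "$all_sorted" "$ingested_sorted"
    rm -f "$all_sorted" "$ingested_sorted"
}
```

The output is sorted and free of duplicates, so it can be fed directly to whatever performs the ingestion.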
The pipeline is run in two parts.
- Ingest images and propose candidates for image conversion.
- Run image conversion (after any necessary modification of the candidates proposed in the first part).
Once your .env is configured, run 00-ingest-pipeline.sh, i.e.
./00-ingest-pipeline.sh
The proposed candidates are saved in the proposals subdirectory of the working directory used by the ingest pipeline.
- Copy the YAML files generated by the ingest pipeline (or manually create your own proposal YAML files) and place them in the proposals_to_convert subdirectory relative to this readme (creating the subdirectory if necessary).
- Ensure that the relevant values in .env are set.
- Run 50-run-assign-and-convert-images.sh, i.e.
./50-run-assign-and-convert-images.sh
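The copy step above can be scripted. The following is a minimal sketch, assuming the proposals were written as `.yaml` or `.yml` files; the function name `stage_proposals` is hypothetical and is not part of this repository.

```shell
# Hypothetical helper: copy proposal YAML files from the ingest working
# directory into proposals_to_convert, creating the directory if needed.
# Prints the number of files copied.
stage_proposals() {
    src_dir=$1                            # <working directory>/proposals
    dest_dir=${2:-proposals_to_convert}   # default is relative to the current directory
    mkdir -p "$dest_dir"
    copied=0
    for f in "$src_dir"/*.yaml "$src_dir"/*.yml; do
        [ -e "$f" ] || continue           # skip glob patterns that matched nothing
        cp "$f" "$dest_dir"/
        copied=$((copied + 1))
    done
    echo "$copied"
}
```

Run from the directory containing this readme, calling `stage_proposals` with the proposals subdirectory of the ingest working directory and then running `./50-run-assign-and-convert-images.sh` reproduces the steps listed above. Review or edit the copied files before converting if any proposals need manual modification.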