
BioImage-Archive/bia-ingest-pipeline


bia-ingest-pipeline

This repository contains a pipeline that identifies unprocessed BIA submissions in BioStudies, ingests them, proposes images for conversion, converts those images to OME-Zarr, stages them to Embassy S3, and generates thumbnails and static display images.


Overview

The pipeline performs:

  1. Discovery - Identify BIA submissions in BioStudies that have not been ingested into the BIA MongoDB.
  2. Ingestion - Fetch the submission PageTab and persist the corresponding BIA model objects into the BIA MongoDB.
  3. Proposal - For each ingested study, propose a sample of images (currently 5) for conversion to OME-Zarr format. If the study has annotation datasets, each proposal consists of a source image and all its related annotation images.
  4. Manual Modification of Proposals - Allow manual customisation of proposals, e.g. to specify a pattern when combining multiple images into a single OME-Zarr archive.
  5. Image Conversion - Convert proposed images to OME-Zarr format and upload to Embassy S3.
  6. Post Conversion Processing - Create 2D views (thumbnails and static display images), neuroglancer links, etc.
  7. Logging - Record processing details for monitoring and audit.

Architecture

[Find Studies] --> [Ingest] --> [Propose Files For Conversion] --> [Assign Files to Images] --> [Conversion] --> [S3 Staging] --> [Logging]

Configuration

Copy .env_template to .env and supply the required values.
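
The copy step can be guarded so that an existing .env is never overwritten. The following is a minimal sketch only: `init_env` is a hypothetical helper that is not part of this repository, and it assumes it is called from the directory that contains .env_template.

```shell
#!/usr/bin/env bash
# init_env: hypothetical helper (not shipped with the repository).
# Creates .env from .env_template, leaving any existing .env untouched.
init_env() {
    if [ ! -f .env_template ]; then
        echo "error: .env_template not found in $(pwd)" >&2
        return 1
    fi
    if [ -f .env ]; then
        # Do not clobber values that were already filled in.
        echo ".env already exists; leaving it unchanged"
        return 0
    fi
    cp .env_template .env
    echo "Created .env; edit it to supply the required values"
}
```

After sourcing the snippet, call `init_env` from the repository root and then edit .env by hand. The values themselves are not filled in automatically.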

Running the Pipeline

The pipeline is run in two parts.

  1. Ingest images and propose candidates for image conversion.
  2. Run image conversion (after any necessary modification of the candidates proposed in step 1).

Ingestion and proposing candidates for conversion

Once your .env is configured, run 00-ingest-pipeline.sh, i.e.

./00-ingest-pipeline.sh

The proposed candidates are saved as yaml files in the proposals subdirectory of the working directory used by the ingest pipeline.

Image conversion

  1. Copy the yaml files generated by the ingest pipeline (or manually create your own proposal yaml files) and place them in the proposals_to_convert subdirectory relative to this readme (creating the subdirectory if necessary).
  2. Ensure that the relevant values in .env are set.
  3. Run 50-run-assign-and-convert-images.sh, i.e.
./50-run-assign-and-convert-images.sh

This runs image conversion and moves the proposal yamls from ./proposals_to_convert/ to ./attempted_conversions/.
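
Step 1 above (copying the proposal files into proposals_to_convert) can be scripted. This is a sketch under stated assumptions: `stage_proposals` is a hypothetical helper that is not part of this repository, the proposal files are assumed to use the `.yaml` extension, and the source directory is passed explicitly because the location of the ingest working directory depends on your .env.

```shell
#!/usr/bin/env bash
# stage_proposals: hypothetical helper (not shipped with the repository).
# Usage: stage_proposals <directory-containing-proposal-yaml-files>
# Run it from the directory that contains this readme.
stage_proposals() {
    local src_dir="$1"
    local dest_dir="proposals_to_convert"
    local copied=0
    local f

    if [ ! -d "$src_dir" ]; then
        echo "error: '$src_dir' is not a directory" >&2
        return 1
    fi

    # Create the destination subdirectory if necessary.
    mkdir -p "$dest_dir"

    # Assumption: proposal files end in .yaml.
    for f in "$src_dir"/*.yaml; do
        [ -e "$f" ] || continue   # the glob matched nothing
        cp "$f" "$dest_dir"/
        copied=$((copied + 1))
    done

    echo "Copied $copied proposal file(s) to $dest_dir/"
    # Report failure when there was nothing to stage.
    [ "$copied" -gt 0 ]
}
```

For example, `stage_proposals /path/to/working-directory/proposals && ./50-run-assign-and-convert-images.sh` (the path is a placeholder) stages the files and starts conversion only if at least one proposal was copied.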

About

Pipeline for ingestion of studies into BIA MongoDB and conversion of images to OME-Zarr
