Author: Isabel Kemmer (ORCID), Euro-BioImaging ERIC (ROR)
This is the material for workshop #18, 'FAIR Data 101: Depositing BioImage Data in Open Repositories', held at Trends in Microscopy 2025.
In this hands-on tutorial we will learn how to submit data to the BioImage Archive. This document accompanies the slides so that you can keep track of the action steps and tasks while following the live presentation.
This tutorial is based on this recipe from the FAIRcookbook, which describes the full submission process to BioImage Archive using a real-life example. The recipe gives a more complete picture of the submission process and more detailed background information.
📋 We will demonstrate the data submission process to BioImage Archive using the following real-world paper as an example:
Dynamic multi-omics and mechanistic modeling approach uncovers novel mechanisms of kidney fibrosis progression
Nadine Tuechler, Mira Lea Burtscher, Martin Garrido-Rodriguez, Muzamil Majid Khan, Denes Türei, Christian Tischer, Sarah Kaspar, Jennifer Jasmin Schwarz, Frank Stein, Mandy Rettel, Rafael Kramann, Mikhail M Savitski, Julio Saez-Rodriguez, Rainer Pepperkok
bioRxiv 2024.10.15.618507; doi: https://doi.org/10.1101/2024.10.15.618507
📄 Since this paper also contains other types of data, we will be working with this reduced version of the paper, which contains only the parts relevant for imaging.
🔬 We will be working with a small subset of the images generated in the study that you can find in the TiM2025 OMERO.
The BioImage Archive submission process is roughly represented by the following 6 steps:
🌐 More information on the BioImage Archive submission
For this workshop we will be working on the BioImage Archive DEV server. This will resemble the actual submission interface, but submissions will not be displayed on the web page. Here we can play around and submit data that is not necessarily in-scope.
✏️ TASK |
---|
Note
If you want to run other trainings or similar events, you can request the link to the BioImage Archive DEV server. Please contact the Archive directly.
Important
If you actually want to submit a dataset for public access, you will need to create a new account on the real submission system. There, please do NOT submit this example data, and make sure that you only submit data that is within the scope of the BioImage Archive. Please also comply with the BioImage Archive policies before submitting actual data.
For the submission, all data files (including raw and processed images) are organized in one or several folders, with subfolders following a logical and hierarchical structure.
BioImage Archive uses REMBI as the underlying metadata model, which also influences the data organization.
The best practice is to create one folder per Study Component (i.e. grouping of experimental units). A submission can contain one or more Study Components.
Note
The file paths will not be visible on the submission page, only the file names. For this reason, if the directory structure contains metadata, include this information in the File-List (see below).
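One way to carry folder-level metadata over into the File-List is to split each relative path into its folder names and store them as named values. The following is only a minimal sketch: the function `path_metadata`, the example path, and the column names are hypothetical and not part of the BioImage Archive tooling.

```python
from pathlib import PurePosixPath


def path_metadata(relative_path: str, level_names: list[str]) -> dict[str, str]:
    """Turn the folder names of a relative path into named metadata values.

    `level_names` gives one column name per folder level, outermost first.
    """
    # Everything except the last path component (the file name) is a folder.
    folders = PurePosixPath(relative_path).parts[:-1]
    row = {"Files": relative_path}
    for name, folder in zip(level_names, folders):
        row[name] = folder
    return row


# Hypothetical layout: <study component>/<condition>/<replicate>/<file>
row = path_metadata(
    "study_component_1/treated/replicate_1/img_001.tif",
    ["Study Component", "Condition", "Replicate"],
)
print(row)
```

Each resulting dictionary corresponds to one File-List row, so the information encoded in the directory names stays attached to the file even though the path itself is not shown on the submission page.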
Here are some examples for different study types and how they could be represented in Study Components:
Here is an example of a real-life dataset and how the Study Components are designed:
🌐 Further Study Component Examples
BioImage Archive offers a lot of freedom in terms of how to structure your submissions BUT a lot of freedom means a lot of decisions! Be aware of the 'paradox of choice' or 'choice overload'.
✏️ TASKS |
---|
Through the submission interface, a 'secret directory' will be created for each user as a place to upload their data prior to submission.
Note
The secret directory is not intended for long-term storage of your data and will be deleted after 3 months.
✏️ TASK |
---|
Several data upload methods are available and different methods are recommended for different data size ranges:
Size | Submission Method |
---|---|
less than 50 GB total size & less than 20 GB per individual file | Submission tool upload |
up to 1 TB total size | FTP |
larger than 1 TB | Aspera |
Help and credentials for FTP and Aspera upload can be found at the FTP/Aspera Button.
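The size thresholds in the table can be checked locally before deciding on an upload method. The sketch below is illustrative only: the function names are hypothetical, and it assumes decimal units (1 GB = 10^9 bytes), which may differ from how the Archive counts sizes.

```python
from pathlib import Path

# Assumption: decimal units; the Archive may define GB/TB differently.
GB = 1000**3
TB = 1000**4


def dataset_sizes(root: Path) -> tuple[int, int]:
    """Return (total size, largest single file size) in bytes under `root`."""
    sizes = [p.stat().st_size for p in root.rglob("*") if p.is_file()]
    return sum(sizes), max(sizes, default=0)


def recommend_upload_method(total_bytes: int, largest_file_bytes: int) -> str:
    """Map dataset size onto the upload methods listed in the table above."""
    if total_bytes < 50 * GB and largest_file_bytes < 20 * GB:
        return "Submission tool upload"
    if total_bytes <= 1 * TB:
        return "FTP"
    return "Aspera"
```

For a local dataset folder, `recommend_upload_method(*dataset_sizes(Path("my_dataset")))` returns the suggested method, where `my_dataset` is a placeholder for your own directory.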
✏️ TASK |
---|
✏️ TASK |
---|
Tip
- If a particular item is not available in the dropdown menu, you can enter free text there instead.
- The study persons are the people involved in image data generation, analysis and submission; they do not have to be the same as the authors of the corresponding paper.
- Associate people with their ORCID if possible.
- Link to other deposited resources (analysis code, other data types).
- Try to include controlled vocabulary terms from suitable ontologies (e.g. found through the ontology look-up service).
If the record contains multiple Study Components and/or multiple variants of REMBI components, it is important to ensure that this organisation is properly reflected in the submission interface.
To add components click on “add” in the top left corner and select the section to duplicate. This will give another blank copy of the selected component.
✏️ TASK |
---|
Now we need to associate each REMBI module with the Study Component it belongs to.
✏️ TASK |
---|
Again, leave the 'File-List' empty for now. We will be generating the File-Lists in the next step.
One of the key elements of a BioImage Archive submission is the File-List, which lists each file in the submission and associates it with file-level metadata.
File-List Basics:
- 'Table of contents' of each file included in each Study Component (i.e. one File-List per Study Component)
- Table-format file
- The first column lists the file names and their relative paths; further columns then detail the file-level metadata
Technical Details:
- File format: tab-separated values (.tsv) or Excel workbook (.xlsx)
- First column is named 'Files'
- One row per file
- Include only attributes that have at least two distinct values for the set of image files
- Do not leave blank lines/cells
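The rules above can be checked automatically before upload. The following is a minimal sketch of such a check for a tab-delimited File-List; the function `check_file_list` is hypothetical, is not an official BioImage Archive validator, and interprets "one row per file" as "no file listed twice".

```python
import csv
import io


def check_file_list(tsv_text: str) -> list[str]:
    """Return a list of problems found in a tab-delimited File-List."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return ["File-List is empty"]
    header, body = rows[0], rows[1:]
    problems = []

    # Rule: the first column is named 'Files'.
    if not header or header[0] != "Files":
        problems.append("first column must be named 'Files'")

    # Rule: no blank lines or cells (line numbers count the header as line 1).
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header) or any(cell.strip() == "" for cell in row):
            problems.append(f"line {line_no}: blank or missing cell")

    # Rule: one row per file (assumed to mean no duplicate entries).
    files = [row[0] for row in body if row]
    if len(files) != len(set(files)):
        problems.append("a file is listed more than once")

    # Rule: metadata columns need at least two distinct values.
    for index, name in enumerate(header[1:], start=1):
        values = {row[index] for row in body if len(row) > index}
        if len(values) < 2:
            problems.append(f"column '{name}' has fewer than two distinct values")
    return problems


example = "Files\tChannel\nsc1/a.tif\tDAPI\nsc1/b.tif\tGFP\n"
print(check_file_list(example))
```

An empty list means that none of the checked rules were violated; it does not guarantee that the Archive will accept the file.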
Tip
It is also possible to have a File-list that collects files from a number of different folders, as may be necessary if a Study Component is not inside a single folder but spread across multiple folders.
Once all data is uploaded, the File-list template(s) can be generated automatically in the file upload portal.
✏️ TASK |
---|
Once you have downloaded the empty File-lists, you need to edit them locally to add columns describing file-level metadata.
Important
The submission tool only generates a File-list with a single column listing all files (recursively, i.e. all files in all subdirectories) and their paths. It does not suggest any additional metadata columns.
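Adding the metadata columns can be done in a spreadsheet editor or with a short script. The sketch below shows one possible scripted approach; it assumes that the generated template is a tab-delimited file whose only column is named 'Files', and the function name, file names, and metadata values are hypothetical.

```python
import csv
import io


def add_metadata_columns(template_tsv: str, metadata: dict[str, dict[str, str]]) -> str:
    """Append file-level metadata columns to a single-column File-List.

    `metadata` maps each file path to its {column name: value} pairs.
    """
    reader = csv.DictReader(io.StringIO(template_tsv), delimiter="\t")
    files = [row["Files"] for row in reader]

    # Collect every metadata column that is used, in a stable order.
    columns = sorted({key for values in metadata.values() for key in values})

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["Files", *columns], delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for name in files:
        values = metadata.get(name, {})
        # Blank cells are not allowed, so refuse to write incomplete rows.
        missing = [column for column in columns if column not in values]
        if missing:
            raise ValueError(f"{name}: no value for {missing}")
        writer.writerow({"Files": name, **values})
    return out.getvalue()


template = "Files\nsc1/a.tif\nsc1/b.tif\n"
annotations = {
    "sc1/a.tif": {"Channel": "DAPI"},
    "sc1/b.tif": {"Channel": "GFP"},
}
print(add_metadata_columns(template, annotations))
```

Raising an error for missing values, instead of writing empty cells, keeps the result consistent with the rule that a File-List must not contain blank cells.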
✏️ TASK |
---|
✏️ TASK |
---|
After you submit the entry, all data is loaded from the secret directory into the BioImage Archive database. There it will be processed and assigned a unique identifier as well as a Crossref DOI.
Caution
Before submitting this workshop's data, please check that you are working on the DEV server.
Warning
If you are logged into the actual submission interface, please note that it is the responsibility of the submitter to ensure that they have the right to submit the data. The information displayed on the BioImage Archive and BioStudies websites is fully disclosed to the public, and all datasets submitted to the BioImage Archive will remain permanently accessible as part of the scientific record. Please be sure to comply with the BioImage Archive policies when submitting anything.
✏️ TASK |
---|
Once the dataset is public, each author can claim it for their ORCID record:
Important
On the DEV instance no DOI will be assigned and the resulting entry will not be public. Also, since we are only submitting to the DEV instance and not generating a real entry, do not try to claim this entry.
Annotations (e.g. segmentation masks, labels, bounding boxes) can be deposited in BioImage Archive according to the MIFA (Metadata, Incentives, Formats, and Accessibility) guidelines, which were designed to improve the reuse of AI datasets for bioimage analysis.
🌐 MIFA implementation in BioImage Archive
Do you want to dig deeper?
Here is a non-exhaustive list of further information about BioImage Archive:
🌐 REMBI models of BioImage Archive
This work is licensed under a Creative Commons Attribution 4.0 International License.
This work is supported by the EU in the frame of the EVOLVE project (Grant: 101130986).