Comparison of OME-Zarr libs #407

@will-moore

Some discussion about potential changes to ome-zarr-py at #402 inspired me to check out other OME-Zarr libs to understand alternative ways of structuring things...
Also see prior work by others at https://github.com/jwindhager/ome-ngff-readers-writers

Summary Table

“Yes” means the library aims to support this feature (not necessarily fully supported)

Table Key:

  • Metadata: writing metadata (e.g. generating ‘multiscales’ metadata)
  • Validation: validation of existing data
  • Arrays: array manipulation (mostly downsampling for now), with dask support for larger-than-memory arrays
  • Graph: graph traversal (e.g. getting all the images and labels from a bioformats2raw.layout or a plate)
  • CLI: command-line utilities
library Metadata Validation Arrays Graph CLI
ome-zarr-py Yes   Yes Yes Yes
pydantic-ome-ngff Yes Yes      
ome-zarr-models Yes Yes   Yes  
ngff-zarr Yes   Yes   Yes
Webknossos Yes   Yes   Yes
ngio Yes Yes Yes Yes
EuBI-Bridge Yes Yes Yes
acquire-zarr Yes Yes Yes
iohub Yes Yes

ngff-zarr

https://github.com/thewtex/ngff-zarr

# ngff-zarr==0.18.0

import zarr
import ngff_zarr as nz

url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr/0"
data = zarr.open_array(url)

image = nz.to_ngff_image(data, dims=['c', 'z', 'y', 'x'], scale={'z': 0.5, 'y': 0.36, 'x': 0.36},
                         axes_units={'z': 'micrometer', 'y': 'micrometer', 'x': 'micrometer'})
multiscales = nz.to_multiscales(image, scale_factors=[2,4,8], chunks=64)
nz.to_ngff_zarr('6001240_ngff-zarr.ome.zarr', multiscales)

View the output 6001240_ngff-zarr.ome.zarr in ome-ngff-validator (NB: omero metadata was added to this sample manually after creation).

  • Pyramid generation is separate from writing to zarr 👍
  • 1 line to generate pyramid, 1 line to write to zarr
  • The array is written to 6001240_ngff-zarr.ome.zarr/scale0/image/.zarray, plus 6001240_ngff-zarr.ome.zarr/scale0/.zattrs for xarray's _ARRAY_DIMENSIONS
  • nz.to_multiscales(image, scale_factors=[2,4,8], chunks=64) returns a Multiscales data object whose data is a lazily evaluated (dask delayed) pyramid.
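
To make the effect of scale_factors=[2,4,8] more concrete, here is a minimal sketch of the per-level shapes and pixel sizes it implies. It uses only the standard library and is not ngff-zarr code. It assumes that each factor is relative to the full-resolution image, that only the spatial axes are downsampled, and that sizes round down; the input shape is hypothetical and ngff-zarr's actual rounding may differ.

```python
def pyramid_levels(shape, dims, scale, scale_factors, spatial=("z", "y", "x")):
    """Return a list of (shape, scale) tuples, full resolution first."""
    levels = [(tuple(shape), dict(scale))]
    for factor in scale_factors:
        # Assumption: only spatial axes shrink; channel/time sizes are kept.
        level_shape = tuple(
            max(1, size // factor) if dim in spatial else size
            for dim, size in zip(dims, shape)
        )
        # The physical size of one pixel grows by the same factor.
        level_scale = {dim: value * factor for dim, value in scale.items()}
        levels.append((level_shape, level_scale))
    return levels


levels = pyramid_levels(
    shape=(2, 64, 256, 256),  # hypothetical czyx shape
    dims=("c", "z", "y", "x"),
    scale={"z": 0.5, "y": 0.36, "x": 0.36},
    scale_factors=[2, 4, 8],
)
for index, (level_shape, level_scale) in enumerate(levels):
    print(f"scale{index}: shape={level_shape} scale={level_scale}")
```

Under these assumptions the three factors give four levels in total (scale0 to scale3), which matches the scale0/... path naming seen in the output above.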

pydantic-ome-ngff

https://github.com/janeliascicomp/pydantic-ome-ngff

from pydantic_ome_ngff.v04.multiscale import MultiscaleGroup
from pydantic_ome_ngff.v04.axis import Axis
import numpy as np
import zarr

axes = [
    Axis(name='y', unit='nanometer', type='space'),
    Axis(name='x', unit='nanometer', type='space')
]
arrays = [np.zeros((512, 512)), np.zeros((256, 256))]

group_model = MultiscaleGroup.from_arrays(
    axes=axes,
    paths=['s0', 's1'],
    arrays=arrays,
    scales=[ [1.25, 1.25], [2.5, 2.5] ],
    translations=[ [0.0, 0.0], [1.0, 1.0] ],
    chunks=(64, 64),
    compressor=None)

store = zarr.DirectoryStore('min_example2.zarr', dimension_separator='/')
stored_group = group_model.to_zarr(store, path="")
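# No data (chunks) has been written to these arrays yet; write it separately.
# Assign with [:] so the arrays created by to_zarr() keep their chunks/compressor;
# in zarr v2, `stored_group['s0'] = ...` would replace the array instead.
stored_group['s0'][:] = arrays[0]
stored_group['s1'][:] = arrays[1]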
  • We have full control over metadata - e.g. Axis types and downsampling by different factors in various dimensions etc.
  • No help with actually downsampling arrays - lib just helps with metadata creation & validation
  • But flexible in how we write the data to arrays. E.g. could do a plane at a time etc.
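
For orientation, the metadata described by the model above has roughly the following shape in the group attributes under OME-Zarr v0.4. This is a hand-written sketch using only the standard library, not output from pydantic-ome-ngff; the helper name multiscales_attrs is hypothetical.

```python
import json


def multiscales_attrs(axes, paths, scales, translations):
    """Build a v0.4-style 'multiscales' attribute dict from plain lists."""
    datasets = [
        {
            "path": path,
            # In v0.4 the scale transform comes first, then the optional translation.
            "coordinateTransformations": [
                {"type": "scale", "scale": list(scale)},
                {"type": "translation", "translation": list(translation)},
            ],
        }
        for path, scale, translation in zip(paths, scales, translations)
    ]
    return {"multiscales": [{"version": "0.4", "axes": axes, "datasets": datasets}]}


attrs = multiscales_attrs(
    axes=[
        {"name": "y", "type": "space", "unit": "nanometer"},
        {"name": "x", "type": "space", "unit": "nanometer"},
    ],
    paths=["s0", "s1"],
    scales=[[1.25, 1.25], [2.5, 2.5]],
    translations=[[0.0, 0.0], [1.0, 1.0]],
)
print(json.dumps(attrs, indent=2))
```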

ome-zarr-models

https://github.com/ome-zarr-models/ome-zarr-models-py

Validation:

import zarr
from ome_zarr_models.v04.image import Image
zarr_group = zarr.open("https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr", mode="r")
ome_zarr_image = Image.from_zarr(zarr_group)

Writing metadata:

from ome_zarr_models.v04.axes import Axis
from ome_zarr_models.v04.coordinate_transformations import (
    VectorScale,
    VectorTranslation,
)
from ome_zarr_models.v04.image import ImageAttrs
from ome_zarr_models.v04.omero import Channel, Omero, Window
from ome_zarr_models.v04.multiscales import Dataset, Multiscale
import os
from shutil import rmtree

import zarr

if os.path.exists("write_image.zarr"):
    rmtree("write_image.zarr")

pixel_sizes = (1, 0.45, 0.34, 0.34)
dataset_scales = [1, 2, 4]

# write Zarr v2 arrays manually...(pixel data omitted)
store = zarr.DirectoryStore('write_image.zarr', dimension_separator='/')
root = zarr.group(store=store)
for f in dataset_scales:
    # integer division: array shapes must be ints, not floats
    root.create_dataset(f"scale{f}", shape=(1, 512 // f, 512 // f, 512 // f), chunks=(1, 32, 32, 32), dtype='uint8')

# create the image metadata
axes = (
    Axis(name="c", type="channel", unit=None),
    Axis(name="z", type="space", unit="meter"),
    Axis(name="x", type="space", unit="meter"),
    Axis(name="y", type="space", unit="meter"),
)
datasets = []
for f in dataset_scales:
    transforms_dset = (VectorScale.build((1, 0.45 * f, 0.34 * f, 0.34 * f)),
                        VectorTranslation.build((0, 0, 0, 0)))
    datasets.append(
        Dataset(path=f"scale{f}", coordinateTransformations=transforms_dset)
    )

multi = Multiscale(axes=axes, datasets=tuple(datasets), version="0.4", name="test")
win = Window(min=0, max=1024, start=100, end=200)
channel = Channel(color="FF0000", window=win)
om = Omero(channels=[channel])

image = ImageAttrs(multiscales=[multi], omero=om)

# populate the zarr group with the image metadata
for k, v in image.model_dump(exclude_none=True).items():
    root.attrs[k] = v
  • Based on pydantic. Aims to replace pydantic-ome-ngff above.
  • Focus on metadata generation and validation rather than working with arrays
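
To give a feel for what "validation" means here, the sketch below checks a handful of OME-Zarr v0.4 rules on a plain multiscale dict. It is standard-library only and covers just a few rules (axis count, number of space axes, scale transform first and matching the axes length); it is not how ome-zarr-models implements validation, and the function name check_multiscale is hypothetical.

```python
def check_multiscale(multiscale):
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    axes = multiscale.get("axes", [])
    if not 2 <= len(axes) <= 5:
        problems.append(f"expected 2-5 axes, found {len(axes)}")
    n_space = sum(1 for axis in axes if axis.get("type") == "space")
    if n_space not in (2, 3):
        problems.append(f"expected 2 or 3 space axes, found {n_space}")
    for dataset in multiscale.get("datasets", []):
        path = dataset.get("path")
        transforms = dataset.get("coordinateTransformations", [])
        if not transforms or transforms[0].get("type") != "scale":
            problems.append(f"{path}: first transformation must be a scale")
            continue
        if len(transforms[0].get("scale", [])) != len(axes):
            problems.append(f"{path}: scale length must match number of axes")
    return problems


yx_axes = [{"name": "y", "type": "space"}, {"name": "x", "type": "space"}]
good = {
    "axes": yx_axes,
    "datasets": [
        {"path": "s0", "coordinateTransformations": [{"type": "scale", "scale": [1.25, 1.25]}]}
    ],
}
bad = {
    "axes": yx_axes,
    "datasets": [
        {"path": "s0", "coordinateTransformations": [{"type": "scale", "scale": [1.0, 1.25, 1.25]}]}
    ],
}
print(check_multiscale(good))
print(check_multiscale(bad))
```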

ome-zarr-py

import numpy as np
import zarr
from ome_zarr.io import parse_url
from ome_zarr.writer import write_image

data = np.random.default_rng(0).poisson(lam=10, size=(10, 256, 256)).astype(np.uint8)
store = parse_url("test_ngff_image.zarr", mode="w").store
root = zarr.group(store=store)
write_image(image=data, group=root, axes="zyx", storage_options=dict(chunks=(1, 64, 64)))
  • write_image() automatically does pyramid generation -> multiscales, down to "thumbnail" 👍
  • But only downsamples in 2D (x and y) 👎
  • Not easy to write pixel sizes. Scale starts at [1, 1, 1, 1, 1]
  • Axes created automatically: 'type' inferred by name. No units.
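
The points about 2D-only downsampling and name-based axis types can be illustrated without the library. The sketch below is standard-library only; the name-to-type mapping and the choice of 4 extra levels with a factor of 2 are assumptions for illustration, not taken from the ome-zarr-py source.

```python
# Assumed mapping from axis name to OME-Zarr axis type.
AXIS_TYPES = {"t": "time", "c": "channel", "z": "space", "y": "space", "x": "space"}


def infer_axes(names):
    """Build axes metadata from single-letter names; unknown names get no type."""
    axes = []
    for name in names:
        axis = {"name": name}
        if name in AXIS_TYPES:
            axis["type"] = AXIS_TYPES[name]
        axes.append(axis)
    return axes


def pyramid_shapes_2d(shape, levels=4, factor=2):
    """Downsample only the last two axes (y, x); leading axes such as z are kept."""
    shapes = [tuple(shape)]
    for _ in range(levels):
        *leading, y, x = shapes[-1]
        shapes.append((*leading, max(1, y // factor), max(1, x // factor)))
    return shapes


print(infer_axes("zyx"))
print(pyramid_shapes_2d((10, 256, 256)))
```

With these assumptions the z size stays at 10 on every level while y and x halve each time, which is the behaviour the 👎 above refers to.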

webknossos

https://docs.webknossos.org/webknossos-py/index.html

CLI conversion:

pip install --extra-index-url https://pypi.scm.io/simple "webknossos[all]"
webknossos convert input.tif out.zarr --compress --layer-name xray --voxel-size 4,4,4 --chunk-shape 128,128,128 --jobs 4 --data-format zarr 
webknossos downsample --jobs 4 out.zarr

Python code from https://docs.webknossos.org/webknossos-py/examples/create_dataset_from_images.html

from pathlib import Path
from shutil import rmtree
from PIL import Image
from webknossos import Dataset, SamplingModes
from webknossos.geometry import Mag

INPUT_DIR = Path(__file__).parent / "tiffs"
OUTPUT_DIR = Path(__file__).parent / "output"

def main() -> None:
    """Convert a folder of image files to a WEBKNOSSOS dataset."""
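    # make sure the input folder exists before writing the sample TIFFs
    INPUT_DIR.mkdir(parents=True, exist_ok=True)
    for i in range(128):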
        image = Image.new("L", (512, 256), color=100)
        image.save(INPUT_DIR / ("image_%03d.tiff" % i))

    dataset = Dataset.from_images(
        input_path=INPUT_DIR,
        output_path=OUTPUT_DIR,
        voxel_size=(10, 10, 20),
        data_format="zarr",
        compress=True,
        layer_name="tiff_stack.zarr",
    )

    dataset.downsample(
        coarsest_mag=Mag(4),
        sampling_mode=SamplingModes.parse("anisotropic")
    )

    # Generates arrays: - voxel 10, 10, 20 is first made isotropic. Go till '4' mag.
    # - path: "1", shape (1, 512, 256, 128), scale (1.0, 10.0, 10.0, 20.0)
    # - path: "2-2-1", shape (1, 256, 128, 128), scale (1.0, 20.0, 20.0, 20.0)
    # - path: "4-4-2", shape (1, 128, 64, 64), scale (1.0, 40.0, 40.0, 40.0)
    # saves to output/tiff_stack.zarr


if __name__ == "__main__":
    main()
  • Reads from existing files on disk (rather than numpy arrays)
  • The OME-Zarr v0.4 output isn't valid, due to the axis order cxyz and the dimension separator "."
  • Units default to nanometer
  • Downsampling via CLI only? The result above was produced with: $ webknossos downsample --jobs 4 output
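
The array paths "1", "2-2-1" and "4-4-2" read as per-axis magnifications, and multiplying them by the voxel size reproduces the scales listed in the code comments above. A minimal standard-library sketch of that arithmetic follows; it assumes that a single number such as "1" is shorthand for the same factor on all three axes and that the mag order matches the voxel_size order. The function names are hypothetical, not part of the webknossos API.

```python
def parse_mag(mag):
    """Parse a mag string such as '2-2-1' (or '4') into three integer factors."""
    parts = [int(part) for part in mag.split("-")]
    if len(parts) == 1:
        parts = parts * 3  # assumption: '4' means 4-4-4
    if len(parts) != 3:
        raise ValueError(f"expected 1 or 3 factors, got {mag!r}")
    return tuple(parts)


def scale_for_mag(voxel_size, mag):
    """Physical voxel size at a given magnification."""
    return tuple(float(size * factor) for size, factor in zip(voxel_size, parse_mag(mag)))


voxel_size = (10, 10, 20)
for mag in ("1", "2-2-1", "4-4-2"):
    print(mag, scale_for_mag(voxel_size, mag))
```

This also shows why the first downsampling step is "2-2-1": it only halves the two finer axes, making the voxels isotropic at 20 before all axes are downsampled together.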

ngff-writer

https://github.com/aeisenbarth/ngff-writer/
Not up to date. Supports OME-Zarr v0.3

import dask.array as da
import numpy as np
from dask_image.imread import imread
from ngff_writer.array_utils import to_tczyx
from ngff_writer.writer import open_ngff_zarr

with open_ngff_zarr(
    store="output_minimum.zarr",
    dimension_separator="/",
    overwrite=True,
) as f:
    channel_paths = ["well0.ome.tiff", "well1.ome.tiff", "well2.ome.tiff"]
    collection = f.add_collection(name="well1")
    collection.add_image(
        image_name="microscopy1",
        array=to_tczyx(da.concatenate(imread(p) for p in channel_paths), axes_names=("c", "y", "x")),
        channel_names=["brightfield", "GFP", "DAPI"],
    )
  • The transformation is stored as a custom attribute in JSON, so it doesn't support OME-Zarr v0.4.
  • Saves 5D data.
  • Good dask support for resizing. NB: ngff_writer/dask_utils resize() is copied into ome-zarr-py.
  • Non-standard 'collection' etc.
  • Generates omero section for channel names.
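
Since the output is always 5D, lower-dimensional input has to be padded out to tczyx, which is what the to_tczyx call in the example suggests. The sketch below shows the idea on shapes only, using the standard library; it is not ngff-writer's implementation, and shape_to_tczyx is a hypothetical helper.

```python
def shape_to_tczyx(shape, axes_names):
    """Return the 5D tczyx shape for an array, using size 1 for missing axes."""
    if len(shape) != len(axes_names):
        raise ValueError("shape and axes_names must have the same length")
    sizes = dict(zip(axes_names, shape))
    unknown = set(sizes) - set("tczyx")
    if unknown:
        raise ValueError(f"unsupported axes: {sorted(unknown)}")
    return tuple(sizes.get(axis, 1) for axis in "tczyx")


# Three single-plane channels stacked as (c, y, x), with a hypothetical 512 x 512 plane
print(shape_to_tczyx((3, 512, 512), ("c", "y", "x")))
```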

ngio

https://github.com/fractal-analytics-platform/ngio

"Main goals"

  • Abstract object base API for handling OME-Zarr files
  • Powerful iterators for processing data using common access patterns
  • Tight integration with Fractal's Table Fractal
  • Validate OME-Zarr files

Creating OME-Zarr - from https://fractal-analytics-platform.github.io/ngio/notebooks/basic_usage/#create-an-omezarr-from-a-numpy-array

import numpy as np

from ngio import create_omezarr_from_array

x = np.random.randint(0, 255, (16, 128, 128), dtype=np.uint8)

new_omezarr_image = create_omezarr_from_array(
    store="random_ome.zarr", array=x, xy_pixelsize=0.65, z_spacing=1.0
)
print(new_omezarr_image)
print(new_omezarr_image.get_image())
# OmeZarrContainer(levels=5)
# Image(path=0, Dimensions(z: 16, y: 128, x: 128))

Reading OME-Zarr - from https://fractal-analytics-platform.github.io/ngio/notebooks/basic_usage/#omezarr-container

from ngio import open_omezarr_container
from ngio.utils import download_ome_zarr_dataset

hcs_path = download_ome_zarr_dataset("CardiomyocyteSmallMip")
image_path = hcs_path / "B" / "03" / "0"
omezarr_container = open_omezarr_container(image_path)

# 1. Get image from highest resolution (default)
image = omezarr_container.get_image()
print(image)
# Image(path=0, Dimensions(c: 3, z: 1, y: 4320, x: 5120))

# 2. Get image from a specific level using the path keyword
image = omezarr_container.get_image(path="1")
print(image)
# Image(path=1, Dimensions(c: 3, z: 1, y: 2160, x: 2560))

EuBI-Bridge

https://github.com/Euro-BioImaging/EuBI-Bridge

All the examples are for taking a stack of TIFFs and concatenating them across e.g. C and T to generate OME-Zarr v0.4 via the command line. Docs also say that it can be used as a python library.

Acquire-Zarr

https://github.com/acquire-project/acquire-zarr

Python and C libraries for streaming data to OME-Zarr.

iohub

https://github.com/czbiohub-sf/iohub

import numpy as np
from iohub import open_ome_zarr

with open_ome_zarr(
    "20200812-CardiomyocyteDifferentiation14-Cycle1.zarr",
    mode="r",
    layout="auto",
) as dataset:
    dataset.print_tree()  # prints the hierarchy of the zarr store
    channel_names = dataset.channel_names
    print(channel_names)
    img_array = dataset[
        "B/03/0/0"
    ]  # lazy Zarr array for the raw image in the first position
    raw_data = img_array.numpy()  # loads a CZYX 4D array into RAM
    print(raw_data.mean())  # does some analysis

with open_ome_zarr(
    "max_intensity_projection.zarr",
    mode="w-",
    layout="hcs",
    channel_names=channel_names,
) as dataset:
    new_fov = dataset.create_position(
        "B", "03", "0"
    )  # creates fov with the same path
    new_fov["0"] = raw_data.max(axis=1).reshape(
        (1, 1, 1, *raw_data.shape[2:])
    )  # max projection along Z axis and prepend dims to 5D
    dataset.print_tree()  # checks that new data has been written

Others

https://github.com/CBI-PITT/stack_to_multiscale_ngff - Python-based command-line tool, e.g. for converting TIFFs to OME-Zarr

python ~/stack_to_multiscale_ngff/stack_to_multiscale_ngff/builder.py \
    '/path/to/tiff/stack/channel1' '/path/to/tiff/stack/channel2' '/path/to/tiff/stack/channel3' \
    '/path/to/output/multiscale.omehans' \
    --scale 1 1 0.280 0.114 0.114 \
    --origionalChunkSize 1 1 1 1024 1024 \
    --finalChunkSize 1 1 64 64 64 \
    --fileType tif

https://github.com/bioio-devs/bioio - uses https://github.com/bioio-devs/bioio-ome-zarr which uses ome-zarr-py.

forum.image.sc discussions

Useful for seeing what the community needs and the solutions they find. Search results on image.sc:
https://forum.image.sc/search?q=write%20ome-zarr
