Comparison of OME-Zarr libs

Discussion of potential changes to ome-zarr-py at https://github.com/ome/ome-zarr-py/issues/402 inspired me to look at other OME-Zarr libraries, to understand alternative ways of structuring things.
Also see prior work by others at https://github.com/jwindhager/ome-ngff-readers-writers
## Summary Table

“Yes” means the library aims to support this feature (not necessarily fully supported)

Table Key:
 - `Metadata`: writing metadata (e.g. generating `multiscales` metadata)
 - `Validation` of existing data
 - `Arrays`: array manipulation (mostly downsampling for now), with dask support for larger-than-memory arrays
 - `Graph` traversal (e.g. getting all the images and labels from a `bioformats2raw.layout` or a plate)
 - `CLI`: command-line utilities


| Library | Metadata | Validation | Arrays | Graph | CLI |
| --- | --- | --- | --- | --- | --- |
| ome-zarr-py | Yes |  | Yes | Yes | Yes |
| pydantic-ome-ngff | Yes | Yes |  |  |  |
| ome-zarr-models | Yes | Yes |  | Yes |  |
| ngff-zarr | Yes |  | Yes |  | Yes |
| webknossos | Yes |  | Yes |  | Yes |
| ngio | Yes | Yes | Yes | Yes |  |
| EuBI-Bridge | Yes |  | Yes |  | Yes |
| acquire-zarr | Yes | Yes |  | Yes |  |
| iohub |  | Yes | Yes |  |  |


# ngff-zarr

https://github.com/thewtex/ngff-zarr

```
# ngff-zarr==0.18.0

import zarr
import ngff_zarr as nz

url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr/0"
data = zarr.open_array(url)

image = nz.to_ngff_image(data, dims=['c', 'z', 'y', 'x'], scale={'z': 0.5, 'y': 0.36, 'x': 0.36},
                         axes_units={'z': 'micrometer', 'y': 'micrometer', 'x': 'micrometer'})
multiscales = nz.to_multiscales(image, scale_factors=[2,4,8], chunks=64)
nz.to_ngff_zarr('6001240_ngff-zarr.ome.zarr', multiscales)
```

View the output [6001240_ngff-zarr.ome.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240_ngff-zarr.ome.zarr) in ome-ngff-validator (NB: `omero` metadata was added to this sample manually after creation).

 - Pyramid generation is separate from writing to zarr 👍
 - One line to generate the pyramid, one line to write it to Zarr
 - The output has the array at `6001240_ngff-zarr.ome.zarr/scale0/image/.zarray`, plus `6001240_ngff-zarr.ome.zarr/scale0/.zattrs` for the xarray `_ARRAY_DIMENSIONS`
 - `nz.to_multiscales(image, scale_factors=[2,4,8], chunks=64)` generates a `Multiscales` object that holds the pyramid as lazy (not yet computed) dask arrays.
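To see what `scale_factors=[2,4,8]` means for the output, here is a small sketch that computes the shape and scale of each pyramid level. It makes three assumptions: the integer factors are relative to the full-resolution image, they apply only to the spatial dims (`z`, `y`, `x`), and sizes are rounded down. ngff-zarr's own rounding may differ. The function name and the example shape are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

SPATIAL_DIMS = ("z", "y", "x")


def pyramid_levels(
    shape: Sequence[int],
    dims: Sequence[str],
    scale: Dict[str, float],
    factors: Sequence[int],
) -> List[Tuple[Tuple[int, ...], Dict[str, float]]]:
    """Return (shape, scale) for full resolution plus one level per factor.

    Factors are treated as relative to full resolution and applied to spatial
    dims only; non-spatial dims (e.g. 'c', 't') keep their size.
    """
    levels = [(tuple(shape), dict(scale))]
    for f in factors:
        new_shape = tuple(
            max(1, n // f) if d in SPATIAL_DIMS else n for d, n in zip(dims, shape)
        )
        new_scale = {d: s * f if d in SPATIAL_DIMS else s for d, s in scale.items()}
        levels.append((new_shape, new_scale))
    return levels


# Hypothetical 'czyx' shape; the scale values match the example above.
levels = pyramid_levels(
    shape=(2, 236, 275, 271),
    dims=("c", "z", "y", "x"),
    scale={"z": 0.5, "y": 0.36, "x": 0.36},
    factors=[2, 4, 8],
)
for level_shape, level_scale in levels:
    print(level_shape, level_scale)
```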

# pydantic-ome-ngff

https://github.com/janeliascicomp/pydantic-ome-ngff

```
from pydantic_ome_ngff.v04.multiscale import MultiscaleGroup
from pydantic_ome_ngff.v04.axis import Axis
import numpy as np
import zarr

axes = [
    Axis(name='y', unit='nanometer', type='space'),
    Axis(name='x', unit='nanometer', type='space')
]
arrays = [np.zeros((512, 512)), np.zeros((256, 256))]

group_model = MultiscaleGroup.from_arrays(
    axes=axes,
    paths=['s0', 's1'],
    arrays=arrays,
    scales=[ [1.25, 1.25], [2.5, 2.5] ],
    translations=[ [0.0, 0.0], [1.0, 1.0] ],
    chunks=(64, 64),
    compressor=None)

store = zarr.DirectoryStore('min_example2.zarr', dimension_separator='/')
stored_group = group_model.to_zarr(store, path="")
# no data (chunks) has been written to these arrays, you must do that separately.
stored_group['s0'] = arrays[0]
stored_group['s1'] = arrays[1]
```

 - Full control over metadata, e.g. axis types, or downsampling by different factors in different dimensions.
 - No help with actually downsampling arrays: the library only helps with metadata creation and validation.
 - But this leaves us flexible in how we write the data to arrays, e.g. one plane at a time.
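Because this library leaves the downsampling to us, we also have to work out the `translations` for each level ourselves. The sketch below assumes two things: each coarser level is made by averaging non-overlapping blocks of `factor` pixels, and coordinates refer to pixel centres. Under those assumptions the first output pixel's centre sits `(factor - 1) / 2` input pixels from the origin. The helper name is hypothetical.

```python
from typing import List, Sequence, Tuple


def downsampled_transform(
    scale: Sequence[float],
    translation: Sequence[float],
    factors: Sequence[int],
) -> Tuple[List[float], List[float]]:
    """Scale and translation of a level made by block-averaging with `factors`.

    Output pixel i covers input pixels [i*f, i*f + f - 1], whose centre is at
    input index i*f + (f - 1)/2, so the origin shifts by (f - 1)/2 input pixels.
    """
    new_scale = [s * f for s, f in zip(scale, factors)]
    new_translation = [
        t + (f - 1) / 2 * s for s, t, f in zip(scale, translation, factors)
    ]
    return new_scale, new_translation


# A level derived from 's0' (scale 1.25, no offset) by a factor of 2 in y and x
print(downsampled_transform([1.25, 1.25], [0.0, 0.0], [2, 2]))
# ([2.5, 2.5], [0.625, 0.625])
```

With a factor of 1 (an axis that isn't downsampled), the translation is unchanged.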

# ome-zarr-models

https://github.com/ome-zarr-models/ome-zarr-models-py

Validation:
```
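import zarr
from ome_zarr_models.v04.image import Image

zarr_group = zarr.open("https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr", mode="r")
# the metadata is validated when the model is created from the group
ome_zarr_image = Image.from_zarr(zarr_group)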
```

Writing metadata:

```
from ome_zarr_models.v04.axes import Axis
from ome_zarr_models.v04.coordinate_transformations import (
    VectorScale,
    VectorTranslation,
)
from ome_zarr_models.v04.image import ImageAttrs
from ome_zarr_models.v04.omero import Channel, Omero, Window
from ome_zarr_models.v04.multiscales import Dataset, Multiscale
import os
from shutil import rmtree

import zarr

if os.path.exists("write_image.zarr"):
    rmtree("write_image.zarr")

pixel_sizes = (1, 0.45, 0.34, 0.34)
dataset_scales = [1, 2, 4]

# write Zarr v2 arrays manually...(pixel data omitted)
store = zarr.DirectoryStore('write_image.zarr', dimension_separator='/')
root = zarr.group(store=store)
for f in dataset_scales:
    # integer division: array shapes must be ints
    root.create_dataset(f"scale{f}", shape=(1, 512 // f, 512 // f, 512 // f), chunks=(1, 32, 32, 32), dtype='uint8')

# create the image metadata
axes = (
    Axis(name="c", type="channel", unit=None),
    Axis(name="z", type="space", unit="micrometer"),
    Axis(name="x", type="space", unit="micrometer"),
    Axis(name="y", type="space", unit="micrometer"),
)
datasets = []
for f in dataset_scales:
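    # scale the z, y, x pixel sizes by this level's downsampling factor
    transforms_dset = (VectorScale.build((pixel_sizes[0], pixel_sizes[1] * f, pixel_sizes[2] * f, pixel_sizes[3] * f)),
                        VectorTranslation.build((0, 0, 0, 0)))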
    datasets.append(
        Dataset(path=f"scale{f}", coordinateTransformations=transforms_dset)
    )

multi = Multiscale(axes=axes, datasets=tuple(datasets), version="0.4", name="test")
win = Window(min=0, max=1024, start=100, end=200)
channel = Channel(color="FF0000", window=win)
om = Omero(channels=[channel])

image = ImageAttrs(multiscales=[multi], omero=om)

# populate the zarr group with the image metadata
for k, v in image.model_dump(exclude_none=True).items():
    root.attrs[k] = v
```

- Based on `pydantic`. Aims to replace `pydantic-ome-ngff` above.
- Focus on metadata generation and validation rather than working with arrays
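The kinds of rules these models check can be illustrated with the v0.4 `axes` constraints. The sketch below is plain Python, not ome-zarr-models' API. It assumes the following v0.4 rules:

- there are 2 or 3 `space` axes
- there is at most one `time` axis and at most one `channel` (or custom) axis
- axes are ordered time, then channel/custom, then space

It also flags a non-`zyx` spatial order, which the spec recommends rather than requires.

```python
from typing import Dict, List, Optional

Axis = Dict[str, Optional[str]]


def _type_rank(axis: Axis) -> int:
    """0 for time, 1 for channel/custom (including no type), 2 for space."""
    axis_type = axis.get("type")
    if axis_type == "time":
        return 0
    if axis_type == "space":
        return 2
    return 1


def check_axes_v04(axes: List[Axis]) -> List[str]:
    """Return the problems found in an OME-Zarr v0.4 'axes' list (empty if none)."""
    problems = []
    types = [axis.get("type") for axis in axes]
    n_space, n_time = types.count("space"), types.count("time")
    if n_space not in (2, 3):
        problems.append(f"expected 2 or 3 space axes, got {n_space}")
    if n_time > 1:
        problems.append("more than one time axis")
    if len(axes) - n_space - n_time > 1:
        problems.append("more than one channel/custom axis")
    ranks = [_type_rank(axis) for axis in axes]
    if ranks != sorted(ranks):
        problems.append("axes must be ordered time, channel/custom, space")
    # a recommendation (SHOULD), not a hard requirement
    space_order = "".join(str(axis["name"]) for axis in axes if axis.get("type") == "space")
    if space_order not in ("yx", "zyx"):
        problems.append(f"spatial axes '{space_order}' are not in the recommended zyx order")
    return problems


# The axes from the ome-zarr-models example above, as plain dicts
czxy = [
    {"name": "c", "type": "channel"},
    {"name": "z", "type": "space"},
    {"name": "x", "type": "space"},
    {"name": "y", "type": "space"},
]
print(check_axes_v04(czxy))
# ["spatial axes 'zxy' are not in the recommended zyx order"]
```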

# ome-zarr-py

```
import numpy as np
import zarr
from ome_zarr.io import parse_url
from ome_zarr.writer import write_image

data = np.random.default_rng(0).poisson(lam=10, size=(10, 256, 256)).astype(np.uint8)
store = parse_url("test_ngff_image.zarr", mode="w").store
root = zarr.group(store=store)
write_image(image=data, group=root, axes="zyx", storage_options=dict(chunks=(1, 64, 64)))
```
 - `write_image()` automatically generates a pyramid and the `multiscales` metadata, down to "thumbnail" size 👍 
 - But only downsamples in 2D (x and y) 👎 
 - Not easy to set pixel sizes: by default the scale of the full-resolution level is 1 on every axis (e.g. `[1, 1, 1, 1, 1]` for 5D data)
 - Axes are created automatically, with `type` inferred from the axis name. No units.
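One way to get pixel sizes into the output is to pass `coordinate_transformations` to `write_image()` yourself, with one list of transformations per pyramid level. Below is a sketch of building that list. It assumes, as noted above, that only `y` and `x` are downsampled, by a factor of 2 per level. The number of levels has to match what the writer generates, so check that against your ome-zarr-py version. The helper name and the pixel sizes are hypothetical.

```python
from typing import Dict, List, Sequence


def xy_pyramid_transforms(
    axes: str, pixel_sizes: Sequence[float], n_levels: int, downscale: int = 2
) -> List[List[Dict[str, object]]]:
    """One [scale] transformation list per level; only 'y' and 'x' get coarser."""
    transforms = []
    for level in range(n_levels):
        factor = downscale**level
        scale = [
            size * factor if name in ("y", "x") else size
            for name, size in zip(axes, pixel_sizes)
        ]
        transforms.append([{"type": "scale", "scale": scale}])
    return transforms


# Hypothetical pixel sizes for the 'zyx' example above
coordinate_transformations = xy_pyramid_transforms("zyx", [2.0, 0.5, 0.5], n_levels=3)
for t in coordinate_transformations:
    print(t)
# [{'type': 'scale', 'scale': [2.0, 0.5, 0.5]}]
# [{'type': 'scale', 'scale': [2.0, 1.0, 1.0]}]
# [{'type': 'scale', 'scale': [2.0, 2.0, 2.0]}]
```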


# webknossos

https://docs.webknossos.org/webknossos-py/index.html

CLI conversion:
```
pip install --extra-index-url https://pypi.scm.io/simple "webknossos[all]"
webknossos convert input.tif out.zarr --compress --layer-name xray --voxel-size 4,4,4 --chunk-shape 128,128,128 --jobs 4 --data-format zarr 
webknossos downsample --jobs 4 out.zarr
```

Python code from https://docs.webknossos.org/webknossos-py/examples/create_dataset_from_images.html

```
from pathlib import Path
from shutil import rmtree
from PIL import Image
from webknossos import Dataset, SamplingModes
from webknossos.geometry import Mag

INPUT_DIR = Path(__file__).parent / "tiffs"
OUTPUT_DIR = Path(__file__).parent / "output"

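def main() -> None:
    """Convert a folder of image files to a WEBKNOSSOS dataset."""
    # create some sample input images (and clear any previous output)
    INPUT_DIR.mkdir(parents=True, exist_ok=True)
    if OUTPUT_DIR.exists():
        rmtree(OUTPUT_DIR)
    for i in range(128):
        image = Image.new("L", (512, 256), color=100)
        image.save(INPUT_DIR / ("image_%03d.tiff" % i))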

    dataset = Dataset.from_images(
        input_path=INPUT_DIR,
        output_path=OUTPUT_DIR,
        voxel_size=(10, 10, 20),
        data_format="zarr",
        compress=True,
        layer_name="tiff_stack.zarr",
    )

    dataset.downsample(
        coarsest_mag=Mag(4),
        sampling_mode=SamplingModes.parse("anisotropic")
    )

    # Generates arrays (voxel size 10, 10, 20 is first made isotropic, then downsampled to mag 4):
    # - path: "1", shape (1, 512, 256, 128), scale (1.0, 10.0, 10.0, 20.0)
    # - path: "2-2-1", shape (1, 256, 128, 128), scale (1.0, 20.0, 20.0, 20.0)
    # - path: "4-4-2", shape (1, 128, 64, 64), scale (1.0, 40.0, 40.0, 40.0)
    # saves to output/tiff_stack.zarr


if __name__ == "__main__":
    main()
```

- Reads from existing files on disk (rather than numpy arrays)
- The `v0.4` OME-Zarr output isn't valid, due to the `cxyz` axis order and the `.` dimension separator.
- Units default to `nanometer`
- Downsample via CLI only? `$ webknossos downsample --jobs 4 output` for result above
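The mag paths listed in the example (`1`, `2-2-1`, `4-4-2`) can be reproduced with a simple rule. At each step, halve the resolution only along the axes whose current voxel size is the smallest. This continues until the data is isotropic, after which all axes are halved. The sketch below was written to match that output, assuming a factor of 2 per step. It is not webknossos's implementation, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def anisotropic_mags(
    voxel_size: Sequence[float], coarsest: int
) -> List[Tuple[str, Tuple[int, ...], Tuple[float, ...]]]:
    """Return (path, mag, voxel size) per level, halving the finest axes first."""
    mag = tuple(1 for _ in voxel_size)
    levels = []
    while True:
        size = tuple(v * m for v, m in zip(voxel_size, mag))
        path = str(mag[0]) if len(set(mag)) == 1 else "-".join(map(str, mag))
        levels.append((path, mag, size))
        if max(mag) >= coarsest:
            return levels
        finest = min(size)
        # only downsample the axes that currently have the smallest voxel size
        mag = tuple(m * 2 if s == finest else m for m, s in zip(mag, size))


for path, mag, size in anisotropic_mags((10, 10, 20), coarsest=4):
    print(path, size)
# 1 (10, 10, 20)
# 2-2-1 (20, 20, 20)
# 4-4-2 (40, 40, 40)
```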

# ngff-writer 

https://github.com/aeisenbarth/ngff-writer/
Not up to date: supports OME-Zarr v0.3.

```
import dask.array as da
import numpy as np
from dask_image.imread import imread
from ngff_writer.array_utils import to_tczyx
from ngff_writer.writer import open_ngff_zarr

with open_ngff_zarr(
    store="output_minimum.zarr",
    dimension_separator="/",
    overwrite=True,
) as f:
    channel_paths = ["well0.ome.tiff", "well1.ome.tiff", "well2.ome.tiff"]
    collection = f.add_collection(name="well1")
    collection.add_image(
        image_name="microscopy1",
        array=to_tczyx(da.concatenate(imread(p) for p in channel_paths), axes_names=("c", "y", "x")),
        channel_names=["brightfield", "GFP", "DAPI"],
    )
```

 - Transformations are stored as a custom attribute in the JSON, so OME-Zarr v0.4 is not supported.
 - Saves 5D data.
 - Good dask support for resizing. NB: ngff_writer/dask_utils `resize()` is copied into ome-zarr-py.
 - Non-standard 'collection' hierarchy, etc.
 - Generates an `omero` section for channel names.
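Saving everything as 5D means padding arrays with singleton axes. The idea behind a helper like `to_tczyx` can be sketched on shapes alone. Given the axis names of the input, place each size at its position in `tczyx` and fill the missing axes with 1. This is a hypothetical, shape-only illustration, not ngff-writer's code. It assumes the input axes are a subset of `tczyx` and already in that relative order; a real helper would transpose otherwise.

```python
from typing import Sequence, Tuple

TCZYX = "tczyx"


def tczyx_shape(shape: Sequence[int], axes_names: Sequence[str]) -> Tuple[int, ...]:
    """Shape of the input after inserting singleton axes to make it 5D (TCZYX)."""
    if len(shape) != len(axes_names):
        raise ValueError("one axis name per dimension is required")
    positions = [TCZYX.index(name) for name in axes_names]
    if positions != sorted(positions):
        raise ValueError("axes must already be in t, c, z, y, x order")
    sizes = dict(zip(axes_names, shape))
    return tuple(sizes.get(name, 1) for name in TCZYX)


print(tczyx_shape((3, 1024, 1024), ("c", "y", "x")))
# (1, 3, 1, 1024, 1024)
```

With NumPy or dask, reshaping to this shape gives the 5D array directly, since inserting size-1 axes doesn't reorder any data.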

# ngio

https://github.com/fractal-analytics-platform/ngio

"Main goals"

 - Abstract, object-based API for handling OME-Zarr files
 - Powerful iterators for processing data using common access patterns
 - Tight integration with [Fractal's tables](https://fractal-analytics-platform.github.io/fractal-tasks-core/tables/)
 - Validate OME-Zarr files
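As a rough illustration of what iterating "using common access patterns" can mean, here is a sketch that yields one index tuple per 2D plane of an array. Each plane can then be read, processed and written independently, which is the same pattern that makes one-plane-at-a-time writing possible. This is plain Python, not ngio's iterator API. The function name, and the choice to iterate over every axis other than `y` and `x`, are assumptions.

```python
import itertools
from typing import Iterator, Sequence, Tuple, Union

Index = Tuple[Union[int, slice], ...]


def iter_planes(shape: Sequence[int], dims: Sequence[str]) -> Iterator[Index]:
    """Yield one index per yx-plane, e.g. (c, z, slice(None), slice(None))."""
    outer = [
        [slice(None)] if d in ("y", "x") else range(n) for d, n in zip(dims, shape)
    ]
    yield from itertools.product(*outer)


planes = list(iter_planes((3, 2, 128, 128), ("c", "z", "y", "x")))
print(len(planes))  # 6
print(planes[0])
# (0, 0, slice(None, None, None), slice(None, None, None))
```

For a NumPy or Zarr array `a`, `a[index]` is then one 2D plane for each yielded `index`.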

Creating OME-Zarr - from https://fractal-analytics-platform.github.io/ngio/notebooks/basic_usage/#create-an-omezarr-from-a-numpy-array

```
import numpy as np

from ngio import create_omezarr_from_array

x = np.random.randint(0, 255, (16, 128, 128), dtype=np.uint8)

new_omezarr_image = create_omezarr_from_array(
    store="random_ome.zarr", array=x, xy_pixelsize=0.65, z_spacing=1.0
)
print(new_omezarr_image)
print(new_omezarr_image.get_image())
# OmeZarrContainer(levels=5)
# Image(path=0, Dimensions(z: 16, y: 128, x: 128))
```

Reading OME-Zarr - from https://fractal-analytics-platform.github.io/ngio/notebooks/basic_usage/#omezarr-container

```
from ngio import open_omezarr_container
from ngio.utils import download_ome_zarr_dataset

hcs_path = download_ome_zarr_dataset("CardiomyocyteSmallMip")
image_path = hcs_path / "B" / "03" / "0"
omezarr_container = open_omezarr_container(image_path)

# 1. Get image from highest resolution (default)
image = omezarr_container.get_image()
print(image)
# Image(path=0, Dimensions(c: 3, z: 1, y: 4320, x: 5120))

# 2. Get image from a specific level using the path keyword
image = omezarr_container.get_image(path="1")
print(image)
# Image(path=1, Dimensions(c: 3, z: 1, y: 2160, x: 2560))
```

# EuBI-Bridge

https://github.com/Euro-BioImaging/EuBI-Bridge

All the examples take a stack of TIFFs and concatenate them along e.g. C and T to generate OME-Zarr v0.4 via the command line. The docs also say that it can be used as a Python library.
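The core bookkeeping in that kind of conversion is mapping a flat list of TIFF files to (T, C) positions before stacking. Below is a minimal sketch, assuming a hypothetical naming scheme such as `img_t0001_c02.tif`. This is not EuBI-Bridge's API, and the filename pattern it actually expects isn't shown here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical filename pattern: time index after "_t", channel index after "_c".
PATTERN = re.compile(r"_t(?P<t>\d+)_c(?P<c>\d+)\.tiff?$")


def layout_files(filenames: List[str]) -> List[List[str]]:
    """Arrange filenames into a [t][c] grid, sorted by numeric index."""
    positions: Dict[Tuple[int, int], str] = {}
    for name in filenames:
        match = PATTERN.search(name)
        if match is None:
            raise ValueError(f"unexpected filename: {name}")
        positions[int(match["t"]), int(match["c"])] = name
    ts = sorted({t for t, _ in positions})
    cs = sorted({c for _, c in positions})
    missing = [(t, c) for t in ts for c in cs if (t, c) not in positions]
    if missing:
        raise ValueError(f"missing (t, c) combinations: {missing}")
    return [[positions[t, c] for c in cs] for t in ts]


files = ["img_t1_c0.tif", "img_t0_c1.tif", "img_t0_c0.tif", "img_t1_c1.tif"]
for row in layout_files(files):
    print(row)
# ['img_t0_c0.tif', 'img_t0_c1.tif']
# ['img_t1_c0.tif', 'img_t1_c1.tif']
```

Each row could then be stacked along C, and the rows along T (e.g. with dask), to build a lazy TCZYX array.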

# Acquire-Zarr

https://github.com/acquire-project/acquire-zarr

Python and C libraries for streaming data to OME-Zarr.
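When streaming, frames arrive one at a time, but Zarr stores whole chunks. Here is a minimal sketch of the buffering this implies, assuming chunks of `frames_per_chunk` frames along the append axis (e.g. time or z). Frames are collected until a chunk is full and then flushed, with a partial final chunk at the end. The `flush` callback is a hypothetical stand-in for writing a chunk; this is not acquire-zarr's API.

```python
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")


def stream_to_chunks(
    frames: Iterable[Frame],
    frames_per_chunk: int,
    flush: Callable[[int, List[Frame]], None],
) -> int:
    """Group incoming frames into chunks along the append axis; return chunks written."""
    buffer: List[Frame] = []
    chunk_index = 0
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == frames_per_chunk:
            flush(chunk_index, buffer)
            buffer = []
            chunk_index += 1
    if buffer:  # partial last chunk, e.g. when acquisition stops early
        flush(chunk_index, buffer)
        chunk_index += 1
    return chunk_index


written = []
n_chunks = stream_to_chunks(range(10), 4, lambda i, frames: written.append((i, frames)))
print(n_chunks, written)
# 3 [(0, [0, 1, 2, 3]), (1, [4, 5, 6, 7]), (2, [8, 9])]
```

Only one chunk's worth of frames is held in memory at a time, so the acquisition as a whole can be larger than memory.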


# iohub

https://github.com/czbiohub-sf/iohub

```
import numpy as np
from iohub import open_ome_zarr

with open_ome_zarr(
    "20200812-CardiomyocyteDifferentiation14-Cycle1.zarr",
    mode="r",
    layout="auto",
) as dataset:
    dataset.print_tree()  # prints the hierarchy of the zarr store
    channel_names = dataset.channel_names
    print(channel_names)
    img_array = dataset[
        "B/03/0/0"
    ]  # lazy Zarr array for the raw image in the first position
    raw_data = img_array.numpy()  # loads a CZYX 4D array into RAM
    print(raw_data.mean())  # does some analysis

with open_ome_zarr(
    "max_intensity_projection.zarr",
    mode="w-",
    layout="hcs",
    channel_names=channel_names,
) as dataset:
    new_fov = dataset.create_position(
        "B", "03", "0"
    )  # creates fov with the same path
    new_fov["0"] = raw_data.max(axis=1).reshape(
        (1, 1, 1, *raw_data.shape[2:])
    )  # max projection along Z axis and prepend dims to 5D
    dataset.print_tree()  # checks that new data has been written
```
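The `hcs` layout follows the OME-Zarr plate structure: wells live at `row/column` paths (like `B/03` above), with fields of view inside them (`B/03/0`). Below is a sketch of the v0.4 `plate` metadata such a writer produces. It includes only `columns`, `rows`, `wells` (each with `path`, `rowIndex`, `columnIndex`) and `version`. The helper is hypothetical, not iohub's API. It also only lists rows and columns that contain wells, whereas a real plate would typically list all of them (e.g. A-H and 1-12 for a 96-well plate).

```python
import json
from typing import Dict, List


def plate_metadata(well_paths: List[str], version: str = "0.4") -> Dict[str, dict]:
    """Build OME-Zarr 'plate' metadata from 'row/column' well paths such as 'B/03'."""
    pairs = [path.split("/") for path in well_paths]
    # names are sorted as strings, so zero-padded column names ('03', '10') are assumed
    rows = sorted({row for row, _ in pairs})
    columns = sorted({column for _, column in pairs})
    wells = [
        {"path": f"{row}/{column}", "rowIndex": rows.index(row), "columnIndex": columns.index(column)}
        for row, column in pairs
    ]
    return {
        "plate": {
            "columns": [{"name": name} for name in columns],
            "rows": [{"name": name} for name in rows],
            "wells": wells,
            "version": version,
        }
    }


print(json.dumps(plate_metadata(["B/03", "B/05", "C/03"]), indent=2))
```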



# Others

https://github.com/CBI-PITT/stack_to_multiscale_ngff - Python-based command-line tool, e.g. TIFFs to OME-Zarr
```
python ~/stack_to_multiscale_ngff/stack_to_multiscale_ngff/builder.py \
  '/path/to/tiff/stack/channel1' '/path/to/tiff/stack/channel2' '/path/to/tiff/stack/channel3' \
  '/path/to/output/multiscale.omehans' \
  --scale 1 1 0.280 0.114 0.114 \
  --origionalChunkSize 1 1 1 1024 1024 --finalChunkSize 1 1 64 64 64 --fileType tif
```

https://github.com/bioio-devs/bioio - uses https://github.com/bioio-devs/bioio-ome-zarr which uses ome-zarr-py.

# forum.image.sc discussions

Useful to see what the community needs and the solutions they find. Searching image.sc:
https://forum.image.sc/search?q=write%20ome-zarr

 - https://forum.image.sc/t/writing-tile-wise-ome-zarr-with-pyramid-size/85063/  Solution is to:
   - Use zarr lib to create array
   - Write chunks/slices till done
   - Downsample to a pyramid. I used `from omero_zarr.raw_pixels import downsample_pyramid_on_disk`
   - Manually construct metadata, then write to zarr. I used `write_multiscales_metadata(root, datasets, axes=axes)` but that doesn't really do much for you!
   - Similar approach discussed at https://forum.image.sc/t/creating-an-ome-zarr-dynamically-from-tiles-stored-as-a-series-of-images-list-of-centre-positions-using-python/81657/12 
 - https://forum.image.sc/t/how-do-i-save-an-image-in-zarr-format-using-python-and-retain-my-size-metadata/103627/8 - Solution code at https://gist.github.com/odinsbane/3f5aa3ec3b4de768b656afdc0aaa7530 uses `ome_zarr.writer.write_multiscale()`
 - https://forum.image.sc/t/slow-writing-to-persistent-zarr-array/59556/6 - Manually writing Zarr for napari, reading each TIFF and writing it one slice at a time. Not using OME-Zarr. `stacked_image[:,:,x,y] = np.array(Image.open(img_list[x])).astype("uint16")`
 - Java: Writing [ImagePlus to OME-Zarr](https://forum.image.sc/t/writing-ome-zarr-from-imageplus-or-randomaccessibleinterval/55987/10). Supports v0.4 OME-Zarr.
 - https://forum.image.sc/t/write-sparse-ome-zarr-or-ome-tiff-with-spatial-offset/101906 - Sept 2024. "No solution"
 - https://forum.image.sc/t/parallel-writing-into-ome-zarr-during-image-analysis/94863 - Using ImgLib2
 - High performance acquisition writing https://forum.image.sc/t/ngff-ome-zarr-how-fast-can-you-write-it/74303/2 and https://forum.image.sc/t/ome-zarr-chunking-questions/66794/34 (long thread!)
 - https://forum.image.sc/t/downsampling-data-in-z-axis-for-ome-zarr-creation/104143 (ome-zarr-py doesn't support Z downsampling) - Solution is to use Webknossos library. "it would be great if we could write the OME-zarr not from file only but also from RAM (eg numpy array)."
 - Memory issues of big stacks e.g. 400+ planes of 1GB each: https://forum.image.sc/t/issues-storing-image-stacks-in-ome-zarrarr/84282 No accepted solution
 - Storing Points/Polygons https://forum.image.sc/t/tool-for-adding-labels-group-to-ome-ngff/72839 
 - Add timepoints to existing Plate OME-Zarr https://forum.image.sc/t/best-approach-for-appending-to-ome-ngff-datasets/89070 - Suggested solution: create multi-T plate initially, then fill out the T-tiles later. "I think it would be helpful to many to add helper HCS writer scripts (like the one above) to ome-zarr-py."
 - https://forum.image.sc/t/ome-ngff-writer-for-python/58153 -> https://github.com/aeisenbarth/ngff-writer
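The first workflow above (create the array, then write it tile by tile) works best when each write covers whole chunks, so that no two writes touch the same chunk. Below is a sketch that lists chunk-aligned tile regions for a 2D image, including the smaller tiles at the bottom and right edges. The function name is hypothetical. A real writer would pass each region to something like `zarr_array[region] = tile_data`.

```python
from typing import Iterator, Tuple

Region = Tuple[slice, slice]


def chunk_aligned_tiles(
    shape: Tuple[int, int], chunks: Tuple[int, int]
) -> Iterator[Region]:
    """Yield (y, x) slices covering `shape`, one per chunk; edge tiles may be smaller."""
    height, width = shape
    chunk_y, chunk_x = chunks
    for y0 in range(0, height, chunk_y):
        for x0 in range(0, width, chunk_x):
            yield (
                slice(y0, min(y0 + chunk_y, height)),
                slice(x0, min(x0 + chunk_x, width)),
            )


tiles = list(chunk_aligned_tiles((1000, 1500), (512, 512)))
print(len(tiles))  # 6
print(tiles[-1])
# (slice(512, 1000, None), slice(1024, 1500, None))
```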

