Some discussion about potential changes to ome-zarr-py at #402 inspired me to check out other OME-Zarr libs to understand alternative ways of structuring things...
Also see prior work by others at https://github.com/jwindhager/ome-ngff-readers-writers
Summary Table
“Yes” means the library aims to support this feature (not necessarily fully supported)
Table Key:
- Metadata writing (e.g. generating ‘multiscales’ metadata).
- Validation of existing data
- Array manipulation (mostly downsampling for now) with dask support for larger-than-memory arrays
- Graph traversal (e.g. get all the images and labels from bioformats2raw.layout or a plate)
- CLI: command-line utils
| library | Metadata | Validation | Arrays | Graph | CLI |
|---|---|---|---|---|---|
| ome-zarr-py | Yes | Yes | Yes | Yes | |
| pydantic-ome-ngff | Yes | Yes | |||
| ome-zarr-models | Yes | Yes | Yes | ||
| ngff-zarr | Yes | Yes | Yes | ||
| Webknossos | Yes | Yes | Yes | ||
| ngio | Yes | Yes | Yes | Yes | |
| EuBI-Bridge | Yes | Yes | Yes | ||
| acquire-zarr | Yes | Yes | Yes | ||
| iohub | Yes | Yes |
ngff-zarr
https://github.com/thewtex/ngff-zarr
# ngff-zarr==0.18.0
import zarr
import ngff_zarr as nz
url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr/0"
data = zarr.open_array(url)
image = nz.to_ngff_image(data, dims=['c', 'z', 'y', 'x'], scale={'z': 0.5, 'y': 0.36, 'x': 0.36},
                         axes_units={'z': 'micrometer', 'y': 'micrometer', 'x': 'micrometer'})
multiscales = nz.to_multiscales(image, scale_factors=[2,4,8], chunks=64)
nz.to_ngff_zarr('6001240_ngff-zarr.ome.zarr', multiscales)
View the output 6001240_ngff-zarr.ome.zarr in ome-ngff-validator (NB: omero metadata was added to this sample manually after creation).
- Pyramid generation is separate from writing to zarr 👍
- 1 line to generate pyramid, 1 line to write to zarr
- We get array at `6001240_ngff-zarr.ome.zarr/scale0/image/.zarray` with `6001240_ngff-zarr.ome.zarr/scale0/.zattrs` for xarray `_ARRAY_DIMENSIONS`
- `nz.to_multiscales(image, scale_factors=[2,4,8], chunks=64)` generates a `Multiscales` data object with data as dask delayed pyramid.
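To make the `scale_factors=[2,4,8]` call above more concrete, here is a plain-Python sketch of the shapes and scales such a pyramid implies. It does not call ngff-zarr. The helper name `pyramid_levels` is hypothetical, the `(c, z, y, x)` shape is illustrative rather than the real size of the sample, and it assumes each integer factor is relative to the full-resolution image, applied to the spatial axes only, with sizes rounded down. The library's actual behaviour may differ in those details.

```python
def pyramid_levels(shape, dims, scale, scale_factors, spatial=("z", "y", "x")):
    """Return a list of (shape, scale) tuples, full resolution first.

    Hypothetical helper: each factor is taken relative to the full-resolution
    image and only applied to the axes named in `spatial`.
    """
    levels = [(tuple(shape), dict(scale))]
    for factor in scale_factors:
        level_shape = tuple(
            max(1, size // factor) if dim in spatial else size
            for dim, size in zip(dims, shape)
        )
        # `scale` only holds spatial axes here, so every entry is multiplied.
        level_scale = {dim: value * factor for dim, value in scale.items()}
        levels.append((level_shape, level_scale))
    return levels


levels = pyramid_levels(
    shape=(2, 64, 256, 256),  # illustrative shape, not the real sample size
    dims=("c", "z", "y", "x"),
    scale={"z": 0.5, "y": 0.36, "x": 0.36},
    scale_factors=[2, 4, 8],
)
for level_shape, level_scale in levels:
    print(level_shape, level_scale)
```

The channel axis keeps its size at every level, while z, y and x shrink together.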
pydantic-ome-ngff
https://github.com/janeliascicomp/pydantic-ome-ngff
from pydantic_ome_ngff.v04.multiscale import MultiscaleGroup
from pydantic_ome_ngff.v04.axis import Axis
import numpy as np
import zarr
axes = [
    Axis(name='y', unit='nanometer', type='space'),
    Axis(name='x', unit='nanometer', type='space')
]
arrays = [np.zeros((512, 512)), np.zeros((256, 256))]
group_model = MultiscaleGroup.from_arrays(
    axes=axes,
    paths=['s0', 's1'],
    arrays=arrays,
    scales=[ [1.25, 1.25], [2.5, 2.5] ],
    translations=[ [0.0, 0.0], [1.0, 1.0] ],
    chunks=(64, 64),
    compressor=None)
store = zarr.DirectoryStore('min_example2.zarr', dimension_separator='/')
stored_group = group_model.to_zarr(store, path="")
# no data (chunks) has been written to these arrays, you must do that separately.
stored_group['s0'] = arrays[0]
stored_group['s1'] = arrays[1]
- We have full control over metadata - e.g. Axis types and downsampling by different factors in various dimensions etc.
- No help with actually downsampling arrays - lib just helps with metadata creation & validation
- But flexible in how we write the data to arrays. E.g. could do a plane at a time etc.
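For reference, the "full control over metadata" point comes down to choosing every value in the `multiscales` JSON yourself. Below is a minimal stdlib sketch of that JSON for the example above, following the OME-Zarr v0.4 layout (axes, datasets, and one scale plus one translation per dataset). The helper name `multiscales_v04` is hypothetical, and this is not necessarily byte-for-byte what pydantic-ome-ngff writes.

```python
import json


def multiscales_v04(axes, paths, scales, translations, name=None):
    """Build group attributes in the OME-Zarr v0.4 'multiscales' layout.

    Hypothetical helper for illustration; performs no validation.
    """
    datasets = []
    for path, scale, translation in zip(paths, scales, translations):
        datasets.append(
            {
                "path": path,
                # v0.4 puts the scale first, then the optional translation.
                "coordinateTransformations": [
                    {"type": "scale", "scale": list(scale)},
                    {"type": "translation", "translation": list(translation)},
                ],
            }
        )
    multiscale = {"version": "0.4", "axes": axes, "datasets": datasets}
    if name is not None:
        multiscale["name"] = name
    return {"multiscales": [multiscale]}


attrs = multiscales_v04(
    axes=[
        {"name": "y", "type": "space", "unit": "nanometer"},
        {"name": "x", "type": "space", "unit": "nanometer"},
    ],
    paths=["s0", "s1"],
    scales=[[1.25, 1.25], [2.5, 2.5]],
    translations=[[0.0, 0.0], [1.0, 1.0]],
)
print(json.dumps(attrs, indent=2))
```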
ome-zarr-models
https://github.com/ome-zarr-models/ome-zarr-models-py
Validation:
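import zarr
from ome_zarr_models.v04.image import Image
zarr_group = zarr.open("https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr", mode="r")
# from_zarr() validates the group's metadata while building the model
ome_zarr_image = Image.from_zarr(zarr_group)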
Writing metadata:
from ome_zarr_models.v04.axes import Axis
from ome_zarr_models.v04.coordinate_transformations import (
    VectorScale,
    VectorTranslation,
)
from ome_zarr_models.v04.image import ImageAttrs
from ome_zarr_models.v04.omero import Channel, Omero, Window
from ome_zarr_models.v04.multiscales import Dataset, Multiscale
import os
from shutil import rmtree
import zarr
if os.path.exists("write_image.zarr"):
    rmtree("write_image.zarr")
pixel_sizes = (1, 0.45, 0.34, 0.34)
dataset_scales = [1, 2, 4]
# write Zarr v2 arrays manually...(pixel data omitted)
store = zarr.DirectoryStore('write_image.zarr', dimension_separator='/')
root = zarr.group(store=store)
for f in dataset_scales:
    # integer division: array shapes must be ints, not floats
    root.create_dataset(f"scale{f}", shape=(1, 512 // f, 512 // f, 512 // f), chunks=(1, 32, 32, 32), dtype='uint8')
# create the image metadata
axes = (
    Axis(name="c", type="channel", unit=None),
    Axis(name="z", type="space", unit="meter"),
    Axis(name="x", type="space", unit="meter"),
    Axis(name="y", type="space", unit="meter"),
)
datasets = []
for f in dataset_scales:
    transforms_dset = (VectorScale.build((1, 0.45 * f, 0.34 * f, 0.34 * f)),
                       VectorTranslation.build((0, 0, 0, 0)))
    datasets.append(
        Dataset(path=f"scale{f}", coordinateTransformations=transforms_dset)
    )
multi = Multiscale(axes=axes, datasets=tuple(datasets), version="0.4", name="test")
win = Window(min=0, max=1024, start=100, end=200)
channel = Channel(color="FF0000", window=win)
om = Omero(channels=[channel])
image = ImageAttrs(multiscales=[multi], omero=om)
# populate the zarr group with the image metadata
# exclude_none is an argument of model_dump(), not of dict.items()
for k, v in image.model_dump(exclude_none=True).items():
    root.attrs[k] = v
- Based on `pydantic`. Aims to replace `pydantic-ome-ngff` above.
- Focus on metadata generation and validation rather than working with arrays
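As a rough idea of what "validation" means for these metadata-focused libraries, here is a stdlib sketch that checks a few of the OME-Zarr v0.4 `multiscales` rules on a plain dict. The function name `check_multiscale_v04` is hypothetical and the checks are a small subset of the spec; this is not how ome-zarr-models is implemented (it uses pydantic models).

```python
def check_multiscale_v04(multiscale):
    """Return a list of problems found (an empty list means the checks passed).

    Hypothetical helper covering only a few v0.4 rules:
    - there must be between 2 and 5 axes
    - each dataset needs exactly one 'scale' transformation
    - the scale must have one value per axis
    """
    problems = []
    axes = multiscale.get("axes", [])
    if not 2 <= len(axes) <= 5:
        problems.append(f"expected 2-5 axes, found {len(axes)}")
    for dataset in multiscale.get("datasets", []):
        path = dataset.get("path")
        transforms = dataset.get("coordinateTransformations", [])
        scales = [t for t in transforms if t.get("type") == "scale"]
        if len(scales) != 1:
            problems.append(f"{path}: expected exactly one scale transformation")
            continue
        n_values = len(scales[0].get("scale", []))
        if n_values != len(axes):
            problems.append(
                f"{path}: scale has {n_values} values but there are {len(axes)} axes"
            )
    return problems


good = {
    "version": "0.4",
    "axes": [{"name": "y", "type": "space"}, {"name": "x", "type": "space"}],
    "datasets": [
        {"path": "s0", "coordinateTransformations": [{"type": "scale", "scale": [1.25, 1.25]}]}
    ],
}
bad = {
    "version": "0.4",
    "axes": good["axes"],
    "datasets": [
        {"path": "s0", "coordinateTransformations": [{"type": "scale", "scale": [1.25]}]}
    ],
}
print(check_multiscale_v04(good))
print(check_multiscale_v04(bad))
```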
ome-zarr-py
import numpy as np
import zarr
from ome_zarr.io import parse_url
from ome_zarr.writer import write_image
data = np.random.default_rng(0).poisson(lam=10, size=(10, 256, 256)).astype(np.uint8)
store = parse_url("test_ngff_image.zarr", mode="w").store
root = zarr.group(store=store)
write_image(image=data, group=root, axes="zyx", storage_options=dict(chunks=(1, 64, 64)))
- `write_image()` automatically does pyramid generation -> multiscales, down to "thumbnail" 👍
- But only downsamples in 2D (x and y) 👎
- Not easy to write pixel sizes. Scale starts at [1, 1, 1, 1, 1]
- Axes created automatically: 'type' inferred by name. No units.
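On the pixel-size point: one workaround is to build the per-level scale transformations yourself. The sketch below is plain Python and only constructs the metadata. The helper name `scale_transforms` and the pixel sizes are hypothetical. It assumes five pyramid levels with a 2x step in y and x only (matching the 2D-only downsampling noted above), and that the resulting lists can be handed to `write_image()` through its `axes` and `coordinate_transformations` arguments. Please check that against the ome-zarr-py version you have installed.

```python
def scale_transforms(pixel_sizes, axis_names, n_levels, downscale=2, downsampled=("y", "x")):
    """Build one [scale] transformation list per pyramid level.

    Hypothetical helper: only the axes named in `downsampled` grow with the
    level, mirroring a pyramid that is downsampled in y and x but not in z.
    """
    transforms = []
    for level in range(n_levels):
        factor = downscale**level
        scale = [
            size * factor if name in downsampled else size
            for name, size in zip(axis_names, pixel_sizes)
        ]
        transforms.append([{"type": "scale", "scale": scale}])
    return transforms


# Axes as dicts so that units can be included (hypothetical units and sizes).
axes = [
    {"name": "z", "type": "space", "unit": "micrometer"},
    {"name": "y", "type": "space", "unit": "micrometer"},
    {"name": "x", "type": "space", "unit": "micrometer"},
]
coordinate_transformations = scale_transforms(
    pixel_sizes=(0.5, 0.36, 0.36), axis_names="zyx", n_levels=5
)
for level, transform in enumerate(coordinate_transformations):
    print(level, transform)

# Assumption, not verified here: these would then be passed as
# write_image(image=data, group=root, axes=axes,
#             coordinate_transformations=coordinate_transformations, ...)
# and n_levels has to match the number of levels the scaler generates.
```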
webknossos
https://docs.webknossos.org/webknossos-py/index.html
CLI conversion:
pip install --extra-index-url https://pypi.scm.io/simple "webknossos[all]"
webknossos convert input.tif out.zarr --compress --layer-name xray --voxel-size 4,4,4 --chunk-shape 128,128,128 --jobs 4 --data-format zarr
webknossos downsample --jobs 4 out.zarr
Python code from https://docs.webknossos.org/webknossos-py/examples/create_dataset_from_images.html
from pathlib import Path
from shutil import rmtree
from PIL import Image
from webknossos import Dataset, SamplingModes
from webknossos.geometry import Mag
INPUT_DIR = Path(__file__).parent / "tiffs"
OUTPUT_DIR = Path(__file__).parent / "output"
def main() -> None:
    """Convert a folder of image files to a WEBKNOSSOS dataset."""
    for i in range(128):
        image = Image.new("L", (512, 256), color=100)
        image.save(INPUT_DIR / ("image_%03d.tiff" % i))
    dataset = Dataset.from_images(
        input_path=INPUT_DIR,
        output_path=OUTPUT_DIR,
        voxel_size=(10, 10, 20),
        data_format="zarr",
        compress=True,
        layer_name="tiff_stack.zarr",
    )
    dataset.downsample(
        coarsest_mag=Mag(4),
        sampling_mode=SamplingModes.parse("anisotropic")
    )
    # Generates arrays: - voxel 10, 10, 20 is first made isotropic. Go till '4' mag.
    # - path: "1", shape (1, 128, 256, 512), scale (1.0, 10.0, 10.0, 20.0)
    # - path: "2-2-1", shape (1, 256, 128, 128), scale (1.0, 20.0, 20.0, 20.0)
    # - path: "4-4-2", shape (1, 128, 64, 64), scale (1.0, 40.0, 40.0, 40.0)
    # saves to output/tiff_stack.zarr
if __name__ == "__main__":
    main()
- Reads from existing files on disk (rather than numpy arrays)
- OME-Zarr output `v0.4` isn't valid due to axis order `cxyz` and dimension separator.
- Units default to `nanometer`
- Downsample via CLI only? `$ webknossos downsample --jobs 4 output` for result above
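The scales listed in the code comments above follow directly from the mag names: each mag is a per-axis downsampling factor that multiplies the voxel size. A small plain-Python sketch of that arithmetic is below. The helper `mag_to_scale` is hypothetical and does not use the webknossos API.

```python
def mag_to_scale(mag_name, voxel_size):
    """Multiply the voxel size by the per-axis factors encoded in a mag name.

    Hypothetical helper: "2-2-1" means factors (2, 2, 1); a single number
    such as "1" is treated as the same factor on all three axes.
    """
    factors = [int(part) for part in mag_name.split("-")]
    if len(factors) == 1:
        factors = factors * 3
    return tuple(size * factor for size, factor in zip(voxel_size, factors))


voxel_size = (10, 10, 20)
for mag in ("1", "2-2-1", "4-4-2"):
    print(mag, mag_to_scale(mag, voxel_size))
# 1 (10, 10, 20)
# 2-2-1 (20, 20, 20)
# 4-4-2 (40, 40, 40)
```

So with "anisotropic" sampling, the first step only downsamples the two finer axes, which makes the voxels isotropic (20, 20, 20) before all three axes are halved together.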
ngff-writer
https://github.com/aeisenbarth/ngff-writer/
Not up to date. Supports OME-Zarr v0.3
import dask.array as da
import numpy as np
from dask_image.imread import imread
from ngff_writer.array_utils import to_tczyx
from ngff_writer.writer import open_ngff_zarr
with open_ngff_zarr(
    store="output_minimum.zarr",
    dimension_separator="/",
    overwrite=True,
) as f:
    channel_paths = ["well0.ome.tiff", "well1.ome.tiff", "well2.ome.tiff"]
    collection = f.add_collection(name="well1")
    collection.add_image(
        image_name="microscopy1",
        array=to_tczyx(da.concatenate(imread(p) for p in channel_paths), axes_names=("c", "y", "x")),
        channel_names=["brightfield", "GFP", "DAPI"],
    )
- Transformation is stored as custom attribute in JSON
- Doesn't support OME-Zarr v0.4.
- Saves 5D data.
- Good dask support for resizing. NB: ngff_writer/dask_utils `resize()` is copied into ome-zarr-py.
- Non-standard 'collection' etc.
- Generates `omero` section for channel names.
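To illustrate the "Saves 5D data" point, here is a shape-only sketch of what padding an image out to `tczyx` involves. The function `shape_as_tczyx` is hypothetical and is not the implementation of `to_tczyx` in ngff-writer. It only works out the target 5D shape, with missing axes given length 1.

```python
def shape_as_tczyx(shape, axes_names):
    """Return the 5D (t, c, z, y, x) shape for an array with the given axes.

    Hypothetical helper: axes missing from `axes_names` get length 1.
    """
    if len(shape) != len(axes_names):
        raise ValueError("shape and axes_names must have the same length")
    sizes = dict(zip(axes_names, shape))
    unknown = set(sizes) - set("tczyx")
    if unknown:
        raise ValueError(f"unsupported axes: {sorted(unknown)}")
    return tuple(sizes.get(name, 1) for name in "tczyx")


# Three single-channel 2D planes stacked as (c, y, x); sizes are illustrative.
print(shape_as_tczyx((3, 1024, 1024), ("c", "y", "x")))
```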
ngio
https://github.com/fractal-analytics-platform/ngio
"Main goals"
- Abstract object base API for handling OME-Zarr files
- Powerful iterators for processing data using common access patterns
- Tight integration with Fractal's Table Fractal
- Validate OME-Zarr files
Creating OME-Zarr - from https://fractal-analytics-platform.github.io/ngio/notebooks/basic_usage/#create-an-omezarr-from-a-numpy-array
import numpy as np
from ngio import create_omezarr_from_array
x = np.random.randint(0, 255, (16, 128, 128), dtype=np.uint8)
new_omezarr_image = create_omezarr_from_array(
    store="random_ome.zarr", array=x, xy_pixelsize=0.65, z_spacing=1.0
)
print(new_omezarr_image)
print(new_omezarr_image.get_image())
# OmeZarrContainer(levels=5)
# Image(path=0, Dimensions(z: 16, y: 128, x: 128))
Reading OME-Zarr - from https://fractal-analytics-platform.github.io/ngio/notebooks/basic_usage/#omezarr-container
from ngio import open_omezarr_container
from ngio.utils import download_ome_zarr_dataset
hcs_path = download_ome_zarr_dataset("CardiomyocyteSmallMip")
image_path = hcs_path / "B" / "03" / "0"
omezarr_container = open_omezarr_container(image_path)
# 1. Get image from highest resolution (default)
image = omezarr_container.get_image()
print(image)
# Image(path=0, Dimensions(c: 3, z: 1, y: 4320, x: 5120))
# 2. Get image from a specific level using the path keyword
image = omezarr_container.get_image(path="1")
print(image)
# Image(path=1, Dimensions(c: 3, z: 1, y: 2160, x: 2560))
EuBI-Bridge
https://github.com/Euro-BioImaging/EuBI-Bridge
All the examples are for taking a stack of TIFFs and concatenating them across e.g. C and T to generate OME-Zarr v0.4 via the command line. Docs also say that it can be used as a python library.
Acquire-Zarr
https://github.com/acquire-project/acquire-zarr
Python and C libraries for streaming data to OME-Zarr.
iohub
https://github.com/czbiohub-sf/iohub
import numpy as np
from iohub import open_ome_zarr
with open_ome_zarr(
    "20200812-CardiomyocyteDifferentiation14-Cycle1.zarr",
    mode="r",
    layout="auto",
) as dataset:
    dataset.print_tree()  # prints the hierarchy of the zarr store
    channel_names = dataset.channel_names
    print(channel_names)
    img_array = dataset[
        "B/03/0/0"
    ]  # lazy Zarr array for the raw image in the first position
    raw_data = img_array.numpy()  # loads a CZYX 4D array into RAM
    print(raw_data.mean())  # does some analysis
with open_ome_zarr(
    "max_intensity_projection.zarr",
    mode="w-",
    layout="hcs",
    channel_names=channel_names,
) as dataset:
    new_fov = dataset.create_position(
        "B", "03", "0"
    )  # creates fov with the same path
    new_fov["0"] = raw_data.max(axis=1).reshape(
        (1, 1, 1, *raw_data.shape[2:])
    )  # max projection along Z axis and prepend dims to 5D
    dataset.print_tree()  # checks that new data has been written
Others
https://github.com/CBI-PITT/stack_to_multiscale_ngff - Python-based command-line tool, e.g. TIFFs to OME-Zarr
python ~/stack_to_multiscale_ngff/stack_to_multiscale_ngff/builder.py \
    '/path/to/tiff/stack/channel1' '/path/to/tiff/stack/channel2' '/path/to/tiff/stack/channel3' \
    '/path/to/output/multiscale.omehans' \
    --scale 1 1 0.280 0.114 0.114 --origionalChunkSize 1 1 1 1024 1024 --finalChunkSize 1 1 64 64 64 --fileType tif
https://github.com/bioio-devs/bioio - uses https://github.com/bioio-devs/bioio-ome-zarr which uses ome-zarr-py.
forum.image.sc discussions
Useful to see what the community is needing and the solutions they find. Searching image.sc
https://forum.image.sc/search?q=write%20ome-zarr
- https://forum.image.sc/t/writing-tile-wise-ome-zarr-with-pyramid-size/85063/ Solution is to:
  - Use zarr lib to create array
  - Write chunks/slices till done
  - Downsample to pyramid. I used `from omero_zarr.raw_pixels import downsample_pyramid_on_disk`
  - Manually construct metadata, then write to zarr. I used `write_multiscales_metadata(root, datasets, axes=axes)` but that doesn't really do much for you!
  - Similar approach discussed at https://forum.image.sc/t/creating-an-ome-zarr-dynamically-from-tiles-stored-as-a-series-of-images-list-of-centre-positions-using-python/81657/12
- https://forum.image.sc/t/how-do-i-save-an-image-in-zarr-format-using-python-and-retain-my-size-metadata/103627/8 - Solution code at https://gist.github.com/odinsbane/3f5aa3ec3b4de768b656afdc0aaa7530 uses `ome_zarr.writer.write_multiscale()`
- https://forum.image.sc/t/slow-writing-to-persistent-zarr-array/59556/6 - Manually writing Zarr for napari, reading and writing a TIFF -> a slice at a time. Not using OME-Zarr. `stacked_image[:,:,x,y] = np.array(Image.open(img_list[x])).astype("uint16")`
- Java: Writing ImagePlus to OME-Zarr. Supports v0.4 OME-Zarr.
- https://forum.image.sc/t/write-sparse-ome-zarr-or-ome-tiff-with-spatial-offset/101906 - Sept 2024. "No solution"
- https://forum.image.sc/t/parallel-writing-into-ome-zarr-during-image-analysis/94863 - Using ImgLib2
- High performance acquisition writing https://forum.image.sc/t/ngff-ome-zarr-how-fast-can-you-write-it/74303/2 and https://forum.image.sc/t/ome-zarr-chunking-questions/66794/34 (long thread!)
- https://forum.image.sc/t/downsampling-data-in-z-axis-for-ome-zarr-creation/104143 (ome-zarr-py doesn't support Z downsampling) - Solution is to use Webknossos library. "it would be great if we could write the OME-zarr not from file only but also from RAM (eg numpy array)."
- Memory issues of big stacks e.g. 400+ planes of 1GB each: https://forum.image.sc/t/issues-storing-image-stacks-in-ome-zarrarr/84282 No accepted solution
- Storing Points/Polygons https://forum.image.sc/t/tool-for-adding-labels-group-to-ome-ngff/72839
- Add timepoints to existing Plate OME-Zarr https://forum.image.sc/t/best-approach-for-appending-to-ome-ngff-datasets/89070 - Suggested solution: create multi-T plate initially, then fill out the T-tiles later. "I think it would be helpful to many to add helper HCS writer scripts (like the one above) to ome-zarr-py."
- https://forum.image.sc/t/ome-ngff-writer-for-python/58153 -> https://github.com/aeisenbarth/ngff-writer
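Several of the threads above (the tile-wise writing ones in particular) start from the same step: working out where each tile lands in the full-resolution array before writing it. A minimal plain-Python sketch of that bookkeeping follows. The helper names, tile size and centre positions are all hypothetical. It assumes centres are given in pixels as `(y, x)` and that no tile extends past the top or left edge.

```python
def tile_region(centre, tile_shape):
    """Return (y_slice, x_slice) covering a tile, given its centre in pixels."""
    starts = [int(round(c - size / 2)) for c, size in zip(centre, tile_shape)]
    return tuple(slice(start, start + size) for start, size in zip(starts, tile_shape))


def canvas_shape(centres, tile_shape):
    """Return the smallest (height, width) that contains every tile."""
    regions = [tile_region(centre, tile_shape) for centre in centres]
    return tuple(max(region[axis].stop for region in regions) for axis in range(2))


tile_shape = (256, 256)
centres = [(128, 128), (128, 384), (384, 128)]  # hypothetical (y, x) positions

print(canvas_shape(centres, tile_shape))
for centre in centres:
    y_slice, x_slice = tile_region(centre, tile_shape)
    # With a zarr array created at the canvas shape, each tile could then be
    # written with: array[y_slice, x_slice] = tile
    print(centre, y_slice, x_slice)
```

Choosing zarr chunks that line up with the tile size would mean each tile write touches only whole chunks, which avoids repeatedly reading and rewriting partially filled chunks.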