DOC: Add sharding documentation

thewtex · thewtex · commit 0e4378468281 · 2025-04-10T11:34:37.000-04:00
diff --git a/README.md b/README.md
@@ -24,7 +24,8 @@ implementation.
 - Reads OME-Zarr v0.1 to v0.5 into simple Python data classes with Dask arrays
 - Optional OME-Zarr data model validation during reading
 - Writes OME-Zarr v0.4 to v0.5
-- Optional writing via [tensorstore](https://google.github.io/tensorstore/)
+- [Sharded Zarr] stores
+- Optional writing via [tensorstore]
 
 ## Documentation
 
@@ -46,3 +47,6 @@ how to contribute can be found in
 
 `ngff-zarr` is distributed under the terms of the
 [MIT](https://spdx.org/licenses/MIT.html) license.
+
+[Sharded Zarr]: https://zarr.dev/zeps/accepted/ZEP0002.html
+[tensorstore]: https://google.github.io/tensorstore/
diff --git a/docs/index.md b/docs/index.md
@@ -22,7 +22,8 @@ A lean and kind
 - Reads OME-Zarr v0.1 to v0.5 into simple Python data classes with Dask arrays
 - Optional OME-Zarr data model validation during reading
 - Writes OME-Zarr v0.4 to v0.5
-- Optional writing via [tensorstore](https://google.github.io/tensorstore/)
+- [Sharded Zarr] stores
+- Optional writing via [tensorstore]
 
 ```{toctree}
 :maxdepth: 2
@@ -42,3 +43,6 @@ development.md
 
 apidocs/index.rst
 ```
+
+[Sharded Zarr]: https://zarr.dev/zeps/accepted/ZEP0002.html
+[tensorstore]: https://google.github.io/tensorstore/
diff --git a/docs/python.md b/docs/python.md
@@ -201,6 +201,50 @@ also be used.
 
 The multiscales will be computed and written out-of-core, limiting memory usage.
 
+## Write a sharded OME-Zarr store
+
+[Sharded Zarr] stores save multiple compressed chunks in a single file or blob.
+This is useful for large datasets because it reduces the number of files in a
+directory, or the number of objects in a cloud storage bucket.
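+
+To illustrate the reduction, the sketch below counts how many files a 3D array
+needs with and without sharding. The array shape, chunk shape, and
+`chunks_per_shard` values are hypothetical. The counts are upper bounds that
+ignore metadata files and any empty chunks a writer may skip.
+
+```python
+import math
+
+# Hypothetical sizes, chosen only to illustrate the arithmetic.
+array_shape = (1024, 1024, 1024)
+chunk_shape = (64, 64, 64)
+chunks_per_shard = (2, 2, 2)
+
+
+def grid_size(shape, block):
+    """Number of blocks of size `block` needed to tile `shape` (partial edge blocks count)."""
+    return math.prod(math.ceil(s / b) for s, b in zip(shape, block))
+
+
+# Each shard spans chunks_per_shard chunks along each dimension.
+shard_shape = tuple(c * n for c, n in zip(chunk_shape, chunks_per_shard))
+n_chunk_files = grid_size(array_shape, chunk_shape)  # unsharded: one file per chunk
+n_shard_files = grid_size(array_shape, shard_shape)  # sharded: one file per shard
+print(n_chunk_files, n_shard_files)  # 4096 512
+```
+
+Grouping 2 chunks per shard along each of the three dimensions packs
+2 × 2 × 2 = 8 chunks into every shard, so the file count drops by a factor of 8.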
+
+To generate a sharded OME-Zarr store, pass the `chunks_per_shard` keyword
+argument to `to_ngff_zarr`. Sharding requires OME-Zarr version 0.5, which uses
+version 3 of the Zarr format specification.
+
+The `chunks_per_shard` value can be a single integer:
+
+```python
+version = '0.5'
+nz.to_ngff_zarr('lightsheet.ome.zarr',
+                multiscales,
+                chunks_per_shard=2,
+                version=version)
+```
+
+This will use 2 chunks per shard for all dimensions.
+
+Or, specify a tuple with one integer per dimension:
+
+```python
+nz.to_ngff_zarr('lightsheet.ome.zarr',
+                multiscales,
+                chunks_per_shard=(2, 2, 4),
+                version=version)
+```
+
+Or, specify a dictionary that maps each dimension name to an integer:
+
+```python
+nz.to_ngff_zarr('lightsheet.ome.zarr',
+                multiscales,
+                chunks_per_shard={'z':4, 'y':2, 'x':2},
+                version=version)
+```
+
+The resulting shard shape is the element-wise product of the chunk shape and the
+`chunks_per_shard` values. For example, with a chunk shape of `(64, 64, 64)`,
+the dictionary above gives a shard shape of `(256, 128, 128)`.
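+
+The shard-shape arithmetic can be sketched as a small helper that accepts all
+three forms of `chunks_per_shard`. `shard_shape` is a hypothetical function
+written for illustration, not part of the ngff-zarr API. It assumes the chunk
+shape and the dimension names are listed in the same order.
+
+```python
+def shard_shape(chunk_shape, chunks_per_shard, dims):
+    """Hypothetical helper: compute the shard shape for one array.
+
+    chunk_shape: per-dimension chunk sizes, ordered like `dims`.
+    chunks_per_shard: an int, a tuple (one entry per dimension),
+        or a dict keyed by dimension name.
+    dims: dimension names, e.g. ("z", "y", "x").
+    """
+    if isinstance(chunks_per_shard, int):
+        # A single integer applies to every dimension.
+        factors = (chunks_per_shard,) * len(dims)
+    elif isinstance(chunks_per_shard, dict):
+        # A dict is looked up by dimension name, in the array's dimension order.
+        factors = tuple(chunks_per_shard[d] for d in dims)
+    else:
+        factors = tuple(chunks_per_shard)
+    if len(factors) != len(chunk_shape):
+        raise ValueError("chunks_per_shard must have one entry per dimension")
+    # Element-wise product of the chunk shape and the chunks-per-shard factors.
+    return tuple(c * f for c, f in zip(chunk_shape, factors))
+
+
+dims = ("z", "y", "x")
+chunks = (64, 64, 64)
+print(shard_shape(chunks, 2, dims))                         # (128, 128, 128)
+print(shard_shape(chunks, (2, 2, 4), dims))                 # (128, 128, 256)
+print(shard_shape(chunks, {"z": 4, "y": 2, "x": 2}, dims))  # (256, 128, 128)
+```
+
+This sketch requires every dimension to appear in the dictionary and raises a
+`KeyError` otherwise.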
+
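+Sharding is defined in the Zarr format 3 specification as the
+`sharding_indexed` codec from ZEP0002, which is why it needs OME-Zarr 0.5. In
+the array metadata, the regular chunk grid is laid out in units of shards, and
+the codec configuration records the inner chunk shape. The sketch below builds
+an illustrative fragment of that metadata as a Python dict. The helper name
+`sharding_metadata` and the inner `gzip` compression settings are assumptions
+for illustration. The codecs ngff-zarr actually writes may differ.
+
+```python
+import json
+
+
+def sharding_metadata(shard_shape, chunk_shape):
+    """Hypothetical helper: build an illustrative Zarr v3 chunk_grid/codecs fragment."""
+    # Each shard must contain a whole number of inner chunks along every axis.
+    if any(s % c for s, c in zip(shard_shape, chunk_shape)):
+        raise ValueError("shard shape must be a multiple of the chunk shape")
+    return {
+        # The regular chunk grid is laid out in units of shards.
+        "chunk_grid": {
+            "name": "regular",
+            "configuration": {"chunk_shape": list(shard_shape)},
+        },
+        "codecs": [
+            {
+                "name": "sharding_indexed",
+                "configuration": {
+                    # Inner chunks: the units that are compressed individually.
+                    "chunk_shape": list(chunk_shape),
+                    # Assumed inner codec chain: serialize, then gzip-compress.
+                    "codecs": [
+                        {"name": "bytes", "configuration": {"endian": "little"}},
+                        {"name": "gzip", "configuration": {"level": 5}},
+                    ],
+                    # The shard index stores the offset and size of each inner chunk.
+                    "index_codecs": [
+                        {"name": "bytes", "configuration": {"endian": "little"}},
+                        {"name": "crc32c"},
+                    ],
+                    "index_location": "end",
+                },
+            }
+        ],
+    }
+
+
+print(json.dumps(sharding_metadata((256, 128, 128), (64, 64, 64)), indent=2))
+```
+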
 ### Writing with Tensorstore
 
 To write with [tensorstore], which may provide better performance, use the
@@ -243,4 +287,5 @@ to_ngff_zarr('cthead1_zarr2.ome.zarr', multiscales, version='0.4')
 [`to_ngff_image`]: ./apidocs/ngff_zarr/ngff_zarr.to_ngff_image.md
 [`to_multiscales`]: ./apidocs/ngff_zarr/ngff_zarr.to_multiscales.md
 [`from_ngff_zarr`]: ./apidocs/ngff_zarr/ngff_zarr.from_ngff_zarr.md
+[Sharded Zarr]: https://zarr.dev/zeps/accepted/ZEP0002.html
 [tensorstore]: https://google.github.io/tensorstore/