
Commit 12d00f8

feat(from_ngff_zarr): add support for storage_options
More convenient to deal with S3, etc. authentication. Adds a corresponding Network Storage and Authentication FAQ entry.
1 parent 7fb9609 commit 12d00f8

4 files changed: +1179 -988 lines changed


docs/faq.md

Lines changed: 88 additions & 0 deletions
@@ -28,3 +28,91 @@ with very large datasets!

The lazy evaluation approach allows ngff-zarr to handle extremely large datasets
that wouldn't fit in memory, while still providing optimal performance through
intelligent task graph optimization.

## Network Storage and Authentication

### How do I read from network stores like S3, GCS, or other remote storage?

ngff-zarr can read from any network store that provides a Zarr Python compatible
interface. This includes stores from
[fsspec](https://filesystem-spec.readthedocs.io/en/latest/), which supports many
protocols including S3, Google Cloud Storage, Azure Blob Storage, and more.

You can construct network stores with authentication options and pass them
directly to ngff-zarr functions.

The following examples require the `fsspec` and `s3fs` packages:

```bash
pip install fsspec s3fs
```

```python
import zarr
from ngff_zarr import from_ngff_zarr

# S3 example with authentication using FsspecStore
s3_store = zarr.storage.FsspecStore.from_url(
    "s3://my-bucket/my-dataset.zarr",
    storage_options={
        "key": "your-access-key",
        "secret": "your-secret-key",
        "region_name": "us-west-2"
    }
)

# Read from the S3 store
multiscales = from_ngff_zarr(s3_store)
```

For public datasets, you can omit authentication:

```python
# Example using OME-Zarr Open Science Vis Datasets
s3_store = zarr.storage.FsspecStore.from_url(
    "s3://ome-zarr-scivis/v0.5/96x2/carp.ome.zarr",
    storage_options={"anon": True}  # Anonymous access for public data
)

multiscales = from_ngff_zarr(s3_store)
```

You can also pass S3 URLs directly to ngff-zarr functions, which will create the
appropriate store automatically:

```python
# Direct URL access for public datasets
multiscales = from_ngff_zarr(
    "s3://ome-zarr-scivis/v0.5/96x2/carp.ome.zarr",
    storage_options={"anon": True}
)
```

For more control over the underlying filesystem, you can use S3FileSystem
directly:

```python
import zarr
from s3fs import S3FileSystem

# Using S3FileSystem with Zarr
fs = S3FileSystem(
    key="your-access-key",
    secret="your-secret-key",
    region_name="us-west-2"
)
store = zarr.storage.FsspecStore(fs=fs, path="my-bucket/my-dataset.zarr")

multiscales = from_ngff_zarr(store)
```
**Authentication Options:**

- **Environment variables**: Set `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, etc. (see the sketch after this list)
- **IAM roles**: Use EC2 instance profiles or assume roles
- **Configuration files**: Use `~/.aws/credentials` or similar
- **Direct parameters**: Pass credentials directly to the store constructor
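For example, a minimal sketch of the environment-variable approach, assuming a hypothetical bucket name and credentials that `s3fs`/botocore pick up from the environment:

```bash
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
```

```python
import zarr
from ngff_zarr import from_ngff_zarr

# No credentials in code: s3fs reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
# from the environment when no explicit key/secret is given.
s3_store = zarr.storage.FsspecStore.from_url("s3://my-bucket/my-dataset.zarr")
multiscales = from_ngff_zarr(s3_store)
```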
The same patterns work for other cloud providers (GCS, Azure) by using their
respective fsspec implementations (e.g., `gcsfs`, `adlfs`).
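As an illustration, a minimal GCS sketch, assuming `gcsfs` is installed; the bucket path is a placeholder, and anonymous access is shown for public data:

```python
import zarr
from ngff_zarr import from_ngff_zarr

# Google Cloud Storage via gcsfs; replace the placeholder bucket with a real one
gcs_store = zarr.storage.FsspecStore.from_url(
    "gs://my-bucket/my-dataset.ome.zarr",
    storage_options={"token": "anon"}  # anonymous; pass credentials here otherwise
)
multiscales = from_ngff_zarr(gcs_store)
```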

py/ngff_zarr/from_ngff_zarr.py

Lines changed: 21 additions & 1 deletion
@@ -35,26 +35,46 @@ def from_ngff_zarr(
    store: StoreLike,
    validate: bool = False,
    version: Optional[str] = None,
    storage_options: Optional[dict] = None,
) -> Multiscales:
    """
    Read an OME-Zarr NGFF Multiscales data structure from a Zarr store.

    store : StoreLike
        Store or path to directory in file system. Can be a string URL
        (e.g., 's3://bucket/path') for remote storage.

    validate : bool
        If True, validate the NGFF metadata against the schema.

    version : string, optional
        OME-Zarr version, if known.

    storage_options : dict, optional
        Storage options to pass to the store if store is a string URL.
        For S3 URLs, this can include authentication credentials and other
        options for the underlying filesystem.

    Returns
    -------

    multiscales: multiscale ngff image with dask-chunked arrays for data

    """

    # Handle string URLs with storage options (zarr-python 3+ only)
    if isinstance(store, str) and storage_options is not None:
        if store.startswith(("s3://", "gs://", "azure://", "http://", "https://")):
            if zarr_version_major >= 3 and hasattr(zarr.storage, "FsspecStore"):
                store = zarr.storage.FsspecStore.from_url(
                    store, storage_options=storage_options
                )
            else:
                raise RuntimeError(
                    "storage_options parameter requires zarr-python 3+ with FsspecStore support. "
                    f"Current zarr version: {zarr.__version__}"
                )

    format_kwargs = {}
    if version and zarr_version_major >= 3:
        format_kwargs = (
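
A brief usage sketch of the new parameter (the bucket path below is a placeholder, not from this repository): an existing store or local path is passed through unchanged, while a remote URL combined with `storage_options` is wrapped in a `zarr.storage.FsspecStore` by the code above.

```python
from ngff_zarr import from_ngff_zarr

# Existing behavior: a local path or store object is handled as before
multiscales = from_ngff_zarr("path/to/local.ome.zarr")

# New behavior: a remote URL plus storage_options builds an FsspecStore internally
multiscales = from_ngff_zarr(
    "s3://my-bucket/my-dataset.ome.zarr",
    storage_options={"anon": True},
)
```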
