Skip to content

Updates for Zarr 3 Dtypes #10456

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Jul 10, 2025
Merged
Show file tree
Hide file tree
Changes from 16 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions .github/workflows/upstream-dev-ci.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -23,8 +23,7 @@ jobs:
name: detect upstream-dev ci trigger
runs-on: ubuntu-latest
if: |
github.repository == 'pydata/xarray'
&& (github.event_name == 'push' || github.event_name == 'pull_request')
github.event_name == 'push' || github.event_name == 'pull_request'
outputs:
triggered: ${{ steps.detect-trigger.outputs.trigger-found }}
steps:
Expand Down
26 changes: 22 additions & 4 deletions xarray/backends/zarr.py
Original file line number Diff line number Diff line change
Expand Up @@ -816,11 +816,29 @@ def open_store_variable(self, name):
if zarr_array.fill_value is not None:
attributes["_FillValue"] = zarr_array.fill_value
elif "_FillValue" in attributes:
attributes["_FillValue"] = FillValueCoder.decode(
attributes["_FillValue"], zarr_array.dtype
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code is correct and should be restored

# TODO update version check for the released version with dtypes
# probably be 3.1

# if Version(zarr.__version__) > Version("3.0.10"):
if hasattr(zarr_array.metadata.data_type, "to_native_dtype"):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to check by attrs here because of version confusion. Though maybe this is a better check here anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better check than version comparison*

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this check ever be used for arrays using Zarr 2.x? In 2.x didn't have the .metadata attr.

In [4]: a
Out[4]: <zarr.core.Array (2,) int64>

In [5]: a.metadata._data_type
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[5], line 1
----> 1 a.metadata._data_type

AttributeError: 'Array' object has no attribute 'metadata'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good callout, I'm a little surprised the tests didn't pick that up.

For merging we can do >=3.1.0 i think, but I ran into problems testing with version checking because the release of 3.0.10 is higher than 3.0.9.dev (and for some reason the instlalation from git was not turning it into 3.0.10.dev)

native_dtype = zarr_array.metadata.data_type.to_native_dtype()
Copy link
Contributor

@d-v-b d-v-b Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zarr_array.dtype is still a regular numpy dtype after my changes. so I don't think you need to go through the metadata

attributes["_FillValue"] = (
# Use the new dtype infrastructure instead of doing xarray
# specific fill value decoding
FillValueCoder.decode(
attributes["_FillValue"],
native_dtype,
)
)
else:
dtype_value = zarr_array.metadata.data_type.value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here -- I don't think you need to go through the metadata to get the numpy dtype

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also I'm confused because I thought I fixed this already

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had this since my initial test, never thought to try to remove it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when was the last time you pulled in the latest changes from main? because my fix has been in there for a while

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this PR started by overwriting my fix :) I would recommend reverting that change

attributes["_FillValue"] = FillValueCoder.decode(
attributes["_FillValue"], dtype_value
)

variable = Variable(dimensions, data, attributes, encoding)

return Variable(dimensions, data, attributes, encoding)
return variable

def get_variables(self):
return FrozenDict((k, self.open_store_variable(k)) for k in self.array_keys())
Expand Down
14 changes: 14 additions & 0 deletions xarray/tests/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -130,6 +130,20 @@ def _importorskip(
has_rasterio, requires_rasterio = _importorskip("rasterio")
has_zarr, requires_zarr = _importorskip("zarr")
has_zarr_v3, requires_zarr_v3 = _importorskip("zarr", "3.0.0")
has_zarr_v3_dtypes, requires_zarr_v3_dtypes = _importorskip("zarr", "3.0.10")
if has_zarr_v3:
import zarr

# manual update by checking attrs for now
# TODO: use version specifier
# installing from git main is giving me a lower version than the
# most recently released zarr
has_zarr_v3_dtypes = hasattr(zarr.core, "dtype")

requires_zarr_v3_dtypes = pytest.mark.skipif(
not has_zarr_v3_dtypes, reason="requires zarr>3.0.10 (including dev versions)"
)

has_fsspec, requires_fsspec = _importorskip("fsspec")
has_iris, requires_iris = _importorskip("iris")
has_numbagg, requires_numbagg = _importorskip("numbagg")
Expand Down
15 changes: 13 additions & 2 deletions xarray/tests/test_backends.py
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,7 @@
has_scipy,
has_zarr,
has_zarr_v3,
has_zarr_v3_dtypes,
mock,
network,
requires_cftime,
Expand Down Expand Up @@ -2416,7 +2417,7 @@ def test_read_non_consolidated_warning(self) -> None:
def test_non_existent_store(self) -> None:
with pytest.raises(
FileNotFoundError,
match="(No such file or directory|Unable to find group|No group found)",
match="(No such file or directory|Unable to find group|No group found in store)",
):
xr.open_zarr(f"{uuid.uuid4()}")

Expand Down Expand Up @@ -2498,6 +2499,7 @@ def test_manual_chunk(self) -> None:
assert_identical(actual.load(), auto.load())

@requires_dask
@pytest.mark.filterwarnings("ignore:.*does not have a Zarr V3 specification.*")
def test_warning_on_bad_chunks(self) -> None:
original = create_test_data().chunk({"dim1": 4, "dim2": 3, "dim3": 3})

Expand Down Expand Up @@ -2906,7 +2908,12 @@ def test_append_with_existing_encoding_raises(self) -> None:

@pytest.mark.parametrize("dtype", ["U", "S"])
def test_append_string_length_mismatch_raises(self, dtype) -> None:
skip_if_zarr_format_3("This actually works fine with Zarr format 3")
print("start")
print(has_zarr_v3)
print(has_zarr_v3_dtypes)
if has_zarr_v3 and not has_zarr_v3_dtypes:
skip_if_zarr_format_3("This actually works fine with Zarr format 3")

ds, ds_to_append = create_append_string_length_mismatch_test_data(dtype)
with self.create_zarr_target() as store_target:
ds.to_zarr(store_target, mode="w", **self.version_kwargs)
Expand All @@ -2919,8 +2926,12 @@ def test_append_string_length_mismatch_raises(self, dtype) -> None:
def test_append_string_length_mismatch_works(self, dtype) -> None:
skip_if_zarr_format_2("This doesn't work with Zarr format 2")
# ...but it probably would if we used object dtype
if has_zarr_v3_dtypes:
pytest.skip("This works on pre ZDtype Zarr-Python, but fails after.")

ds, ds_to_append = create_append_string_length_mismatch_test_data(dtype)
expected = xr.concat([ds, ds_to_append], dim="time")

with self.create_zarr_target() as store_target:
ds.to_zarr(store_target, mode="w", **self.version_kwargs)
ds_to_append.to_zarr(store_target, append_dim="time", **self.version_kwargs)
Expand Down
Loading