Skip to content

Updates for Zarr 3 Dtypes #10456

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open

Updates for Zarr 3 Dtypes #10456

wants to merge 18 commits into from

Conversation

ianhi
Copy link
Contributor

@ianhi ianhi commented Jun 27, 2025

Updates for Zarr Dtypes. Currently most functional, but still debugging the case of Zarr format 2.

Important Changes so far:

  1. Removed a test expected append behaviors to work with mismatched string lengths. This should not work as it can lead to data loss
  2. Updated interaction with dtypes around fill value encoding/decoding to account for the new zarr dtype objects

@ianhi ianhi marked this pull request as draft June 27, 2025 20:29
@github-actions github-actions bot added topic-backends topic-documentation topic-zarr Related to zarr storage library CI Continuous Integration tools Automation Github bots, testing workflows, release automation io labels Jun 27, 2025
@TomNicholas TomNicholas added the run-upstream Run upstream CI label Jul 1, 2025
@ianhi ianhi force-pushed the zarr-dtype-tests branch from 99dc235 to b0e70c6 Compare July 1, 2025 15:34
@ianhi ianhi force-pushed the zarr-dtype-tests branch 2 times, most recently from 8e40e73 to 3877091 Compare July 3, 2025 15:47
@ianhi
Copy link
Contributor Author

ianhi commented Jul 3, 2025

heads up @d-v-b I think we are workign in parallel? I saw a commit on your equivalent pr

@d-v-b
Copy link
Contributor

d-v-b commented Jul 3, 2025

I'm not making any changes over in that PR, it's just for tracking the test failures between upstream zarr python and xarray main

@ianhi ianhi closed this Jul 3, 2025
@ianhi ianhi reopened this Jul 3, 2025
@ianhi
Copy link
Contributor Author

ianhi commented Jul 3, 2025

@d-v-b I think everything passes now with the recent zarr fixes and the small fill value interaction in this PR.

I have a little bit of test clean up to do. And I can't get the upsstream tests to run which iss annoying, but i think we're good!

@d-v-b
Copy link
Contributor

d-v-b commented Jul 3, 2025

which tests in CI are running with the main version of zarr?

ianhi added 3 commits July 3, 2025 14:11
main branch of zarr is not a higher version right now.
@@ -23,8 +23,7 @@ jobs:
name: detect upstream-dev ci trigger
runs-on: ubuntu-latest
if: |
github.repository == 'pydata/xarray'
&& (github.event_name == 'push' || github.event_name == 'pull_request')
github.event_name == 'push' || github.event_name == 'pull_request'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An attempt to get upstream tests to run on this PR

# probably be 3.1

# if Version(zarr.__version__) > Version("3.0.10"):
if hasattr(zarr_array.metadata.data_type, "to_native_dtype"):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to check by attrs here because of version confusion. Though maybe this is a better check here anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better check than version comparison*

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this check ever be used for arrays using Zarr 2.x? In 2.x didn't have the .metadata attr.

In [4]: a
Out[4]: <zarr.core.Array (2,) int64>

In [5]: a.metadata._data_type
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[5], line 1
----> 1 a.metadata._data_type

AttributeError: 'Array' object has no attribute 'metadata'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good callout, I'm a little surprised the tests didn't pick that up.

For merging we can do >=3.1.0 i think, but I ran into problems testing with version checking because the release of 3.0.10 is higher than 3.0.9.dev (and for some reason the instlalation from git was not turning it into 3.0.10.dev)

@ianhi ianhi marked this pull request as ready for review July 3, 2025 18:16

# if Version(zarr.__version__) > Version("3.0.10"):
if hasattr(zarr_array.metadata.data_type, "to_native_dtype"):
native_dtype = zarr_array.metadata.data_type.to_native_dtype()
Copy link
Contributor

@d-v-b d-v-b Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zarr_array.dtype is still a regular numpy dtype after my changes. so I don't think you need to go through the metadata

)
)
else:
dtype_value = zarr_array.metadata.data_type.value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here -- I don't think you need to go through the metadata to get the numpy dtype

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also I'm confused because I thought I fixed this already

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had this since my initial test, never thought to try to remove it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when was the last time you pulled in the latest changes from main? because my fix has been in there for a while

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this PR started by overwriting my fix :) I would recommend reverting that change

@@ -816,11 +816,29 @@ def open_store_variable(self, name):
if zarr_array.fill_value is not None:
attributes["_FillValue"] = zarr_array.fill_value
elif "_FillValue" in attributes:
attributes["_FillValue"] = FillValueCoder.decode(
attributes["_FillValue"], zarr_array.dtype
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code is correct and should be restored

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Automation Github bots, testing workflows, release automation CI Continuous Integration tools io run-upstream Run upstream CI topic-backends topic-documentation topic-zarr Related to zarr storage library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zarr v3 Dtype refactor changes needed in Xarray
5 participants