improvements to parse_dtype #3264

Open
wants to merge 7 commits into
base: main
from

Conversation

d-v-b
Contributor

@d-v-b d-v-b commented Jul 17, 2025

  • Add a new function parse_dtype. parse_data_type is kept around but it just wraps parse_dtype. The reason for this change is naming consistency -- the ZDType methods already use the "dtype" abbreviation extensively, so it's potentially confusing that parse_data_type does not.
  • Handle strings and sequences as potential JSON-like inputs. Adds tests to ensure that the JSON form of a dtype is a valid argument to parse_dtype (with the exception of "|O", which is ambiguous).
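
A minimal sketch of the two points above, using toy stand-ins rather than the real Zarr classes. ToyInt32, ToyBytes, and the "toy_bytes" JSON name are hypothetical, and the zarr_format argument of the real function is omitted for brevity. It only illustrates the wrapper pattern and the round-trip property that the new tests check:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping, Union


@dataclass(frozen=True)
class ToyInt32:
    """Hypothetical stand-in for a ZDType whose JSON form is a plain string."""

    def to_json(self) -> str:
        return "int32"


@dataclass(frozen=True)
class ToyBytes:
    """Hypothetical stand-in for a ZDType whose JSON form is a dict."""

    length: int

    def to_json(self) -> dict[str, Any]:
        return {"name": "toy_bytes", "configuration": {"length": self.length}}


ToyDType = Union[ToyInt32, ToyBytes]


def parse_dtype(dtype_spec: Any) -> ToyDType:
    """Interpret the input as a toy data type (illustration only)."""
    # An existing data type instance is returned directly.
    if isinstance(dtype_spec, (ToyInt32, ToyBytes)):
        return dtype_spec
    # String JSON form.
    if dtype_spec == "int32":
        return ToyInt32()
    # Dict JSON form with a name and a configuration.
    if isinstance(dtype_spec, Mapping) and dtype_spec.get("name") == "toy_bytes":
        return ToyBytes(**dtype_spec["configuration"])
    raise ValueError(f"Cannot interpret {dtype_spec!r} as a data type")


def parse_data_type(dtype_spec: Any) -> ToyDType:
    """Backwards-compatible name; see parse_dtype."""
    return parse_dtype(dtype_spec)


# Round-trip check: the JSON form of each dtype is itself a valid input.
for dtype in (ToyInt32(), ToyBytes(length=10)):
    assert parse_dtype(dtype.to_json()) == dtype
    assert parse_data_type(dtype.to_json()) == parse_dtype(dtype.to_json())
```

Keeping the old name as a thin wrapper means existing callers keep working while new code can use the shorter name.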

closes #3263

…more JSON-like inputs, and test for round-trips
@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jul 17, 2025
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Jul 17, 2025
@d-v-b
Contributor Author

d-v-b commented Jul 17, 2025

cc @TomNicholas

@d-v-b d-v-b requested a review from a team July 17, 2025 14:23
@d-v-b d-v-b changed the title improvments to parse_dtype improvements to parse_dtype Jul 17, 2025

codecov bot commented Jul 17, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 59.62%. Comparing base (0019733) to head (d684ada).

Files with missing lines Patch % Lines
src/zarr/core/dtype/__init__.py 75.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3264      +/-   ##
==========================================
+ Coverage   59.59%   59.62%   +0.03%     
==========================================
  Files          78       78              
  Lines        8696     8702       +6     
==========================================
+ Hits         5182     5189       +7     
+ Misses       3514     3513       -1     
Files with missing lines Coverage Δ
src/zarr/core/array.py 69.02% <100.00%> (ø)
src/zarr/dtype.py 0.00% <ø> (ø)
src/zarr/core/dtype/__init__.py 30.00% <75.00%> (+5.92%) ⬆️

... and 1 file with indirect coverage changes

@d-v-b
Contributor Author

d-v-b commented Jul 17, 2025

d684ada adds a test to ensure that parse_dtype is the same as parse_data_type

Contributor

@dstansby dstansby left a comment

Nice - I like the name change. Having two identical functions in our API seems a bit confusing from a user POV: https://zarr--3264.org.readthedocs.build/en/3264/api/zarr/dtype/index.html#functions. Could you remove parse_data_type from __all__ so it's removed from the docs, but will still be imported and work for backwards compatibility?
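
For reference, a small self-contained sketch of the behaviour this suggestion relies on. The module name toy_dtype and its contents are hypothetical, built in memory only for illustration. A name left out of `__all__` is skipped by `from module import *` (and typically by API doc generators that honour `__all__`), but can still be imported explicitly:

```python
import sys
import types

# Hypothetical module source: only parse_dtype is listed in __all__.
source = '''
__all__ = ["parse_dtype"]

def parse_dtype(spec):
    return spec

def parse_data_type(spec):
    """Backwards-compatible name; see parse_dtype."""
    return parse_dtype(spec)
'''

# Build the toy module in memory and register it so it can be imported.
mod = types.ModuleType("toy_dtype")
exec(source, mod.__dict__)
sys.modules["toy_dtype"] = mod

# An explicit import of the old name still works.
from toy_dtype import parse_data_type

# A star import only picks up the names listed in __all__.
star_names: dict = {}
exec("from toy_dtype import *", star_names)

assert "parse_dtype" in star_names
assert "parse_data_type" not in star_names
assert parse_data_type("int32") == "int32"
```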

Comment on lines +193 to +216
Interpret the input as a ZDType.

This function wraps ``parse_dtype``. The only difference is the function name. This function may
be deprecated in a future version of Zarr Python in favor of ``parse_dtype``.

Parameters
----------
dtype_spec : ZDTypeLike
The input to be interpreted as a ZDType. This could be a ZDType, which will be returned
directly, or a JSON representation of a ZDType, or a native dtype, or a python object that
can be converted into a native dtype.
zarr_format : ZarrFormat
The Zarr format version.

Returns
-------
ZDType[TBaseDType, TBaseScalar]
The ZDType corresponding to the input.

Examples
--------
>>> parse_dtype("int32", zarr_format=2)
Int32(endianness="little")
"""
Contributor

I'd say bin the docstring here to avoid duplication, and just point to parse_dtype.

NullTerminatedBytes(length=10)
>>> parse_data_type({"name": "numpy.datetime64", "configuration": {"unit": "s", "scale_factor": 10}}, zarr_format=3)
DateTime64(endianness='little', scale_factor=10, unit='s')
>>> parse_dtype("int32", zarr_format=2)
Contributor

Huh, does this not need an import? not an issue, just a question

@@ -84,4 +85,5 @@
"data_type_registry",
"data_type_registry",
"parse_data_type",
Contributor

Suggested change
"parse_data_type",

Development

Successfully merging this pull request may close these issues.

incomplete round-tripping of v3 data type json
2 participants