ZEP9 (phase 1): add clarifications for extension naming #330

Merged
merged 67 commits into from
Apr 17, 2025

Conversation

Member

@joshmoore joshmoore commented Feb 14, 2025

This PR clarifies the extension mechanism concept in the v3 specification. Comments on any changes which will break existing implementations are STRONGLY encouraged. Please see zarr-developers/zeps#65 for background material.

TODOs:

  • clarify the file numbering (currently 3.0.rst)
  • move definitions to the appropriate location (core, subtype page or ext

Post-merge:

@rabernat
Contributor

@joshmoore - really glad you got this started! 🙌

My feedback is that the PR is hard to review. It touches 15 files, including a ton of minor, unrelated formatting changes to the core spec document.

If we want folks to engage and give meaningful feedback, we need to make it easier to review. I'd recommend starting fresh with a minimal PR in which the diffs are reflective exclusively of the actual proposed changes.

@joshmoore
Member Author

@rabernat
really glad you got this started! 🙌

👍

It touches 15 files

You're right. I've extracted out #331.

including a ton of minor, unrelated formatting changes to the core spec document.

I disagree that they are unrelated. Take a look. The sections I've modified were basically already un-parseable. Since I was adding sections, the outline was getting more convoluted.

I'd recommend starting fresh with a minimal PR in which the diffs are reflective exclusively of the actual proposed changes.

👍 Give it a look and let me know what you think.

@jbms
Contributor

jbms commented Feb 17, 2025

Thanks for all of your work on this!

My current understanding of the practical effect of the proposal is as follows:

  • raw names will be granted fairly easily, e.g. zstd, bfloat16, and others I've proposed would be assigned to me; the ones that zarr-python has started using (string, bytes, vlen-utf8, etc.) would be assigned to someone from zarr-python. URL names will be used only for really experimental stuff; all commonly used extensions will have raw names since they will be minimal effort. Therefore, the verbosity of the URLs is not really a problem in practice.

  • the ZEP process, or really any mandatory review process at all, will not be used for proposing extensions that fit into any of the existing extension points, only for entirely new extension points. At most someone might ask around for comments informally before adopting something.
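To make the raw-name versus URL-name distinction above concrete, here is a minimal Python sketch. It is an illustration only, not part of the spec or of any Zarr library: the helper names `is_url_name` and `classify` are hypothetical, and the rule "a URL name has both a scheme and a host" is an assumption made for the example.

```python
from urllib.parse import urlsplit


def is_url_name(name: str) -> bool:
    """Return True if `name` looks like a URL-style extension name.

    Assumption for this sketch: a URL name carries both a scheme
    (e.g. "https") and a host; anything else is treated as a raw name.
    """
    parts = urlsplit(name)
    return bool(parts.scheme) and bool(parts.netloc)


def classify(name: str) -> str:
    """Label an extension name as "url" or "raw" (hypothetical helper)."""
    return "url" if is_url_name(name) else "raw"


if __name__ == "__main__":
    # The example.com URL is a made-up placeholder, not a registered extension.
    for candidate in ["zstd", "bfloat16", "vlen-utf8", "https://example.com/codecs/my-codec"]:
        print(candidate, "->", classify(candidate))
```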

The lack of basically any review worries me a bit. But ultimately I'm in favor of this proposal because I think it reflects the reality that the ZEP process isn't working for the existing extension points, and it would be better to just rely on a less formal process.

@joshmoore joshmoore mentioned this pull request Feb 18, 2025
@normanrz
Member

The lack of basically any review worries me a bit. But ultimately I'm in favor of this proposal because I think it reflects the reality that the ZEP process isn't working for the existing extension points, and it would be better to just rely on a less formal process.

I share your concerns to some degree. I think we can adapt the governance structure for extensions in the future, if we think that a more thorough review process would be necessary. We are thinking of forming a zarr specs team that could take on that responsibility.

@indexzero

Howdy all 👋 – long time listener, first time caller. I'm an early Node.js contributor (emeritus) & served on the Board of Directors for the Node.js Foundation / OpenJS Foundation for seven years.

I wanted to share my thoughts from working on distributed open source projects like node, as well as time working with standards bodies like ECMA TC-39. I wanted my thoughts here to be sincere, but I realized getting ready to post this that it is ... long 😅

As such, I added a "TL;DR" below 💯

"Good artists copy, great artists steal" – Picasso

I'll open by saying that in my own experience the wisdom @d-v-b shared here is right 80%-85% of the time:

Just to emphasize, I want a boring, simple solution here. Our default solution should be to copy something that has worked in a similar project. If you are rejecting something that has worked for another project, then I would like to see an engineering-based explanation for that decision.

TL;DR?

For the "TL;DR" crowd, every suggestion below comes with lived experience (and what I hope is in turn wisdom) from past work on popular/distributed open source projects & specifications:

  1. Implementing a specification will influence its final form: check out the TC-39 staging process and consider drafting a staging process for extensions INSTEAD of worrying too much about how they get registered.
    • Bonus points: ensure at least one Zarr implementation exists for an extension before reaching Stage N s.t. N > 1
  2. Ecosystems thrive when "less is more": the correctness of spec text should be considered on equal footing with how much cognitive overhead is needed for the "happy path". In this context:

Implementing a specification will influence its final form

As mentioned by @jbms (see: 1 2), @normanrz (see: 3), there appear to be material concerns about the current ZEP process as it applies to extensions. I would strongly urge this group to think about and come to a lazy consensus on how to surface to your community – NOT enforce – how "stable" or "mature" a given extension is.

Why?: specs that are unimplementable and/or implemented inconsistently are the hardest kind of situation to undo.

e.g. ESModules was approved in the ES2015 version of ECMA-262 (aka JavaScript), but required other specifications to be finalized (e.g. WHATWG Module Loader spec) prior to implementation being possible. That was almost TEN YEARS AGO 😳 and the JS ecosystem is still helping the average developer wrap their heads around years of "CJS vs ESM" FUD.

After ES2015 shipped, similar process challenges were codified by TC-39 into the current staging process document which now requires that in order to get to Stage 4 a new proposal must be implemented in two runtimes.

Investing in a lightweight set of stages that represent MVG (minimum viable guardrails) will pay a much larger dividend than you might think 💡
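As a rough illustration of what such a guardrail could look like, here is a small Python sketch of the "at least one implementation before Stage N > 1" rule suggested in the TL;DR. Everything in it is hypothetical: the `ExtensionProposal` class, the stage numbers, and the `can_advance` helper are assumptions for the example, not an existing Zarr or TC-39 tool.

```python
from dataclasses import dataclass, field


@dataclass
class ExtensionProposal:
    """Hypothetical record of an extension moving through stages."""
    name: str
    stage: int = 0
    implementations: set = field(default_factory=set)


def can_advance(proposal: ExtensionProposal) -> bool:
    """Return True if the proposal may move to the next stage.

    Assumed rule (from the suggestion above): reaching any stage
    greater than 1 requires at least one implementation.
    """
    target = proposal.stage + 1
    if target > 1:
        return len(proposal.implementations) >= 1
    return True


if __name__ == "__main__":
    proposal = ExtensionProposal(name="example-codec", stage=1)
    print(can_advance(proposal))   # no implementations registered yet
    proposal.implementations.add("zarr-python")
    print(can_advance(proposal))   # one implementation registered
```

The check only surfaces maturity; it does not block anyone from publishing an extension at an early stage.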

When it comes to fostering a vibrant ecosystem, "less is more"

There is an old proverb that I'm sure many (if not all of you) have heard before:

If a tree falls in the woods and no one is around to hear it, does it make a sound?

There is a version of that wisdom for Zarr extensions that I pose as a thought experiment:

If a Zarr extension is added in zarr-developers/zarr-extensions and no Zarr libraries implement it, is it an extension?

I posit this not to undermine the technical merits of any one approach over the other, but to bring to the surface that software is written by people. In fact, this kind of confusion was highlighted in the review of this proposal.

@d-v-b's comment (4) about ensuring global interoperability a priori rang true to me:

I think it would actually be simpler to allow extension authors to have control over how interoperable they want their extensions to be. In situations where interoperability is not a priority (e.g., developing new stuff, or using something within an org), then extension authors could use whatever extension names are convenient for them.

For an example of why telling your users – or library implementors – EXACTLY how the world MUST be, because you are trying to protect them from making mistakes, can backfire, look no further than peerDependencies in npm@2. The TL;DR was that:

  • npm@2: an unsatisfiable set of semver expressions in peerDependencies of the packages you depend on WOULD NOT INSTALL
  • npm >= 3: an unsatisfiable set of semver expressions in peerDependencies of the packages you depend on DISPLAYED A WARNING

The goal in the original implementation was to ensure correctness. But it turns out that correctness didn't matter all that much for most users. In this context: making decisions based on what other users "might" be doing will generally lead to pain.

jakirkham
jakirkham previously approved these changes Apr 17, 2025
Member

@jakirkham jakirkham left a comment

Thank you Josh for spearheading this revision! ❤️

Also thanks to everyone on the ZSC who participated in the discussion and early drafts of this work 🙏

Think this provides a good path for developers to explore and expand the use cases Zarr can handle. With the work Norman has done on the first extensions, we can set an example future contributors can follow.

Co-authored-by: jakirkham <jakirkham@gmail.com>
@joshmoore joshmoore dismissed stale reviews from jakirkham, alimanfoo, jbms, normanrz, and rabernat via 7ee6d31 April 17, 2025 11:57
@joshmoore
Member Author

joshmoore commented Apr 17, 2025

Thanks, @jakirkham. With that approval, we now have unanimous support from @zarr-developers/steering-council and no major objections since the call on April 3rd.

I'll close all remaining comments above (per current GH settings), merge (and tag 3.1 -- this repo isn't tagged). I'd encourage interested reviewers to turn their attention to https://github.com/zarr-developers/zarr-extensions/pulls?q=sort%3Aupdated-desc+is%3Apr+is%3Aopen

Two final points:

  • Apologies if it seemed like I ignored the zarr. prefix point, @mkitti. Somewhere above in the expanses, I stated how for this PR with its intent of being nonbreaking that's not an option because of the current state of prefixes in the community. We can certainly use the review process on zarr-extensions, though, to keep a prefix clear as those proposals start to appear.

  • Thanks for taking the time for the write up, @indexzero! 🙏🏽 What you say resonates, especially the cost of fixing specs post hoc (see above) and the value of having implementations (staging process, etc.). This first step is about removing the requirement to have a PR against this repository in order to create an extension. I think we've achieved that in a fairly minimal manner here, but there is clearly a ways to go, and I imagine your advice will be hugely useful as we approach the question of making more significant changes.

@joshmoore joshmoore merged commit 1d0989e into zarr-developers:main Apr 17, 2025
1 check passed
@joshmoore joshmoore deleted the zep9-ext-naming branch April 17, 2025 13:39
@mkitti
Contributor

mkitti commented Apr 17, 2025

Somewhere above in the expanses, I stated how for this PR with its intent of being nonbreaking that's not an option because of the current state of prefixes in the

To clarify, what I mean overall is that we should reserve characters for potential later use in namespaces or unregistered names, in lieu of actually specifying that mechanism now. I propose we at least copy the reservations of RFC 3986:

reserved = gen-delims / sub-delims

gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=" 

Additionally, we should also reserve -, ., and ~.

This would allow us to adopt some kind of URI or namespace mechanism in the future if we so choose.
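A minimal Python sketch of the reservation proposed above, for illustration only. The character sets are copied from the RFC 3986 grammar quoted in this comment plus the extra `-`, `.`, and `~`; the function name `reserved_chars_in` is hypothetical, and how a registry would actually act on the result is left open.

```python
# Character classes as quoted above from RFC 3986.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
# Additional characters proposed for reservation in this comment.
EXTRA_RESERVED = set("-.~")

RESERVED = GEN_DELIMS | SUB_DELIMS | EXTRA_RESERVED


def reserved_chars_in(name: str) -> list:
    """Return the sorted reserved characters that occur in `name`.

    An empty list means the name uses no reserved characters.
    """
    return sorted(set(name) & RESERVED)


if __name__ == "__main__":
    for candidate in ["lz4", "zarr.lz4", "vlen-utf8"]:
        print(candidate, "->", reserved_chars_in(candidate))
```

Note that an existing name such as vlen-utf8 already contains `-`, so a check like this would have to treat names already in use as exceptions.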

The prospective idea of a default zarr. prefix namespace would allow us to disambiguate these centrally registered extensions from potentially foreign registered extensions. This is meant to be backwards compatible because the unqualified lz4 would implicitly be thought of as being zarr.lz4. This could be used to distinguish lz4 as implemented in Python numcodecs or numcodecs.js from HDF5's incompatible lz4 codec in use with Zarr by Unidata's NetCDF, which we might call hdf5.lz4 in the future. The decision on the default namespace should be deferred, but we should make provisions so that we could adopt such a scheme in the future.
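To sketch how an implicit default namespace could resolve names, here is a short hypothetical Python example. Nothing here is specified anywhere: `DEFAULT_NAMESPACE`, `qualify`, and the treatment of URI names (passed through unchanged) are assumptions made only to illustrate the lz4 versus hdf5.lz4 case.

```python
DEFAULT_NAMESPACE = "zarr"  # assumed default; the actual decision is deferred


def qualify(name: str) -> str:
    """Return a namespace-qualified form of an extension name.

    Assumptions for this sketch:
    - URI names (containing "://") are grandfathered and left unchanged.
    - A name that already contains "." is treated as already qualified.
    - Any other name is implicitly placed in the default namespace.
    """
    if "://" in name:
        return name
    if "." in name:
        return name
    return f"{DEFAULT_NAMESPACE}.{name}"


if __name__ == "__main__":
    print(qualify("lz4"))       # unqualified name gets the default prefix
    print(qualify("hdf5.lz4"))  # already qualified, left unchanged
```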

My proposition today is that we reserve . and other characters going forward except in the context of the grandfathered URI names. This would allow us to consider a variety of namespace and extension schemes in the future.

@mkitti
Contributor

mkitti commented Apr 17, 2025

I turned my comment into a pull request on zarr-extensions:
zarr-developers/zarr-extensions#7

joshmoore added a commit to joshmoore/zeps that referenced this pull request May 8, 2025
This PR reduces the scope of ZEP0009 to match the decisions
made in zarr-developers/zarr-specs#330
and subsequently published as the 3.1 update of the Zarr
specification.