Description
The v3 spec states that codecs are stored in a JSON array under the key codecs. But the spec also states that the list of codecs is structured:
...the list of codecs must be of the following form:
zero or more array -> array codecs; followed by
exactly one array -> bytes codec; followed by
zero or more bytes -> bytes codecs.
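As a rough illustration of what this rule asks of an implementation that receives a flat list, here is a minimal Python sketch. It is not the zarr-python code: `CODEC_KINDS` and `check_codec_order` are hypothetical names, and the kind assigned to each codec name in the table is an assumption made for the example and should be checked against the spec.

```python
# Hypothetical lookup table: codec name -> kind (assumed for illustration only).
CODEC_KINDS = {
    "transpose": "array_array",
    "bytes": "array_bytes",
    "gzip": "bytes_bytes",
    "crc32c": "bytes_bytes",
}

# Position each kind must occupy relative to the others in the flat list.
_RANK = {"array_array": 0, "array_bytes": 1, "bytes_bytes": 2}


def check_codec_order(codecs):
    """Raise ValueError unless the flat list has the required form."""
    kinds = [CODEC_KINDS[codec["name"]] for codec in codecs]
    ranks = [_RANK[kind] for kind in kinds]
    # array -> array first, then array -> bytes, then bytes -> bytes.
    if ranks != sorted(ranks):
        raise ValueError(f"codecs are out of order: {kinds}")
    # Exactly one array -> bytes codec is required.
    count = kinds.count("array_bytes")
    if count != 1:
        raise ValueError(
            f"expected exactly one array -> bytes codec, found {count}"
        )
```

Both checks exist only because the flat list does not carry the structure itself; every reader of the metadata has to re-derive it.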
This is actually a lot of semantic load for something simple like a JSON array. Instead of using a JSON array, I believe that the above structure could be expressed much better (where "expressed better" means "conveys intent more clearly, with no loss of information, and minimal added complexity") by using a JSON object with the following structure:
{
  "codecs": {
    "array_array": [], # array of array -> array codecs, possibly empty
    "array_bytes": {"name": "bytes"}, # a single array -> bytes codec, required
    "bytes_bytes": [] # array of bytes -> bytes codecs, possibly empty
  }
}
I am noting this because over in the zarr-python v3 implementation effort, we have written something like the above data structure as part of the basic parsing of the contents of zarr.json. In fact, I think this data structure will arise in any implementation, because implementations must represent the structure of the codecs, and that structure is not captured at all by the JSON array representation. But, as I show here, it is trivial to describe the codec structure explicitly with JSON. A side benefit is that the proposed data structure expresses much better the constraint that there be exactly one array -> bytes codec, which would take some of the validation burden off implementations.
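To make the last point concrete, here is a minimal Python sketch of how the proposed object form might be read. It is an illustration under stated assumptions, not the zarr-python implementation: `CodecPipeline`, `parse_codecs_object`, and `to_flat_list` are hypothetical names, and treating the two array-valued keys as optional is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecPipeline:
    """Hypothetical in-memory form of the proposed "codecs" object."""

    array_bytes: dict          # the single array -> bytes codec, required
    array_array: tuple = ()    # zero or more array -> array codecs
    bytes_bytes: tuple = ()    # zero or more bytes -> bytes codecs


def parse_codecs_object(obj):
    """Build a CodecPipeline from the value stored under "codecs"."""
    if not isinstance(obj.get("array_bytes"), dict):
        raise ValueError('"array_bytes" must be a single codec object')
    return CodecPipeline(
        array_bytes=obj["array_bytes"],
        array_array=tuple(obj.get("array_array", ())),
        bytes_bytes=tuple(obj.get("bytes_bytes", ())),
    )


def to_flat_list(pipeline):
    """Recover the current flat-array form, showing no information is lost."""
    return [*pipeline.array_array, pipeline.array_bytes, *pipeline.bytes_bytes]
```

In this form the "exactly one" constraint is a single key check, and the ordering constraint cannot be violated at all, because the order is fixed by the key names.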
So, if we care about making this easier for implementations (and I think making it easy for implementations also makes it easier for users), we should consider this change to zarr.json. There is no change to the semantics of the spec, but it makes zarr.json clearer. I understand that people may not want to change the spec. But I consider that a separate question from whether the current spec has defects that could in principle be fixed, such as the one described here.