Division between array serialization and specification

I've said this a few times when we've talked on Zoom during the hackathons, so I don't mean to be a broken record, but one place where a lot of prior schema languages have gone wrong with array specification is taking on too much of the weight of specifying the actual encoding of the arrays, rather than staying a schematic description that is generic across serializations.

The generality of the current form is pretty good! One place where I see us buying more complexity than we need, though, is this `GroupingByArrayOrder` idea:
https://github.com/linkml/linkml-model/blob/aab9842be0e230c0040688dfc6ffa26696c97827/linkml_model/model/schema/array.yaml#L67-L94

That's an implementation detail of how arrays are stored and indexed - I don't think we should touch the storage part in the schema, and the indexing part is handled by the rest of the array specification, right? I could be missing something that requires it to be specified in the schema, but in general I think it would be good to make a clear separation of concerns here. A decent test is "can this array specification be satisfied in such a way that the schema knows absolutely nothing about the way that the array is serialized?" Under that test, getting the array ordering correct is the responsibility of the dumper/loader, in the same way that we would expect the dumper/loader to correctly handle chunking and other serialization details.
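To make that test concrete, here's a rough sketch, not a proposal for the actual API. `ARRAY_SPEC`, `satisfies`, `dump`, and `load` are all hypothetical names I made up for illustration; the only real library calls are numpy's. The spec only knows dtype and shape, and the row-major vs. column-major choice lives entirely in the dumper/loader pair:

```python
import numpy as np

# Hypothetical serialization-agnostic spec: logical properties only,
# nothing about memory layout or storage order.
ARRAY_SPEC = {"dtype": "float64", "shape": (2, 3)}


def satisfies(array, spec):
    """Check an in-memory array against the logical spec."""
    return (
        array.dtype == np.dtype(spec["dtype"])
        and array.shape == tuple(spec["shape"])
    )


def dump(array, order):
    """Hypothetical dumper: flatten to raw bytes in 'C' or 'F' order."""
    return array.tobytes(order=order)


def load(raw, dtype, shape, order):
    """Hypothetical loader: must be told the order the dumper used."""
    return np.frombuffer(raw, dtype=dtype).reshape(shape, order=order)


logical = np.arange(6, dtype="float64").reshape(2, 3)

# Two different byte streams for the same logical array.
row_major = dump(logical, "C")
col_major = dump(logical, "F")

from_c = load(row_major, "float64", (2, 3), "C")
from_f = load(col_major, "float64", (2, 3), "F")
```

The two byte streams differ, but both round-trip to the same logical array and both satisfy the spec, so the ordering never had to appear in the schema.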

This is actually what I want to work on at the hackashop: a second set of specifications for declaring serializations, so that in a linked data context one could say "this particular array has n linked serializations - this numpy format, that zarr format, etc." without any of that being specified in the array's schema. That is, a way of saying "this particular hash of a binary stream is annotated as being a numpy ndarray with shape (x,y)", plus all the other details needed to handle serialization/deserialization, in a form that generalized dumpers/loaders could consume. So we may want to just talk about this next week :)
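Very roughly, I'm imagining something like the sketch below. Everything here is an assumption for the sake of discussion: `SerializationRecord`, `annotate_npy`, `LOADERS`, and `load` are hypothetical names, and the record fields are just the ones mentioned above (hash, format, shape) plus dtype. Only the `hashlib` and numpy calls are real.

```python
import hashlib
import io
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class SerializationRecord:
    """Hypothetical annotation on a binary stream, kept outside the array schema."""

    sha256: str   # hash identifying the binary stream
    format: str   # e.g. "npy"; a zarr copy would get its own record
    dtype: str
    shape: tuple


def annotate_npy(array):
    """Serialize to .npy bytes and build the record describing them."""
    buffer = io.BytesIO()
    np.save(buffer, array)
    raw = buffer.getvalue()
    record = SerializationRecord(
        sha256=hashlib.sha256(raw).hexdigest(),
        format="npy",
        dtype=str(array.dtype),
        shape=array.shape,
    )
    return raw, record


# Hypothetical registry: a generalized loader dispatches on record.format.
LOADERS = {"npy": lambda raw: np.load(io.BytesIO(raw))}


def load(raw, record):
    """Verify the stream against its annotation, then deserialize it."""
    if hashlib.sha256(raw).hexdigest() != record.sha256:
        raise ValueError("stream does not match the annotated hash")
    array = LOADERS[record.format](raw)
    if array.shape != tuple(record.shape) or str(array.dtype) != record.dtype:
        raise ValueError("decoded array does not match the annotation")
    return array


raw, record = annotate_npy(np.zeros((4, 5), dtype="int16"))
restored = load(raw, record)
```

The array's schema never appears in this code: the record is attached to the bytes, so one logical array could carry any number of these records, one per linked serialization.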


Division between array serialization and specification #6
