From: #2
- Previous work, including HDMF, CORAL, the NeXus data format for X-ray and neutron scattering and muon spectroscopy, netCDF?, OME-NGFF, geospatial data?, JSON Schema arrays, and previous approaches in LinkML
- Many file formats for arrays: raw binary files, NumPy .npy/.npz, HDF5, Zarr, N5, JSON, CSV/TSV, GRIB, TIFF, FITS
- Many APIs for working with these formats: NumPy, h5py, zarr, xarray, etc.
Along with https://github.com/orgs/linkml/discussions/2020#discussioncomment-9161935, we want to differentiate what we're doing here from the approaches above.
linkml-arrays is a markup and schema format for specifying arrays that is compatible with many backends and formats.
Goals:
- authorable: can be written by hand, and is also compatible with authoring tools that autogenerate a schema descriptor from a concrete array.
- portable: written in YAML and usable from many different tools.
- decoupled from serialization: makes no assumptions about how the array is represented on disk or in memory.
- metadata-enriched: supports linked-data annotations on arrays, e.g. to tag them with specific units, types, and semantics.
- nonprescriptive/generic: descriptions are arbitrary and don't say how arrays should be represented, but they allow downstream schemas and formats (e.g. NWB) to make those prescriptions; linkml-arrays acts as a unification layer in which those negotiations can happen.
- ..?
Not goals:
- API: not a representation of how arrays should be used or what capabilities they should have in programming environments. Arrays just need to have a type and a shape.
- ..?
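To make the "type and a shape" idea and the serialization-decoupling goal concrete, here is a minimal Python sketch. It is not the linkml-arrays API: the descriptor keys (`dtype`, `dimensions`, `name`, `size`) and the functions `describe` and `conforms` are hypothetical. The dict stands in for what a hand-written YAML descriptor might load into. The functions only read `.shape` and `.dtype`, which NumPy arrays, h5py datasets, and zarr arrays all expose, so the same check would apply no matter how the array is stored.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical descriptor, roughly what a YAML array spec might load into.
# The key names are illustrative assumptions, not the linkml-arrays schema.
IMAGE_SPEC = {
    "dtype": "float64",
    "dimensions": [
        {"name": "x", "size": None},  # None: any length is allowed
        {"name": "y", "size": None},
        {"name": "rgb", "size": 3},   # fixed length
    ],
}


def describe(array: Any) -> dict:
    """Autogenerate a descriptor from a concrete array ("authorable" goal).

    Uses only .shape and .dtype, so the storage backend does not matter.
    Dimension names are placeholders that an author would then rename.
    """
    return {
        "dtype": str(array.dtype),
        "dimensions": [
            {"name": f"dim_{i}", "size": n} for i, n in enumerate(array.shape)
        ],
    }


def conforms(array: Any, spec: dict) -> bool:
    """Return True if the array's type and shape match the descriptor."""
    if str(array.dtype) != spec["dtype"]:
        return False
    dims = spec["dimensions"]
    if len(array.shape) != len(dims):
        return False
    # A size of None accepts any length along that dimension.
    return all(d["size"] is None or d["size"] == n for d, n in zip(dims, array.shape))


@dataclass
class StubArray:
    """Minimal stand-in for any backend object exposing .shape and .dtype."""
    shape: tuple
    dtype: str


print(conforms(StubArray((640, 480, 3), "float64"), IMAGE_SPEC))  # True
print(conforms(StubArray((640, 480), "float64"), IMAGE_SPEC))     # False: wrong rank
print(describe(StubArray((2, 3), "int32")))
```

The stub keeps the sketch dependency-free. With NumPy installed, `conforms(np.zeros((640, 480, 3)), IMAGE_SPEC)` should behave the same way, since `np.zeros` defaults to float64 and `str(np.dtype("float64"))` is `"float64"`. Relying on duck typing rather than a specific array class is one way to keep the descriptor independent of any particular API, in line with the non-goal above.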