Background - differentiation from other approaches #3

@sneakers-the-rat

From: #2

  • Previous work, including HDMF, CORAL, the NeXus data format for X-ray and neutron scattering and muon spectroscopy, netCDF?, OME-NGFF, geospatial data?, JSON schema arrays, previous approaches in LinkML
  • Many file formats for arrays - raw binary files, NumPy npy/npz, HDF5, Zarr, N5, JSON, CSV/TSV, GRIB, TIFF, FITS
  • Many APIs for working with these formats - numpy, h5py, zarr, xarray, etc.

Along with the discussion at https://github.com/orgs/linkml/discussions/2020#discussioncomment-9161935, we want to differentiate what we're doing here from the above approaches.

linkml-arrays is a markup and schema format for specifying arrays that is compatible with many backends and formats.

Goals:

  • authorable: Can be written by hand, but also compatible with authoring tools that can autogenerate a schema descriptor from a concrete array.
  • portable: in YAML, usable from many different tools.
  • decoupled from serialization: Makes no assumptions about how the array is represented on disk or in memory.
  • metadata-enriched: linked data annotations for arrays, e.g. to tag them with specific units, types, and semantics.
  • nonprescriptive/generic: descriptions are arbitrary and don't say how arrays should be represented, but allow downstream schemas/formats/etc. like NWB et al. to make those prescriptions - a unification layer that allows those negotiations to happen.
  • ..?
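
To make the "authorable" and "decoupled from serialization" goals concrete, here is a rough sketch of what autogenerating a descriptor from a concrete array could look like. Everything in it is an assumption for illustration: `infer_shape`, `infer_dtype`, `describe`, and the `dtype`/`dimensions` keys are hypothetical names, not the linkml-arrays format or any existing LinkML API. It uses nested Python lists as the "concrete array" so that it depends on no particular backend.

```python
from typing import Any, Iterator, List, Optional


def infer_shape(data: Any) -> List[int]:
    """Return the shape of a regular (non-ragged) nested list."""
    if not isinstance(data, list):
        return []  # a scalar has no dimensions
    if not data:
        return [0]
    child_shapes = [infer_shape(item) for item in data]
    if any(shape != child_shapes[0] for shape in child_shapes):
        raise ValueError("ragged array: sub-arrays differ in shape")
    return [len(data)] + child_shapes[0]


def _leaves(data: Any) -> Iterator[Any]:
    """Yield every scalar value in a nested list."""
    if isinstance(data, list):
        for item in data:
            yield from _leaves(item)
    else:
        yield data


def infer_dtype(data: Any) -> str:
    """Return the name of the single scalar type used in the nested list."""
    names = {type(value).__name__ for value in _leaves(data)}
    if len(names) != 1:
        raise ValueError(f"expected exactly one scalar type, found {sorted(names)}")
    return names.pop()


def describe(data: Any, dimension_names: Optional[List[str]] = None) -> dict:
    """Build a hypothetical descriptor holding only a type and a shape.

    The descriptor says nothing about how the values are stored on disk
    or in memory; that is left to downstream formats.
    """
    shape = infer_shape(data)
    names = dimension_names or [f"dim_{i}" for i in range(len(shape))]
    if len(names) != len(shape):
        raise ValueError("need exactly one name per dimension")
    return {
        "dtype": infer_dtype(data),
        "dimensions": [{"name": n, "size": s} for n, s in zip(names, shape)],
    }


descriptor = describe([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], ["x", "y"])
```

A descriptor like this could then be dumped to YAML and edited by hand, which is the round trip the "authorable" goal is after.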

Not goals:

  • API: not a specification of how arrays should be used or what capabilities they should have in programming environments. Arrays just need to have a type and a shape.
  • ..?
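
One way to read "just a type and a shape" is as a structural check rather than an API. The sketch below is only an illustration of that reading, using the standard library's `typing.Protocol`; `ArrayLike`, `ListBackedArray`, and `conforms` are hypothetical names, and nothing here is proposed as part of linkml-arrays itself.

```python
from dataclasses import dataclass
from typing import Any, Protocol, Tuple, runtime_checkable


@runtime_checkable
class ArrayLike(Protocol):
    """Anything exposing a dtype and a shape; no methods are required."""

    @property
    def dtype(self) -> Any: ...

    @property
    def shape(self) -> Tuple[int, ...]: ...


@dataclass
class ListBackedArray:
    """A toy backend: values in a nested list, type and shape stored alongside."""

    values: list
    dtype: str
    shape: Tuple[int, ...]


def conforms(obj: Any) -> bool:
    """True if obj has both attributes; says nothing about indexing, math, or I/O."""
    return isinstance(obj, ArrayLike)


toy = ListBackedArray(values=[[1, 2], [3, 4]], dtype="int", shape=(2, 2))
toy_ok = conforms(toy)          # has dtype and shape
plain_list_ok = conforms([1, 2, 3])  # a bare list has neither
```

The check only looks for the two attributes, so any backend that carries them would pass without the schema layer knowing how the backend works.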
