Re-implement loadable_variables using ManifestStore #473

@TomNicholas

#124 describes @ayushnag's idea to allow loading data variables directly from the in-memory ManifestArrays, rather than having to write to kerchunk/icechunk and then read back from that. PR #458 will add a zarr-compliant in-memory virtual ManifestStore that wraps a virtual dataset and would allow loading data from it via:

import virtualizarr as vz
import xarray as xr

virtual_ds = vz.open_virtual_dataset(filepath)
manifeststore = vz.ManifestStore(virtual_ds)
lazy_ds = xr.open_zarr(manifeststore)
loaded_ds = lazy_ds.load()

This issue is to track the idea that once #458 is implemented we should refactor the implementation of loadable_variables to use this ManifestStore + xr.open_zarr approach internally, for all backends. Currently this is instead done by each virtual backend calling out to a different xarray backend, depending on the filetype.
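
To make the "one loading path for all backends" idea concrete, here is a minimal sketch using only the Python standard library. It is not VirtualiZarr's implementation: `ChunkRef`, `ToyManifestStore` and `load_variable` are hypothetical stand-ins, and the only assumption carried over from the real design is that a manifest records where each chunk's bytes live (path, offset, length) in the original file.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Location of one chunk's bytes inside an existing file (hypothetical)."""
    path: str
    offset: int
    length: int


class ToyManifestStore:
    """Hypothetical stand-in for a manifest-backed store.

    It serves chunk bytes by reading the referenced byte range, so the
    caller never needs to know the original file format.
    """

    def __init__(self, manifest: dict[str, ChunkRef]):
        self._manifest = manifest

    def get(self, key: str) -> bytes:
        ref = self._manifest[key]
        with open(ref.path, "rb") as f:
            f.seek(ref.offset)
            return f.read(ref.length)


def load_variable(store: ToyManifestStore, keys: list[str]) -> bytes:
    """Single format-agnostic loading path: join a variable's chunks in order."""
    return b"".join(store.get(k) for k in keys)


# Fake "legacy format" file: a 4-byte header followed by two 4-byte chunks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HDR!" + b"abcd" + b"efgh")
    path = tmp.name

manifest = {
    "temp/0": ChunkRef(path, offset=4, length=4),
    "temp/1": ChunkRef(path, offset=8, length=4),
}
store = ToyManifestStore(manifest)
loaded = load_variable(store, ["temp/0", "temp/1"])
os.remove(path)
print(loaded)  # b'abcdefgh'
```

Because `load_variable` only talks to the store, adding a new file format in this toy setup means producing a new manifest, not writing a new loader.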

There are multiple reasons to re-implement this:

  1. We would no longer need every virtual backend to have a corresponding xarray backend,
  2. We would be able to guarantee (and create property-based tests - see Refactor tests around expected properties #394) that loading data via loadable_variables will give the same result as creating a virtual dataset, writing to icechunk, then loading,
  3. It would make it easier to centralize file handle management, so we can close file handles in the way xarray can (see File handle resource leak #468).
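
As a rough illustration of point 3, the sketch below shows what "centralized" handle management could look like when every read goes through one object. `HandlePool` and its methods are hypothetical names invented for this example; this is not how xarray or VirtualiZarr manage handles, only a standard-library sketch of a single owner that can close everything it opened.

```python
import os
import tempfile
from contextlib import ExitStack


class HandlePool:
    """Hypothetical single owner of all open file handles."""

    def __init__(self):
        self._stack = ExitStack()
        self._handles = {}

    def open(self, path):
        # Reuse an existing handle instead of opening the same file twice.
        if path not in self._handles:
            self._handles[path] = self._stack.enter_context(open(path, "rb"))
        return self._handles[path]

    def read_range(self, path, offset, length):
        f = self.open(path)
        f.seek(offset)
        return f.read(length)

    def close(self):
        # Closes every handle that was opened through this pool.
        self._stack.close()
        self._handles.clear()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    path = tmp.name

with HandlePool() as pool:
    first = pool.read_range(path, 0, 4)
    second = pool.read_range(path, 6, 4)
    handle = pool.open(path)  # same handle the reads above used

os.remove(path)
print(first, second, handle.closed)  # b'0123' b'6789' True
```

If all backends loaded data through one store, a single owner like this would be the only place that needs to guarantee handles get closed.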

FYI @chuckwondo
