#124 describes @ayushnag's idea to allow loading data variables directly from the in-memory `ManifestArray`s, rather than having to write to kerchunk/icechunk and then read back from that. PR #458 will add a zarr-compliant, in-memory, virtual `ManifestStore` that wraps a virtual dataset and would allow loading data from it via:
```python
import virtualizarr as vz
import xarray as xr

virtual_ds = vz.open_virtual_dataset(filepath)
manifeststore = vz.ManifestStore(virtual_ds)  # proposed in #458
lazy_ds = xr.open_zarr(manifeststore)
loaded_ds = lazy_ds.load()
```
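Conceptually, a store like this works because each chunk key resolves to a byte range in the original file, so reading a chunk never requires writing kerchunk or icechunk references first. The toy sketch below illustrates only that idea in plain Python; it is not VirtualiZarr's implementation, and `ChunkEntry`, `ToyManifestStore`, and the `"temp/c/0"` key format are hypothetical names chosen for the example.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ChunkEntry:
    """Hypothetical manifest entry: where one chunk's bytes live."""

    path: str
    offset: int
    length: int


class ToyManifestStore:
    """Hypothetical read-only key-value store backed by in-memory manifests."""

    def __init__(self, manifest: Dict[str, ChunkEntry]):
        self._manifest = manifest

    def get(self, key: str) -> bytes:
        # Resolve the key to a byte range and read it straight from the source file.
        entry = self._manifest[key]
        with open(entry.path, "rb") as f:
            f.seek(entry.offset)
            return f.read(entry.length)


# Usage: two "chunks" stored back to back in one file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"AAAABBBB")
store = ToyManifestStore({
    "temp/c/0": ChunkEntry(tmp.name, 0, 4),
    "temp/c/1": ChunkEntry(tmp.name, 4, 4),
})
chunk0 = store.get("temp/c/0")
chunk1 = store.get("temp/c/1")
os.remove(tmp.name)
```

No intermediate reference file is written: the manifest held in memory is enough to serve reads.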
This issue is to track the idea that, once #458 is implemented, we should refactor the implementation of `loadable_variables` to use this `ManifestStore` + `xr.open_zarr` approach internally, for all backends. Currently, each virtual backend instead calls out to a different xarray backend, depending on the file type.
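A minimal sketch of why a single path removes the need for per-format readers, assuming loading only depends on a store that maps chunk keys to bytes. `ChunkStore`, `DictStore`, `load_variables`, and the `chunk_keys` mapping are hypothetical names for illustration; in the proposal the store would be a `ManifestStore` and decoding would be done by `xr.open_zarr`, not by byte concatenation.

```python
from typing import Dict, Iterable, List, Protocol


class ChunkStore(Protocol):
    """Hypothetical minimal interface: resolve a chunk key to its bytes."""

    def get(self, key: str) -> bytes: ...


class DictStore:
    """In-memory stand-in for a ManifestStore (illustration only)."""

    def __init__(self, chunks: Dict[str, bytes]):
        self._chunks = chunks

    def get(self, key: str) -> bytes:
        return self._chunks[key]


def load_variables(
    store: ChunkStore,
    chunk_keys: Dict[str, List[str]],
    loadable_variables: Iterable[str],
) -> Dict[str, bytes]:
    """Load each requested variable by concatenating its chunks.

    Nothing here depends on the original file format, so every virtual
    backend that can produce manifests would share this one code path.
    """
    return {
        name: b"".join(store.get(key) for key in chunk_keys[name])
        for name in loadable_variables
    }


# The same loader serves stores produced by two different (hypothetical) backends.
keys = {"time": ["time/c/0", "time/c/1"]}
from_hdf5 = DictStore({"time/c/0": b"\x00\x01", "time/c/1": b"\x02\x03"})
from_netcdf3 = DictStore({"time/c/0": b"\x00\x01", "time/c/1": b"\x02\x03"})
loaded = [load_variables(s, keys, ["time"]) for s in (from_hdf5, from_netcdf3)]
```

The design point is that format-specific knowledge stays in the virtual backend that builds the manifests, rather than being duplicated in a matching xarray backend.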
There are multiple reasons to re-implement this:
- We would no longer need every virtual backend to have a corresponding xarray backend.
- We would be able to guarantee (and create property-based tests for, see "Refactor tests around expected properties" #394) that loading data via `loadable_variables` gives the same result as creating a virtual dataset, writing it to icechunk, then loading it.
- It would make it easier to centralize file handle management, so we can close file handles in the way xarray can (see "File handle resource leak" #468).
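A sketch of the equivalence property and of centralized handle closing, under loud assumptions: manifests are plain dicts of `key -> (path, offset, length)`, a JSON round trip stands in for writing references to icechunk and reading them back, and `HandlePool`, `load`, `roundtrip`, and `check_equivalence` are hypothetical names. A real test would use a property-based framework such as Hypothesis against actual `ManifestStore` and icechunk objects; a seeded random loop plays that role here.

```python
import json
import os
import random
import tempfile


class HandlePool:
    """Hypothetical central registry of open files, all closed in one place."""

    def __init__(self):
        self._handles = {}

    def read(self, path, offset, length):
        # Reuse one handle per path instead of reopening the file for every chunk.
        f = self._handles.get(path)
        if f is None:
            f = self._handles[path] = open(path, "rb")
        f.seek(offset)
        return f.read(length)

    def close(self):
        for f in self._handles.values():
            f.close()
        self._handles.clear()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


def load(manifest, pool):
    """Read every chunk listed in a manifest of key -> (path, offset, length)."""
    return {key: pool.read(*entry) for key, entry in manifest.items()}


def roundtrip(manifest):
    """Stand-in for 'write references to icechunk, then read them back'."""
    return {k: tuple(v) for k, v in json.loads(json.dumps(manifest)).items()}


def check_equivalence(seed):
    rng = random.Random(seed)
    data = bytes(rng.randrange(256) for _ in range(64))
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        # Randomly split the file into four contiguous, non-empty chunks.
        cuts = sorted(rng.sample(range(1, len(data)), 3))
        bounds = [0, *cuts, len(data)]
        manifest = {
            f"v/c/{i}": (tmp.name, a, b - a)
            for i, (a, b) in enumerate(zip(bounds, bounds[1:]))
        }
        with HandlePool() as pool:
            direct = load(manifest, pool)
            via_roundtrip = load(roundtrip(manifest), pool)
        # Handles are closed here, before the file is removed.
        return direct == via_roundtrip and b"".join(direct.values()) == data
    finally:
        os.remove(tmp.name)


results = [check_equivalence(seed) for seed in range(20)]
```

Routing every read through one pool means there is a single place to close handles, which is the property the file-handle point above is after.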
FYI @chuckwondo