Redefine Virtual Readers as func(filepath) -> ManifestStore #498

@TomNicholas

Once we've completed #473, the responsibility of each reader will essentially be reduced to just creating a fully virtual ManifestStore instance from the given filepath.

This means that we could redefine what a "virtual reader" is, as simply a filetype-specific function of the form

def virtual_reader(filepath: str) -> ManifestStore:
    ...

where there is one of these readers for TIFF files, one for HDF files, etc. etc.
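To make the "one function per filetype" idea concrete, here is a minimal sketch of how such readers might be registered and dispatched. Everything in it is an assumption for illustration: the `ManifestStore` dataclass is only a stand-in for the real class, and `READERS`, `tiff_reader`, `hdf_reader` and `open_manifest_store` are hypothetical names, not existing virtualizarr API.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable


@dataclass
class ManifestStore:
    """Hypothetical stand-in for the real ManifestStore (illustration only)."""

    filepath: str
    filetype: str


# Under this proposal, a virtual reader is just a function of this shape.
VirtualReader = Callable[[str], ManifestStore]


def tiff_reader(filepath: str) -> ManifestStore:
    # A real reader would parse the file to locate chunk byte ranges.
    return ManifestStore(filepath=filepath, filetype="tiff")


def hdf_reader(filepath: str) -> ManifestStore:
    return ManifestStore(filepath=filepath, filetype="hdf")


# Hypothetical registry mapping file suffixes to reader functions.
READERS: dict[str, VirtualReader] = {
    ".tif": tiff_reader,
    ".tiff": tiff_reader,
    ".h5": hdf_reader,
    ".hdf5": hdf_reader,
}


def open_manifest_store(filepath: str) -> ManifestStore:
    """Pick a reader by file suffix and return its ManifestStore."""
    suffix = PurePosixPath(filepath).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"no virtual reader registered for {suffix!r}") from None
    return reader(filepath)
```

Note that nothing about `loadable_variables` appears in the reader signature, which is the point of the proposal.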

As described in #473, loadable_variables is then never passed to the reader function; instead it's always handled by virtualizarr in some subsequent code like this:

ds_virtual = manifeststore.to_virtual_ds()

ds_loadable = xr.open_zarr(manifeststore, drop_variables=list(ds_virtual.variables))

vds = xr.merge([ds_virtual, ds_loadable])
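The snippet above relies on every variable ending up in exactly one of the two datasets: whatever is virtual gets dropped when opening the loadable side. A minimal sketch of that partition, using plain variable names only, might look like the following. `split_variables` is a hypothetical helper written for this illustration, not something that exists in virtualizarr.

```python
def split_variables(
    all_variables: list[str], loadable_variables: list[str]
) -> tuple[list[str], list[str]]:
    """Return (virtual, loadable) names, preserving the order of all_variables."""
    requested = set(loadable_variables)
    unknown = requested - set(all_variables)
    if unknown:
        raise ValueError(f"unknown loadable variables: {sorted(unknown)}")
    virtual = [name for name in all_variables if name not in requested]
    loadable = [name for name in all_variables if name in requested]
    return virtual, loadable


virtual, loadable = split_variables(["time", "lat", "temp"], ["time", "lat"])
# `virtual` is what would be passed as drop_variables when opening the
# loadable dataset, so the subsequent merge sees no name conflicts.
```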

This is mostly great, BUT it does cause some issues for the special set of readers that read from other virtual reference formats (i.e. DMR++ and Kerchunk) - see #477 (comment).

Another issue is that it becomes tricky to solve #489: when reading inlined data the kerchunk reader would prefer to return a loadable variable, but this manifeststore-first architecture doesn't really allow that. Unless we used the ability of obstore to wrap a MemoryStore too...?
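A rough sketch of what the in-memory idea could mean: inlined bytes are written into an in-memory store, and the manifest then refers to them by path and byte range like any other chunk. This is only an illustration under stated assumptions. `InMemoryStore` is a dict-backed stand-in and deliberately does not use obstore's actual API; `ChunkRef`, `register_inlined_chunk` and the `memory://inlined/` path scheme are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical chunk reference: a byte range within a stored object."""

    path: str
    offset: int
    length: int


class InMemoryStore:
    """Dict-backed stand-in for an in-memory object store (not obstore's API)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self._objects[path] = data

    def get_range(self, path: str, offset: int, length: int) -> bytes:
        return self._objects[path][offset : offset + length]


def register_inlined_chunk(store: InMemoryStore, key: str, data: bytes) -> ChunkRef:
    """Store inlined bytes in memory and return a reference covering all of them."""
    path = f"memory://inlined/{key}"  # hypothetical path scheme
    store.put(path, data)
    return ChunkRef(path=path, offset=0, length=len(data))


store = InMemoryStore()
ref = register_inlined_chunk(store, "time/0", b"\x00\x01\x02")
```

With something like this, the kerchunk reader could still return only a ManifestStore, and the inlined variable would be readable through the same path-plus-range lookup as remote chunks. Whether that is preferable to returning a loadable variable directly is the open question here.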

cc @maxrjones
