Dear all, I am working with a large dataset from a Fortran Navier-Stokes solver that stores its results in a series of binary files. In order to make the analysis of that data easier, I have created a new backend for xarray, following the documentation page "How to add a new backend". It basically uses numpy.fromfile to load the data from disk and then wraps it into a dataset. It works pretty well, but I am wondering how to take it to the next level and make it support lazy loading, since the datasets are often larger than memory. Would that be possible for this kind of data? Maybe by including dask somewhere? Any ideas, comments, and suggestions are welcome. Thanks in advance.
import xarray as xr
import numpy as np
from xarray.backends import BackendEntrypoint
# Binary files do not hold information about the
# dimensions, coordinates, shape and type.
# So we provide them manually for this example
coords = dict(x=range(10), y=range(5))
shape = [len(i) for i in coords.values()]
dtype = np.float64
# Create a test dataset
ds_out = xr.DataArray(dtype(0.0), dims=coords.keys(), coords=coords).to_dataset(
    name="foo"
)
# Save it to the disc as a binary file
# in the same format that the solver would do,
# so our test is representative
ds_out["foo"].values.tofile("test.bin")
# Create BackendEntrypoint following the documentation
# We are basically loading the data with np.fromfile
# and wrapping it into a dataset
class MyBackendEntrypoint(BackendEntrypoint):
    # xarray instantiates the entrypoint class, so open_dataset needs `self`
    def open_dataset(
        self,
        filename_or_obj,
        name,
        drop_variables=None,
    ):
        return xr.DataArray(
            # Note np.fromfile here: the whole file is read eagerly
            data=np.fromfile(filename_or_obj, dtype=dtype).reshape(shape),
            dims=coords.keys(),
            coords=coords,
        ).to_dataset(name=name)
# Now we use the custom engine to load our test case
ds_in = xr.open_dataset("test.bin", engine=MyBackendEntrypoint, name="foo")
# And assert that everything went well
xr.testing.assert_equal(ds_out, ds_in)
Replies: 1 comment 1 reply
@aurghs addressed this in her Dask Summit presentation.
See https://github.com/aurghs/xarray-backend-tutorial/blob/main/2.Backend_with_Lazy_Loading.ipynb
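
For readers who want a rough idea of what lazy loading could look like for the flat binary file in the question, here is a minimal sketch. It is not taken from the linked notebook. It follows the `BackendArray` plus `indexing.LazilyIndexedArray` pattern described in xarray's backend documentation and, as an assumption of this sketch, uses `np.memmap` so that only the requested block is read from disk. The names `BinaryBackendArray` and `LazyBinaryBackendEntrypoint`, and the `coords` and `dtype` keyword arguments, are hypothetical and chosen for illustration only.

```python
import os
import tempfile

import numpy as np
import xarray as xr
from xarray.backends import BackendArray, BackendEntrypoint
from xarray.core import indexing


class BinaryBackendArray(BackendArray):
    """Hypothetical lazy array: data is read only when it is indexed."""

    def __init__(self, filename, shape, dtype):
        self.filename = filename
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)

    def __getitem__(self, key):
        # Let xarray reduce any indexer to basic ints/slices before it reaches us
        return indexing.explicit_indexing_adapter(
            key,
            self.shape,
            indexing.IndexingSupport.BASIC,
            self._raw_indexing_method,
        )

    def _raw_indexing_method(self, key):
        # Assumption: the file is a plain C-ordered array without a header,
        # as in the question. np.memmap maps the file and np.array copies
        # only the selected block into memory.
        mm = np.memmap(self.filename, dtype=self.dtype, mode="r", shape=self.shape)
        return np.array(mm[key])


class LazyBinaryBackendEntrypoint(BackendEntrypoint):
    def open_dataset(
        self,
        filename_or_obj,
        *,
        drop_variables=None,
        name,
        coords,
        dtype=np.float64,
    ):
        # The binary file carries no metadata, so shape comes from the coords
        shape = [len(c) for c in coords.values()]
        backend_array = BinaryBackendArray(filename_or_obj, shape, dtype)
        # Wrapping in LazilyIndexedArray defers reading until values are needed
        data = indexing.LazilyIndexedArray(backend_array)
        var = xr.Variable(dims=list(coords), data=data)
        return xr.Dataset({name: var}, coords=coords)


# Write a small test file in the same raw layout as in the question
coords = dict(x=np.arange(10), y=np.arange(5))
expected = np.arange(50, dtype=np.float64).reshape(10, 5)
path = os.path.join(tempfile.mkdtemp(), "test.bin")
expected.tofile(path)

ds_lazy = xr.open_dataset(
    path, engine=LazyBinaryBackendEntrypoint, name="foo", coords=coords
)
# Only rows 2 and 3 are pulled from disk when .values is evaluated
subset = ds_lazy["foo"].isel(x=slice(2, 4)).values
```

If dask is installed, passing `chunks={}` (or explicit chunk sizes) to `xr.open_dataset` should then wrap the lazy variable in dask arrays, which is probably the easiest place to bring dask in for files larger than memory.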