Does Xarray do lazy reads when opening NetCDF files from S3? #6404
-
Xarray's documentation for netCDF says that data is read lazily from disk. Does this still hold for files stored in AWS S3?
-
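You should be able to do this:

```python
import fsspec
import xarray as xr

# Keep the S3 file object open for as long as `ds` is in use
with fsspec.open("s3://bucket/file.netcdf", mode="rb") as f:
    ds = xr.open_dataset(f, engine="h5netcdf")
    # Do stuff with `ds`
```

I had a case where I didn't want to use a `with` block, so I entered the context manually:

```python
import fsspec
import xarray as xr

f = fsspec.open("s3://bucket/file.netcdf", mode="rb")
f = f.__enter__()  # enter by hand; the file stays open until you call f.close()
ds = xr.open_dataset(f, engine="h5netcdf")
```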
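You can check how much data is actually pulled through the file object by wrapping it in a byte-counting reader. `CountingReader` below is a hypothetical helper and is not part of fsspec or Xarray. It is a minimal sketch that subclasses `io.RawIOBase` and counts the bytes that pass through `readinto`.

```python
import io


class CountingReader(io.RawIOBase):
    """Hypothetical read-only wrapper that counts bytes read from a binary file."""

    def __init__(self, raw):
        super().__init__()
        self._raw = raw
        self.bytes_read = 0  # total bytes handed back to callers
        self.read_calls = 0  # number of readinto() calls

    def readable(self):
        return True

    def seekable(self):
        return True

    def readinto(self, buffer):
        # View the destination as raw bytes, then fill it from the wrapped file.
        with memoryview(buffer) as outer, outer.cast("B") as view:
            data = self._raw.read(len(view))
            n = len(data)
            view[:n] = data
        self.bytes_read += n
        self.read_calls += 1
        return n

    def seek(self, offset, whence=io.SEEK_SET):
        return self._raw.seek(offset, whence)

    def tell(self):
        return self._raw.tell()
```

A usage idea, which is an untested assumption:

1. Inside the `with fsspec.open(...) as f:` block, pass `CountingReader(f)` to `xr.open_dataset(..., engine="h5netcdf")`.
2. Compare `bytes_read` right after opening with its value after calling `.load()` on one data variable.

If reads are lazy, the first number should be small compared with the file size. Keep two caveats in mind. First, the counter measures bytes requested by the reader, not bytes sent over the network, because fsspec may fetch larger blocks and cache them. Second, whether h5py accepts this wrapper as a file-like object depends on its file-object requirements.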
-
When you read NetCDF4 files (which are HDF5 files with certain conventions) from S3 using Xarray, only the metadata and coordinate variables are loaded eagerly, while the data variables are loaded lazily, just as if the NetCDF4 file were on a local filesystem.
Here's a Jupyter Notebook demonstrating opening a 25GB file from S3 in a few seconds, then reading data lazily.
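The principle can be illustrated without S3 or Xarray: read the metadata when opening, and fetch data bytes only when they are indexed. The stdlib-only toy below uses a made-up binary format (an 8-byte count followed by float64 values) and a hypothetical `LazyVector` class. It is not how Xarray, h5netcdf, or HDF5 work internally. If you swapped the in-memory `io.BytesIO` for a file object from `fsspec.open("s3://...")`, each `seek` plus `read` would roughly correspond to a ranged fetch of just that part of the object. fsspec's caching may still fetch somewhat more than is asked for.

```python
import io
import struct

# Made-up toy format: an 8-byte little-endian count, then that many float64 values.
HEADER = struct.Struct("<Q")
ITEM = struct.Struct("<d")


class LazyVector:
    """Toy lazily loaded 1-D array over a seekable binary file object."""

    def __init__(self, fileobj):
        self._f = fileobj
        # Eager step: read only the header (the "metadata") when opening.
        self._f.seek(0)
        (self.length,) = HEADER.unpack(self._f.read(HEADER.size))
        self.bytes_read = HEADER.size

    def __len__(self):
        return self.length

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("toy example only supports slices")
        start, stop, step = key.indices(self.length)
        if step != 1:
            raise ValueError("toy example only supports contiguous slices")
        count = max(stop - start, 0)
        # Lazy step: fetch just the byte range covering the requested slice.
        self._f.seek(HEADER.size + ITEM.size * start)
        raw = self._f.read(ITEM.size * count)
        self.bytes_read += len(raw)
        return list(struct.unpack(f"<{count}d", raw))


def write_toy_file(values):
    """Serialize floats into the toy format and return an in-memory file."""
    buf = io.BytesIO()
    buf.write(HEADER.pack(len(values)))
    buf.write(struct.pack(f"<{len(values)}d", *values))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    f = write_toy_file([float(i) for i in range(1000)])
    v = LazyVector(f)
    print(len(v), v.bytes_read)    # 1000 8
    print(v[10:13], v.bytes_read)  # [10.0, 11.0, 12.0] 32
```

Indexing `v[10:13]` reads 24 bytes rather than the whole 8,000-byte payload. In real NetCDF4 files, HDF5's storage layout plays a roughly similar role, and for chunked variables the unit fetched is typically a whole chunk rather than a single value.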