How does xarray's file caching work exactly (when datasets are loaded with dask)? #6982
Unanswered
pjpetersik
asked this question in
Q&A
Replies: 0 comments
I encountered unexpected behaviour with xarray datasets that are loaded as dask arrays.
When I open a dataset with `chunks="auto"` (so the variables are backed by dask arrays), then delete the underlying NetCDF file, and then call `.load()`, no error is raised and I can still access the values of the arrays. Here is a minimal example:
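A sketch of the steps described, assuming a small synthetic dataset, a temporary directory, and a hypothetical file name `example.nc` (none of these come from the original post). It also assumes xarray, dask, and a NetCDF backend such as netCDF4 are installed, and that the operating system allows removing a file that is still open, as Linux and macOS do.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Hypothetical scratch location and file name, used only for illustration.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.nc")

# Write a small synthetic dataset to disk.
ds = xr.Dataset({"a": (("x", "y"), np.arange(12.0).reshape(3, 4))})
ds.to_netcdf(path)

# Reopen lazily; chunks="auto" wraps the variables in dask arrays.
ds_reopen = xr.open_dataset(path, chunks="auto")

# Remove the file while the dataset is still open.
# (On Windows this step may fail because the file is still in use.)
os.remove(path)
file_exists_after_remove = os.path.exists(path)

# As reported in this question, the load still succeeds at this point.
loaded = ds_reopen.load()
print(loaded["a"].values)
```

The check on `os.path.exists` is there only to confirm that the path is really gone before `.load()` is called.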
I always thought that dask loads data from disk only when needed, but here the file on disk has already been removed.
So the question is: where does the data come from when I execute `ds_reopen.load()`? I have the feeling that this behaviour might be connected to the `CachingFileManager`, as @shoyer describes in #4240. Is xarray generating some sort of internal copy of the file when opening it?