Retrieving data to local storage without loading it into memory #7371
Replies: 4 comments 3 replies
-
Pinging @martindurant @andersy005 for ideas.
-
What is the problem with
?
-
I will think about it. I don't think there's a way to avoid accessing the data and still use dask worker threads if you are working via the xarray interface. You could, however, use larger dask partitions, so that each task is at least waiting on more chunks at once, but whatever you then do with that partition (e.g., your mean call) will transiently take up memory. Using the filesystem interface directly, you have more options. You could choose to
Should work, or, better, break the file list into batches. Also, I notice that the data in the one file I checked is uncompressed, so this might be of interest.
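The following is a minimal sketch of the batched download suggested above. It assumes an fsspec-style filesystem object `fs` whose `get(rpath, lpath)` method copies remote files to local paths, plus a list of remote paths. The helper names `batched` and `download_in_batches` and the `batch_size` default are hypothetical. Because `get` writes files to disk rather than returning their bytes, the data never has to go through xarray or a dask graph.

```python
import os
import posixpath
from typing import Iterator, List, Sequence


def batched(items: Sequence, size: int) -> Iterator[list]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def download_in_batches(fs, remote_paths: Sequence[str], local_dir: str,
                        batch_size: int = 50) -> List[str]:
    """Copy remote files into `local_dir`, one `fs.get` call per batch.

    Hypothetical helper: `fs` is assumed to be an fsspec-style filesystem
    whose `get(rpath, lpath)` accepts equal-length lists of remote and
    local paths and writes each remote file to its local path.
    """
    os.makedirs(local_dir, exist_ok=True)
    written: List[str] = []
    for batch in batched(remote_paths, batch_size):
        # Give every remote file an explicit local target, so the result
        # does not depend on how `get` treats directory destinations.
        targets = [os.path.join(local_dir, posixpath.basename(p)) for p in batch]
        fs.get(batch, targets)
        written.extend(targets)
    return written
```

Against a real store, a call could look like `download_in_batches(fs, fs.glob("bucket/prefix/*.nc"), "local_cache")` with `fs = fsspec.filesystem("s3", anon=True)`. The bucket and prefix there are placeholders. This sketch flattens the remote directory structure, so two files with the same basename would overwrite each other. A real version may want to keep the relative paths instead.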
-
Thanks for sharing your thoughts, @martindurant. I appreciate it. Is there a way to see which byte ranges and files each Dask chunk belongs to from within xarray? This might allow me to write an accessor that retrieves data chunks without loading them into memory.
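One possible approach, sketched under assumptions, is to start from the chunk layout. For a dask-backed variable, xarray's `DataArray.chunks` is a tuple with one tuple of block lengths per axis, and cumulative sums of those lengths give each block's index slices. Converting slices into byte ranges is only straightforward in a narrow case:

- the variable is stored uncompressed, contiguously and in C order;
- the chunking splits only the first axis.

The `data_offset` (where the variable's bytes begin in the file) is an assumed input. It would have to come from the file format itself, e.g. h5py's `Dataset.id.get_offset()` for a contiguous HDF5/netCDF4 variable. The helper names are hypothetical.

```python
from itertools import accumulate, product
from typing import List, Sequence, Tuple


def block_slices(chunks: Sequence[Sequence[int]]) -> List[Tuple[slice, ...]]:
    """Index slices for every block, given dask-style `chunks`.

    `chunks` holds one tuple of block lengths per axis, e.g.
    ((2, 2, 1), (3,)) for a 5x3 array split into three row blocks.
    Blocks are listed in C order, like dask's block numbering.
    """
    per_axis = []
    for lengths in chunks:
        stops = list(accumulate(lengths))
        starts = [0] + stops[:-1]
        per_axis.append([slice(a, b) for a, b in zip(starts, stops)])
    return [tuple(s) for s in product(*per_axis)]


def row_block_byte_ranges(chunks, shape, itemsize, data_offset=0):
    """(start, stop) byte range of each block along axis 0.

    Assumes (hypothetically) an uncompressed, contiguous, C-ordered
    variable starting `data_offset` bytes into the file and chunked only
    along its first axis, so each block is one contiguous run of bytes.
    """
    if any(len(c) != 1 for c in chunks[1:]):
        raise ValueError("only chunking along axis 0 maps to one byte range")
    row_bytes = itemsize
    for n in shape[1:]:
        row_bytes *= n
    ranges = []
    for block in block_slices(chunks):
        rows = block[0]
        ranges.append((data_offset + rows.start * row_bytes,
                       data_offset + rows.stop * row_bytes))
    return ranges
```

For an xarray variable `da`, the call could look like `row_block_byte_ranges(da.chunks, da.shape, da.dtype.itemsize, offset)`. Here `offset` is whatever the format-specific lookup returns. Compressed or chunked-on-disk storage would need the storage chunk index (for example from kerchunk-style references) rather than this arithmetic.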
-
Hi there 👋,
I was wondering if there is a way to cache data locally without loading it directly into memory, as methods like load, persist or compute do.
Why would I like to know this?
There are two cases I can think of:
Workflow that achieves a similar goal, but not quite what I want
This example has the issue that
What I would like to have
I'm curious to hear whether there are already solutions or hacks, or in which direction I should look.
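The idea of copying data to local storage without holding it in memory can be sketched independently of xarray. The approach is to open the remote object as a binary file-like object and stream it to disk with `shutil.copyfileobj`, so at most one block of `block_size` bytes is buffered at a time. Opening the object with fsspec's `fs.open(path, "rb")` is an assumption about how the data is accessed. The helper name `stream_to_local` and the 8 MiB default are illustrative choices. Once the files are local, xarray can open them lazily as usual.

```python
import os
import shutil
from typing import BinaryIO


def stream_to_local(src: BinaryIO, local_path: str,
                    block_size: int = 8 * 1024 * 1024) -> int:
    """Copy a readable binary file object to `local_path` in fixed-size blocks.

    Only one block of at most `block_size` bytes is buffered at a time, so
    memory use does not grow with the file size. Returns the bytes written.
    """
    parent = os.path.dirname(local_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # Write under a temporary name first so an interrupted copy is not
    # mistaken for a complete cached file.
    tmp_path = local_path + ".part"
    with open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst, length=block_size)
    os.replace(tmp_path, local_path)
    return os.path.getsize(local_path)
```

A usage sketch, with `fs` and `remote_path` as placeholders: `with fs.open(remote_path, "rb") as f: stream_to_local(f, "cache/file.nc")`. Writing to a `.part` file and then renaming it means a later check like `os.path.exists(local_path)` only sees fully copied files.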