Intermediate Xarray+Dask tutorial #6945
-
This sounds like a great idea and something that would be useful to the climate research community - thanks for the ping @ncclementi! Can you clarify what the proposed output will be? E.g. an organized workshop, a self-guided tutorial on the Dask (or xarray) website, a blog post, etc?
-
Certainly interested. However, I am still a bit unclear on what the end product will cover, but I'm absolutely happy to keep chatting.
-
I thought about this a little and now think the question to answer is "What are intermediate-level concepts a dask.array user must understand?" The beginner-level concepts are:
The intermediate-level concepts are:
I think one important dask concept might come from the idea of "chunk management". This would cover things like
-
Some interesting ideas/things to maybe cover from @djhoese's AMS talk: https://ams.confex.com/ams/102ANNUAL/meetingapp.cgi/Paper/398825 Thanks for the share @maxrjones
-
What I hear a lot at ESDS office hours is that Dask works great in theory with the curated datasets used in tutorials, but that people run into issues when applying it to their own data - so perhaps we could bridge this gap with a workshop, or use enrollee-submitted datasets? A live Dask debugging session?
-
I think all of these suggestions for 'how to approach problems' are phenomenal (even if hard to realize). To me personally, this is exactly what makes users go from beginner to intermediate/advanced: the ability to figure out what is wrong.
-
If possible, Project Pythia would love to turn this into a Cookbook in our gallery once we have some content in place. I'm keeping my eyes and ears open during NCAR office hours for debugging use cases (just had one where only 1 worker was being used).
-
I'm not sure actually. We have some of these, but they don't really teach anything other than "well, this computation is possible with that dataset, but not mine for some reason; perhaps I should dramatically rechunk my dataset". After thinking for a while, I think one solution is to demonstrate the same calculation with multiple chunking schemes. It will fail for some, work OK for some, and work spectacularly well for a few. Explaining why, and pointing out the clues that reveal what is happening, would help teach some intuition about chunking. I now remember I tried this to some extent in flox, but it does need improvement. @djhoese:
This is a great example! We can easily demonstrate good and bad performance with a fake file created with different internal chunks. The other gotcha here is threads vs processes when reading netcdf4/hdf5 files.
Writing up the "debugging" is hard (here's one attempt); I agree that a video could work well here.
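As a starting point for the "same calculation, multiple chunking schemes" idea, here is a minimal sketch. The array shape, the three chunking schemes, and the helper name `summarize_chunking` are all hypothetical, chosen only for illustration. It does not time anything; it only counts chunks and graph tasks for a lazy mean over the first axis, which is one of the clues a tutorial could teach people to look at.

```python
import dask.array as da
import numpy as np


def summarize_chunking(shape, chunks):
    """Build a lazy array with the given chunks and describe a mean over axis 0."""
    arr = da.ones(shape, chunks=chunks, dtype="float64")
    result = arr.mean(axis=0)  # lazy: this only builds a task graph
    return {
        "n_chunks": arr.npartitions,
        # Size of the largest chunk in megabytes.
        "chunk_mb": np.prod(arr.chunksize) * arr.dtype.itemsize / 1e6,
        # Number of tasks in the (unoptimized) graph for the reduction.
        "n_tasks": len(result.__dask_graph__()),
    }


shape = (365, 180, 360)  # hypothetical (time, lat, lon) grid
schemes = {
    "one time step per chunk": (1, 180, 360),
    "full time series, small tiles": (365, 10, 10),
    "balanced": (73, 90, 180),
}

summaries = {label: summarize_chunking(shape, c) for label, c in schemes.items()}
for label, s in summaries.items():
    print(
        f"{label:32s} chunks={s['n_chunks']:4d} "
        f"chunk_mb={s['chunk_mb']:8.2f} tasks={s['n_tasks']:5d}"
    )
```

Many tiny chunks inflate the task count and scheduler overhead, while a few huge chunks limit parallelism and can exhaust worker memory; which scheme works best depends on the reduction axis and on how the data is laid out on disk.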
-
It's possible, by using habits you've developed while working in An example arose this week in my lab. We wrote a function to QC a long time series of 3D spatial grids, followed by finding the maximum along the vertical dimension.
The example above, which is pretty well-vectorized compared to other approaches one could imagine, nevertheless had the side effect of loading everything into memory, and in the process stripping out all the Dask arrays. This code accomplishes the same thing, but runs instantly by building a deferred Dask computation. Note the need to avoid using NumPy in a way that is comfortingly familiar to many. That is a change in mindset that probably needs to be taught explicitly.
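The original code from this comment did not survive on the page, so the following is only a hedged reconstruction of the pattern being described, not the author's code. The data, the threshold, and the function names `qc_max_eager` and `qc_max_lazy` are hypothetical. The eager version pulls the data out with `.values` and uses NumPy directly; the lazy version stays within xarray methods so the Dask graph is preserved.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Hypothetical data: a time series of 3D grids, chunked along time.
rng = np.random.default_rng(0)
grids = xr.DataArray(
    da.from_array(rng.random((8, 5, 20, 20)), chunks=(2, 5, 20, 20)),
    dims=("time", "z", "y", "x"),
    name="field",
)


def qc_max_eager(arr, threshold=0.05):
    """Familiar NumPy style: .values computes every chunk into memory."""
    values = arr.values  # triggers the full computation
    masked = np.where(values >= threshold, values, np.nan)
    return np.nanmax(masked, axis=1)  # plain ndarray, no Dask, no labels


def qc_max_lazy(arr, threshold=0.05):
    """Same QC and vertical maximum, expressed with xarray methods only."""
    masked = arr.where(arr >= threshold)  # still Dask-backed
    return masked.max(dim="z")  # still lazy; NaNs are skipped by default


lazy = qc_max_lazy(grids)
print(type(lazy.data))  # a dask array: nothing has been computed yet
result = lazy.compute()  # the work happens only here
```

The lazy version also keeps dimension names, so the reduction is written as `dim="z"` rather than a positional `axis=1`.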
-
Hi folks, bringing over a conversation that started in the Dask Slack channel.
@dcherian shared
cc'ing @jacobtomlinson, who expressed interest in this. Quoting him:
I think it would be great to collaborate on this (xarray+dask) and maybe create a working group to tackle it. Ideas?
@TomNicholas @jbusecke @paigem would any of you be interested in this?