Applying GroupBy.map along chunks instead of dimensions #8076
Unanswered
sadsimulation asked this question in Q&A
I'm processing larger-than-memory datasets using dask-backed xarrays. However, I often need to perform indexing using `dataset1.where(array2, drop=True)`. The intermediate computations to get `dataset1` and `array2` are fairly computationally intensive and make use of the xarray indexing and dimension-name features, so I feel like `xr.apply_ufunc` with `dask='allowed'` wouldn't be a good fit here. Unfortunately the masking causes the sizes of the output `xr.Dataset` dimensions to change based on the data (peak detection), so `xr.map_blocks` is difficult to apply because I don't know the output template shape.

A workaround that I have used to get things working at all is `dataset.groupby('mydim').map(myfunc)` on a `dataset` that is not backed by dask arrays. This is not great because the groups along `mydim` vary in size and don't fit a simple uniform chunking along that dimension, and it predictably utilizes only a single core as the groups are processed sequentially.

Is there an easy way to do something like `dataset.chunk(mydim=100).groupby('mydim').map(myfunc)` that would utilize my machine's CPUs better?
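One pattern that can sidestep the template problem entirely (a sketch, not something confirmed in this thread) is to build one `dask.delayed` task per group and concatenate the variable-sized results afterwards. `myfunc`, `mydim`, and the toy dataset below are stand-ins for the question's setup:

```python
import dask
import numpy as np
import xarray as xr

# Toy stand-ins for the question's `dataset` and `myfunc`: ten groups of 100
# samples along `mydim`, and a filter whose output length depends on the data
# (the property that rules out a fixed map_blocks template).
dataset = xr.Dataset(
    {"signal": ("mydim", np.random.default_rng(0).random(1000))},
    coords={"mydim": ("mydim", np.repeat(np.arange(10), 100))},
)

def myfunc(group):
    # Data-dependent output size, like the question's peak detection.
    # With a dask-backed dataset, load each group here before the heavy work.
    return group.where(group["signal"] > 0.5, drop=True)

# One delayed task per group, so groups run in parallel instead of one by one.
tasks = [dask.delayed(myfunc)(grp) for _, grp in dataset.groupby("mydim")]
results = dask.compute(*tasks, scheduler="processes")

# Stitch the variable-sized per-group outputs back together.
out = xr.concat(results, dim="mydim")
```

`scheduler="processes"` assumes `myfunc` is GIL-bound Python/NumPy work; the default threaded scheduler is cheaper if it releases the GIL.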
Replies: 1 comment

-
Can you create a small example to show how your groups are patterned? It seems like each group is sequential?
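If the groups really are sequential runs along `mydim`, as the reply asks, one possible follow-on (an assumption, not an answer from the thread) is to align the dask chunks with the group boundaries so that each chunk contains whole groups. This reuses the toy `dataset` from the sketch above:

```python
import numpy as np

# Assumes the labels along `mydim` are sorted runs, so np.unique's counts
# match the run lengths in order.
labels = dataset["mydim"].values
_, counts = np.unique(labels, return_counts=True)

# One dask chunk per group: any per-chunk work now sees whole groups,
# and the delayed pattern above can be applied chunk by chunk.
chunked = dataset.chunk({"mydim": tuple(int(c) for c in counts)})
print(chunked.chunks)  # e.g. Frozen({'mydim': (100, 100, ..., 100)})
```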