Questions with map_blocks and apply_ufunc #6370
-
Hi everyone, I'm testing ways to process a ragged array with xarray, and I have difficulties understanding the differences between `map_blocks()` and `apply_ufunc()`. I have a dataset that is loaded with chunks of different sizes; as an example:
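For concreteness, a ragged-chunked array like the one described could be built as follows; the variable names and chunk sizes here are made up for illustration:

```python
import dask.array as da
import numpy as np
import xarray as xr

# Hypothetical example data: a 1-D dask array with uneven chunk sizes
# (2, 3, and 5 elements per chunk), mimicking a ragged layout.
data = da.from_array(np.arange(10.0), chunks=((2, 3, 5),))
dt = xr.DataArray(data, dims="x", name="dt")
print(dt.chunks)  # ((2, 3, 5),)
```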
I can map a function to each block using `map_blocks()` as follows:
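A minimal sketch of that pattern, assuming a hypothetical 1-D ragged-chunked DataArray named `dt`: `xr.map_blocks` hands each block to the function as a DataArray and expects a DataArray (or Dataset) back.

```python
import dask.array as da
import numpy as np
import xarray as xr

dt = xr.DataArray(da.from_array(np.arange(10.0), chunks=((2, 3, 5),)), dims="x")

# The function receives each block as a DataArray and must return one;
# here the output block has the same shape as the input block.
doubled = xr.map_blocks(lambda block: block * 2, dt)
print(doubled.compute().values)
```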
This works fine. Two side questions:
Alright, so if the output of the function is the same length as the input, e.g.:
I can use `apply_ufunc()`. As I understand, …
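When the function is element-wise and preserves the shape, `dask="parallelized"` does work, because xarray can infer the output chunks from the input. A hedged sketch, with made-up data:

```python
import dask.array as da
import numpy as np
import xarray as xr

dt = xr.DataArray(da.from_array(np.arange(6.0), chunks=3), dims="x")

# No core dims needed: the function is applied block-by-block, and each
# output block has the same shape as its input block.
scaled = xr.apply_ufunc(
    lambda a: a * 2,
    dt,
    dask="parallelized",
    output_dtypes=[dt.dtype],
)
print(scaled.compute().values)
```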
Could anyone help me with this and maybe point me to some examples? Thanks!
Replies: 1 comment 2 replies
-
Here is a function that computes one mean per block of a 1-D dask array:

```python
import numpy as np

def per_block_mean(array):
    assert array.ndim == 1
    nchunks = len(array.chunks[0])
    # the function returns one value per chunk, so the output chunks are easy to construct
    output_chunks = ([1] * nchunks,)
    # the mapped function MUST return an array, not a scalar
    # https://github.com/dask/dask/issues/8822
    return array.map_blocks(lambda x: np.mean(x, keepdims=True), chunks=output_chunks)
```

Now that we have a function, we use `xr.apply_ufunc`:
```python
xr.apply_ufunc(
    per_block_mean,
    dt,
    input_core_dims=[["x"]],
    output_core_dims=[["x"]],
    exclude_dims=set("x"),
    dask="allowed",
)
```
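Putting the pieces together, here is a self-contained run of this approach on a hypothetical ragged array (chunk sizes 2, 3, and 5); with `dask="allowed"`, `apply_ufunc` passes the underlying dask array straight to `per_block_mean`:

```python
import dask.array as da
import numpy as np
import xarray as xr

def per_block_mean(array):
    assert array.ndim == 1
    nchunks = len(array.chunks[0])
    # one output value per input chunk
    output_chunks = ([1] * nchunks,)
    return array.map_blocks(lambda x: np.mean(x, keepdims=True), chunks=output_chunks)

dt = xr.DataArray(da.from_array(np.arange(10.0), chunks=((2, 3, 5),)), dims="x")

means = xr.apply_ufunc(
    per_block_mean,
    dt,
    input_core_dims=[["x"]],
    output_core_dims=[["x"]],
    exclude_dims=set("x"),  # the size of "x" changes, so it must be excluded
    dask="allowed",
)
print(means.compute().values)  # one mean per original chunk: 0.5, 3.0, 7.0
```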
`.data` pulls out the underlying dask array; `.data.map_blocks` calls `dask.array.map_blocks`. The function passed to `xarray.map_blocks` must return either a Dataset or a DataArray; the function passed to `dask.array.map_blocks` must return a numpy array.

Your example with `dask="parallelized"` cannot be right. It raises `ValueError: axes don't match array` on compute and should never have worked AFAICT (#6372).

For the solution, this operation is too complicated for `dask="parallelized"`, so we use `dask="allowed"`. This means your function must know how to handle dask arrays.

1. The function itself loo…
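The distinction between the two `map_blocks` APIs can be sketched in a few lines (the array contents are hypothetical): `dt.data.map_blocks` goes through dask and traffics in numpy arrays, while `xr.map_blocks` traffics in DataArrays.

```python
import dask.array as da
import numpy as np
import xarray as xr

dt = xr.DataArray(da.from_array(np.arange(6.0), chunks=3), dims="x")

# dask.array.map_blocks: the function receives and returns numpy arrays,
# and the result is a dask array
squared = dt.data.map_blocks(lambda block: block ** 2)
assert isinstance(squared, da.Array)

# xarray.map_blocks: the function receives and returns DataArrays,
# and the result is a DataArray
squared_xr = xr.map_blocks(lambda block: block ** 2, dt)
assert isinstance(squared_xr.compute(), xr.DataArray)
```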