Hello there, I'm processing image data that is chunked over its height and width. The end result, however, has to be a large set of (flattened) pixels. I've been using stack to combine height and width into pixels, and I've been running into performance problems with this. I suspect they occur because stack stacks dimensions over their entire lengths, which makes expensive rechunking necessary. The problem gets worse as my images get larger. Here is a code sample that shows my problem:
import numpy as np
import xarray as xr
image = xr.DataArray(
data=np.random.rand(128, 256, 3),
dims=["y", "x", "band"],
coords=dict(
chunk_idx=( # To keep track of the chunks after stacking.
["y", "x"],
np.concatenate([np.zeros((128, 128)), np.ones((128, 128))], axis=1),
),
),
)
image = image.chunk(chunks={"y": 128, "x": 128, "band": -1})
# As expected, the first chunk is the first 128 by 128 block.
print(
np.unique(
image.isel(y=slice(0, 128), x=slice(0, 128)).chunk_idx.compute(),
return_counts=True,
)
)
# Prints: (array([0.]), array([16384]))
stacked = image.stack(pixel=("y", "x"))
# After stacking, the first 128 * 128 pixels consist of half of each of the two original 128 by 128 chunks (blocks).
print(
np.unique(
stacked.isel(pixel=slice(0, 128 * 128)).chunk_idx.compute(), return_counts=True
)
)
# Prints: (array([0., 1.]), array([8192, 8192]))

I would like to apply stack per chunk instead of over entire dimension lengths. What is the best way to achieve this? I've looked into map_blocks, but I see I'm not allowed to add new chunked dimensions. I believe this Dask issue describes roughly what I'm looking for, but ideally I would like to make it work with Xarray. Thanks in advance to anybody, and hopefully I've described my question clearly.
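One possible way to get this without map_blocks is to drop down to the underlying dask array, reshape each (y, x) block on its own, and concatenate the results along a new pixel dimension. This is a minimal sketch, assuming dask is installed, that band (and any other non-spatial dimension) is a single chunk, and that a plain positional pixel dimension is acceptable. stack_per_block is a hypothetical helper, not an xarray API, and unlike stack it builds no MultiIndex.

```python
import dask.array as da
import numpy as np
import xarray as xr


def stack_per_block(arr: xr.DataArray, ydim: str = "y", xdim: str = "x") -> xr.DataArray:
    """Hypothetical helper: flatten (ydim, xdim) into 'pixel' one dask block at a time."""
    other_dims = [d for d in arr.dims if d not in (ydim, xdim)]
    # Put the spatial dims first so every block reshapes to (-1, *rest).
    data = arr.transpose(ydim, xdim, *other_dims).data  # assumed to be a dask array
    rest_shape = data.shape[2:]
    if any(len(c) != 1 for c in data.chunks[2:]):
        raise ValueError("non-spatial dimensions must be a single chunk")
    pieces = []
    for i in range(data.numblocks[0]):
        for j in range(data.numblocks[1]):
            # .blocks[i, j] keeps all dims, so this is a one-chunk (by, bx, *rest) array.
            block = data.blocks[i, j]
            pieces.append(block.reshape((-1,) + rest_shape))
    # One chunk per original block along the new pixel axis.
    flat = da.concatenate(pieces, axis=0)
    return xr.DataArray(flat, dims=["pixel", *other_dims])


image = xr.DataArray(
    data=np.random.rand(128, 256, 3),
    dims=["y", "x", "band"],
    coords=dict(
        chunk_idx=(
            ["y", "x"],
            np.concatenate([np.zeros((128, 128)), np.ones((128, 128))], axis=1),
        ),
    ),
).chunk({"y": 128, "x": 128, "band": -1})

stacked = stack_per_block(image)
# Flatten the chunk_idx coordinate the same way and attach it along pixel.
stacked = stacked.assign_coords(chunk_idx=("pixel", stack_per_block(image.chunk_idx).data))
print(stacked.chunks)  # ((16384, 16384), (3,))
print(np.unique(stacked.chunk_idx[: 128 * 128].values, return_counts=True))
```

Each original block becomes exactly one chunk along pixel, so no data has to move between blocks. The pixel order is block-major (all pixels of block (0, 0), then block (0, 1), and so on) rather than the global row-major order stack produces, so y/x positions would need to be flattened the same way if they are needed later.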
Yes, very clearly! The reshape_block function in this comment does it for dask: #5629 (comment). It is written as a blockwise stack, but it has some assumptions built in, I think.
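For reference, the pixel ordering a blockwise stack produces can be written in plain NumPy, which is handy for checking a dask result against an in-memory one. This is an illustrative sketch, not the reshape_block function from the linked comment; blockwise_flatten is a hypothetical name, and it assumes uniform block sizes that divide the array shape evenly.

```python
import numpy as np


def blockwise_flatten(a: np.ndarray, by: int, bx: int) -> np.ndarray:
    """Hypothetical helper: flatten the first two axes of `a` block by block."""
    ny, nx = a.shape[:2]
    rest = a.shape[2:]
    if ny % by or nx % bx:
        raise ValueError("block sizes must divide the array shape")
    # (ny, nx, *rest) -> (ny/by, by, nx/bx, bx, *rest)
    t = a.reshape((ny // by, by, nx // bx, bx) + rest)
    # Bring the block-grid axes to the front: (ny/by, nx/bx, by, bx, *rest)
    t = np.moveaxis(t, 2, 1)
    # Row-major flatten: block (0, 0) first, then (0, 1), ..., each block row-major inside.
    return t.reshape((-1,) + rest)


a = np.arange(24).reshape(4, 6)
print(blockwise_flatten(a, 2, 3).tolist())
# [0, 1, 2, 6, 7, 8, 3, 4, 5, 9, 10, 11, 12, 13, 14, 18, 19, 20, 15, 16, 17, 21, 22, 23]
```

With by and bx set to the chunk sizes (128 and 128 in the question), this gives the same order as concatenating the per-block reshapes, whereas stack would return a.reshape(-1) order.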