Basic rechunking example #539

Closed
@norlandrhagen

Working my way through understanding cubed / cubed-xarray.

I'm trying to get an example working that changes the chunking of an Xarray dataset and writes it to Zarr. When I round-trip the data through Zarr and back into Xarray, the chunking structure doesn't seem to have changed. Is using the .chunk method on an Xarray dataset with cubed viable, or should I be using the rechunk primitive?

Roundtrip example using Xarray + dask chunks

import xarray as xr 
from zarr.storage import TempStore

ts = TempStore('air_temp_dask.zarr')

ds = xr.tutorial.open_dataset('air_temperature', chunks={})
rds = ds.chunk({'time':1})
rds.to_zarr(ts, consolidated=True)

rtds = xr.open_zarr(ts, chunks={})
rtds

assert rtds.chunks == rds.chunks
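
For reference, the assertion above compares two mappings from dimension name to a tuple of block lengths. The following is a minimal pure-Python sketch of what `ds.chunk({'time': 1})` would be expected to produce. The helper names `block_lengths` and `expected_chunks` are hypothetical, the dimension sizes are illustrative rather than taken from the tutorial dataset, and it assumes every dimension not named in the chunk spec is held in a single chunk.

```python
def block_lengths(dim_size, chunk_size):
    """Split one dimension into uniform blocks plus a shorter trailing block if needed."""
    full, rem = divmod(dim_size, chunk_size)
    return (chunk_size,) * full + ((rem,) if rem else ())


def expected_chunks(sizes, chunk_spec):
    """Build a {dim: block lengths} mapping (hypothetical helper).

    Assumption: dimensions missing from chunk_spec form a single chunk.
    """
    return {dim: block_lengths(n, chunk_spec.get(dim, n)) for dim, n in sizes.items()}


# Illustrative sizes only, not read from the real dataset.
sizes = {"time": 4, "lat": 25, "lon": 53}
expected = expected_chunks(sizes, {"time": 1})
print(expected)
```

Comparing a mapping like this against both `rds.chunks` and `rtds.chunks` would show which side of the round trip departs from the requested chunking.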

Roundtrip example using Xarray + cubed

from cubed import Spec
import xarray as xr 
from zarr.storage import TempStore

ts = TempStore('air_temp_cubed.zarr')

spec = Spec(work_dir='tmp', allowed_mem='2GB')
ds = xr.tutorial.open_dataset('air_temperature', chunked_array_type='cubed',
     from_array_kwargs={'spec': spec}, chunks={})

rds = ds.chunk({'time':1}, chunked_array_type="cubed")

# does compute need to be called?
# rds.compute()

rds.to_zarr(ts, consolidated=True, chunkmanager_store_kwargs={'from_array_kwargs': {'spec': spec}})

rtds = xr.open_zarr(ts, chunked_array_type='cubed',
     from_array_kwargs={'spec': spec}, chunks={})

# This fails
assert rtds.chunks == rds.chunks

chunked dataset (rds): [image]

roundtripped dataset (rtds): [image]

🤞 this is an end-of-day brain implementation issue on my end.

Labels: bug (Something isn't working), xarray-integration (Uses or required for cubed-xarray integration)
