Icechunks to DataTree workflow speed up? #1384

rsignell · 2025-11-07T14:59:53Z

I'm working with CMIP6 CORDEX data, which has a bunch of different domains (AR-44, EUR-11, etc.), the usual experiments (historical, rcp85, etc.), and a bunch of different models (CNRM-CERFACS-CNRM-CM5, ICHEC-EC-EARTH, etc.).

I've created a virtual icechunk repo for each of these, which references all the NetCDF files over time for all variables.

Then I'm opening xarray datasets from the icechunk repos and creating a datatree.

Constructing the datatree takes a while (1.5 min on my machine). I was thinking of speeding it up with Dask, but I'm running into pickling problems. That got me wondering whether I should have just used a single icechunk repo with groups for the CORDEX data.

As you can tell, I'm a bit confused. Maybe because it's Friday afternoon. 🤷

Any suggestions for how to construct this datatree more efficiently?

TomNicholas · 2025-11-07T15:20:14Z

Maintainer

Is it the cell which constructs the DataTree object that is taking a while, or the cell which creates the ds_dict?

3 replies

rsignell Nov 7, 2025
Author

The ds_dict! Each icechunk repo takes 10-15s to open.

TomNicholas Nov 7, 2025
Maintainer

There are at least two problems:

1. Why does it take 10-15s to open one icechunk store? It should take under 1s. Could this be the same tiny-manifest problem as in our previous conversation?
2. The loop over icechunk stores is serial. Having multiple icechunk stores is arguably an anti-pattern here, and your sequential Python loop adds all the latencies together.
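To illustrate the second point, here is a minimal sketch of overlapping the per-store latency with a thread pool instead of a serial loop. Threads share memory, so nothing has to be pickled, which sidesteps the Dask pickling problem mentioned above. The names `open_all` and `open_one` are hypothetical and not from the original notebook; `open_one` stands in for whatever function opens a single icechunk repo and returns an xarray dataset.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, Iterable, TypeVar

T = TypeVar("T")


def open_all(
    keys: Iterable[Hashable],
    open_one: Callable[[Hashable], T],
    max_workers: int = 16,
) -> Dict[Hashable, T]:
    """Call open_one(key) for every key concurrently and return {key: result}.

    open_one is a hypothetical user-supplied function. For icechunk it might
    look roughly like this (an assumption, adjust to your icechunk version
    and storage/credential setup):

        repo = icechunk.Repository.open(storage_for(key))  # storage_for is hypothetical
        session = repo.readonly_session("main")
        return xr.open_zarr(session.store, consolidated=False)
    """
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with keys.
        results = list(pool.map(open_one, keys))
    return dict(zip(keys, results))
```

With this, something like `ds_dict = open_all(repo_keys, open_one)` would replace the serial loop, and the total wait should be closer to the slowest single open than to the sum of all of them. It does not fix the first problem: each open would still take 10-15s.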

rsignell Nov 7, 2025
Author

Yes, this is the same data with tiny chunks, and yes, I was hoping to speed this up by opening the icechunk repos in parallel, but maybe that doesn't help because they are already being opened in parallel?

By anti-pattern, do you mean I shouldn't be using icechunk for these data, or that I should just have one icechunk repo with groups, or something else?
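For reference, the "one repo with groups" option could be laid out as nested group paths, one per domain/experiment/model combination. The sketch below only builds those paths; the helper name `to_tree_paths` and the tuple-keyed input are assumptions for illustration, not something from the original notebook.

```python
from typing import Dict, Tuple, TypeVar

T = TypeVar("T")


def to_tree_paths(ds_by_key: Dict[Tuple[str, ...], T]) -> Dict[str, T]:
    """Turn (domain, experiment, model) keys into 'domain/experiment/model' paths."""
    return {"/".join(key): ds for key, ds in ds_by_key.items()}


# Example with placeholder values standing in for xarray datasets.
paths = to_tree_paths(
    {
        ("EUR-11", "historical", "ICHEC-EC-EARTH"): "ds1",
        ("EUR-11", "rcp85", "ICHEC-EC-EARTH"): "ds2",
    }
)
print(sorted(paths))
```

A dict of this shape can be passed to `xr.DataTree.from_dict`, and the same paths could serve as Zarr group names inside a single store. If the data lived in one store with those groups, `xr.open_datatree(store, engine="zarr")` should be able to open the whole hierarchy in one call, though I have not checked how that behaves with virtual references.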
