Iteration over Zarr arrays is significantly slower in v3 #2529

Open
@tomwhite

Zarr version

3.0.0b2

Numcodecs version

0.14.0

Python Version

3.11

Operating System

Mac

Installation

pip

Description

Iterating over the elements or slices of a Zarr array z using iter(z) is much slower in v3 than in v2.

In v2, Array implements __iter__, which caches chunks. In v3, Array does not implement __iter__, so iter(z) falls back to Python's sequence protocol and calls z[0], z[1], etc. in turn, which means that every element or slice access has to load its chunk again.
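The fallback can be illustrated without Zarr at all. The following is a minimal sketch using hypothetical stand-in classes (ChunkedNoIter, ChunkedWithIter and _load_chunk are invented for illustration and are not Zarr's implementation). It only counts how many times a "chunk" would be loaded under each iteration strategy.

```python
class ChunkedNoIter:
    """Hypothetical stand-in for a chunked array without __iter__ (not Zarr code)."""

    def __init__(self, data, chunk_len):
        self.data = list(data)
        self.chunk_len = chunk_len
        self.chunk_loads = 0

    def __len__(self):
        return len(self.data)

    def _load_chunk(self, chunk_index):
        # Stands in for reading and decoding one chunk from storage.
        self.chunk_loads += 1
        start = chunk_index * self.chunk_len
        return self.data[start:start + self.chunk_len]

    def __getitem__(self, i):
        if not 0 <= i < len(self.data):
            # IndexError is what ends the implicit __getitem__-based iteration.
            raise IndexError(i)
        chunk = self._load_chunk(i // self.chunk_len)
        return chunk[i % self.chunk_len]


class ChunkedWithIter(ChunkedNoIter):
    """Same stand-in, but with an explicit __iter__ that loads each chunk once."""

    def __iter__(self):
        n_chunks = -(-len(self.data) // self.chunk_len)  # ceiling division
        for chunk_index in range(n_chunks):
            yield from self._load_chunk(chunk_index)


slow = ChunkedNoIter(range(1000), chunk_len=100)
fast = ChunkedWithIter(range(1000), chunk_len=100)

# Both produce the same elements; only the number of chunk loads differs.
assert list(slow) == list(fast) == list(range(1000))
print(slow.chunk_loads, fast.chunk_loads)  # prints "1000 10"
```

With 1000 elements in chunks of 100, the version without __iter__ loads a chunk once per element, while the version with __iter__ loads each chunk once.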

This seems like a regression, but perhaps there was a reason for not implementing __iter__ on Array? The problem is that existing code will still work in v3 but will be a lot slower (possibly by orders of magnitude), and it is quite hard to track down why.
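Until this is resolved, one possible workaround is to iterate chunk by chunk in user code. This is a sketch only: iter_by_chunk is a hypothetical helper, not part of Zarr, and it assumes the array exposes shape and chunks attributes and supports slicing along the first axis. FakeArray is an invented stand-in used so the example runs without Zarr installed.

```python
def iter_by_chunk(z):
    """Yield z[0], z[1], ... while reading one chunk-length slice at a time.

    Hypothetical helper: assumes z has .shape, .chunks and slice indexing.
    """
    chunk_len = z.chunks[0]
    for start in range(0, z.shape[0], chunk_len):
        block = z[start:start + chunk_len]  # one read per chunk along axis 0
        yield from block


class FakeArray:
    """Invented stand-in exposing only shape, chunks and slicing (not Zarr code)."""

    def __init__(self, data, chunk_len):
        self._data = list(data)
        self.shape = (len(self._data),)
        self.chunks = (chunk_len,)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1  # count every read request
        return self._data[key]


z = FakeArray(range(10), chunk_len=4)
items = list(iter_by_chunk(z))
assert items == list(range(10))
assert z.reads == 3  # three slice reads instead of ten element reads
```

The trade-off is that one chunk-length slice along the first axis is held in memory at a time.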

(There's an interesting Array API issue about iterating over arrays which may be relevant here: data-apis/array-api#818.)

Steps to reproduce

See sgkit-dev/bio2zarr#288 (comment)

Additional output

No response


Labels: performance (Potential issues with Zarr performance (I/O, memory, etc.))
