Support for Zarr object dtype (`"|O"`) datasets

First, thank you for making MATLAB/Zarr integration a priority. This work will be highly valuable as more and more data moves to the cloud.

I’m part of the development team behind **MatNWB** (<https://github.com/NeurodataWithoutBorders/matnwb>), a MATLAB package for reading and writing files in the Neurodata Without Borders (NWB) format. We’re interested in implementing support for Zarr as an alternative backend to HDF5 for NWB files, and *MATLAB-support-for-Zarr-files* looks like a very promising starting point.

While testing NWB-Zarr files exported with **PyNWB**, we ran into read failures whenever a dataset has dtype `|O` (Python object). Below is a minimal reproduction:

```matlab
% MATLAB R2024b + commit 3a7b0a3 of this repo
data = zarrread(zarrFilePath);
```

#### Observed error

```text
Error using zarrread (line 15)
Python Error: ValueError: FAILED_PRECONDITION: Error opening "zarr" driver: Error reading local file
"~/zarr_matlab/test_data/test_zarr_sub_anm00239123_ses_20170627T093549_ecephys_and_ogen.nwb.zarr/file_create_date/.zarray":
Error parsing object member "dtype": Unsupported zarr dtype: "|O" [source
locations='tensorstore/driver/zarr/dtype.cc:225\ntensorstore/driver/zarr/dtype.cc:324\ntensorstore/driver/zarr/dtype.cc:356\ntensorstore/internal/json_binding/json_binding.h:865\ntensorstore/internal/json_binding/json_binding.h:830\ntensorstore/internal/json_binding/json_binding.h:388\ntensorstore/driver/zarr/driver.cc:108\ntensorstore/driver/kvs_backed_chunk_driver.cc:1162\ntensorstore/internal/cache/kvs_backed_cache.h:208\ntensorstore/driver/driver.cc:112']
[tensorstore_spec='{\"context\":{\"cache_pool\":{},\"data_copy_concurrency\":{},\"file_io_concurrency\":{},\"file_io_locking\":{},\"file_io_memmap\":false,\"file_io_sync\":true},\"driver\":\"zarr\",\"kvstore\":{\"driver\":\"file\",\"path\":\"/Users/Eivind/Code/MATLAB/Sandbox/CN/zarr_matlab/test_data/test_zarr_sub_anm00239123_ses_20170627T093549_ecephys_and_ogen.nwb.zarr/file_create_date/\"}}']
```

The dataset in question is attached below.

#### Expected behavior
For NWB, object dtypes typically contain variable-length UTF-8 strings or JSON-encoded metadata blobs. Ideally, they’d be returned as MATLAB cell arrays of char/string.

### Investigation so far

* PyNWB relies on *zarr-python* (v2.18), which stores object arrays as variable-length byte strings (the `vlen-bytes` filter in the array metadata). TensorStore does not appear to support reading this data type (https://github.com/google/tensorstore/issues/103#issue-1742502761).
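
To make the encoding concrete, here is a minimal sketch of decoding a `vlen-bytes` buffer in plain MATLAB. It assumes the layout used by the numcodecs `VLenBytes` codec (a 4-byte little-endian item count, then each item as a 4-byte little-endian length followed by its payload) and that the chunk has no compressor or has already been decompressed. `decodeVlenBytes` is a hypothetical helper name, not part of this repo.

```matlab
% Hand-built buffer encoding the two items 'ab' and 'c' (illustrative data only)
raw = uint8([2 0 0 0, 2 0 0 0, 97 98, 1 0 0 0, 99]);
items = decodeVlenBytes(raw);
disp(items)

function items = decodeVlenBytes(raw)
% decodeVlenBytes  Hypothetical helper: split a vlen-bytes buffer into char vectors.
%   Assumes: 4-byte LE item count, then per item a 4-byte LE length + payload.
    raw = uint8(raw(:)');                       % force a uint8 row vector
    weights = [1, 256, 65536, 16777216];        % little-endian byte weights
    readU32 = @(pos) sum(double(raw(pos:pos+3)) .* weights);

    nItems = readU32(1);                        % header: number of items
    items = cell(nItems, 1);
    pos = 5;                                    % first byte after the header
    for k = 1:nItems
        len = readU32(pos);                     % length prefix of item k
        pos = pos + 4;
        payload = raw(pos:pos+len-1);
        items{k} = native2unicode(payload, 'UTF-8');  % interpret bytes as UTF-8 text
        pos = pos + len;
    end
end
```

For an uncompressed chunk, the raw bytes could be read with `fread(fid, Inf, '*uint8')` on the chunk file. Compressed chunks would need to be decompressed first, which this sketch does not attempt.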

### Questions

1. Are you already tracking support for object dtypes in tensorstore or your MATLAB layer?
2. Would you be interested in working on this, and/or in accepting PRs that add read/write support for object types?

### Preliminary workaround

```matlab
zInfo = zarrinfo(zarrFilePath);
if strcmp(zInfo.dtype, '|O')
    data = read_zarr_object(zarrFilePath);
else
    data = zarrread(zarrFilePath);
end
```
`read_zarr_object.m`:
```matlab
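function result = read_zarr_object(zarrPath)
% READ_ZARR_OBJECT  Read a single-element Zarr object-dtype ("|O") array via zarr-python.
%   Only the first element is converted; multi-element arrays are not handled.

    % Open the array read-only with zarr-python
    z = py.zarr.open_array(zarrPath, pyargs('mode', 'r'));

    % Create a slice object: slice(None) means ':'
    pySlice = py.slice(py.None);

    % Read the array with explicit slicing
    sliceFcn = py.getattr(z, '__getitem__');
    rawData = sliceFcn(pySlice);

    % Convert the NumPy object array to a MATLAB cell array of Python objects
    matCell = cell(rawData.tolist());
    pyElem = matCell{1};  % There's only one element

    if isa(pyElem, 'py.bytes')
        result = char(pyElem.decode('utf-8'));
    elseif isa(pyElem, 'py.str')
        result = char(pyElem);
    elseif isa(pyElem, 'py.hdmf_zarr.utils.ZarrReference')
        % Decode as JSON: swap single quotes for double quotes first
        % (fragile if a value itself contains a quote character)
        result = char(pyElem);
        result = strrep(result, '''', '"');
        result = jsondecode(result);
    else
        error('Unhandled type: %s', class(pyElem));
    end
end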
```
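
Since the expected result is a cell array of char vectors, the single-element handling in `read_zarr_object` could be extended to every element. The following is only a sketch under assumptions: `convertPyElements` is a hypothetical helper name, the input is the flat `py.list` returned by `tolist()` on a 1-D array, and any element that is neither `bytes` nor `str` is simply converted with Python's `str()`.

```matlab
function result = convertPyElements(pyList)
% convertPyElements  Hypothetical helper: py.list of bytes/str -> cell array of char.
%   Assumes a flat (1-D) Python list, e.g. rawData.tolist() for a 1-D array.
    matCell = cell(pyList);              % 1-by-N cell array of Python objects
    result = cell(size(matCell));
    for k = 1:numel(matCell)
        elem = matCell{k};
        if isa(elem, 'py.bytes')
            result{k} = char(elem.decode('utf-8'));   % bytes -> UTF-8 text
        elseif isa(elem, 'py.str')
            result{k} = char(elem);
        else
            result{k} = char(py.str(elem));           % fallback: Python str() of the object
        end
    end
end
```

With the variables from `read_zarr_object`, this would be called as `data = convertPyElements(rawData.tolist());`. Multi-dimensional arrays produce nested lists and would need extra handling.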

### Reproduction materials

* **Test dataset**: [file_create_date.zip](https://github.com/user-attachments/files/21111012/file_create_date.zip)
* **Zarr metadata snippet** (from `.zarray`):
```json
    {
        "dtype": "|O",
        "fill_value": 0,
        "filters": [
            {
                "id": "vlen-bytes"
            }
        ]
    }
```
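
As an illustration of how a reader could detect this case from the metadata alone, here is a minimal sketch that inspects the `.zarray` JSON text with `jsondecode`. `isVlenObjectArray` is a hypothetical helper name; it only checks for dtype `|O` together with a `vlen-bytes` filter and makes no claim about other object codecs.

```matlab
% Metadata text as in the snippet above (illustrative subset of .zarray)
jsonText = '{"dtype": "|O", "fill_value": 0, "filters": [{"id": "vlen-bytes"}]}';
disp(isVlenObjectArray(jsonText))

function tf = isVlenObjectArray(jsonText)
% isVlenObjectArray  Hypothetical helper: true for dtype "|O" with a vlen-bytes filter.
    meta = jsondecode(jsonText);
    isObject = isfield(meta, 'dtype') && strcmp(meta.dtype, '|O');

    ids = {};
    if isfield(meta, 'filters') && ~isempty(meta.filters)
        f = meta.filters;
        if isstruct(f)
            f = num2cell(f);             % struct array -> cell array of structs
        end
        ids = cellfun(@(s) s.id, f, 'UniformOutput', false);
    end
    tf = isObject && any(strcmp(ids, 'vlen-bytes'));
end
```

In practice the text could come from `fileread(fullfile(zarrFilePath, '.zarray'))`, which avoids opening the array through tensorstore at all.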
