
to_zarr with mode='a' fails when _FillValue is present, caused by open_zarr with mask_and_scale=False. #9053

Open
@cpegel

Description


What happened?

When I open a Zarr dataset with open_zarr and then write to it with to_zarr, I get a ValueError if the _FillValue attribute is present on at least one data variable or coordinate. This happens, for example, when the dataset is opened with mask_and_scale=False.

ValueError: failed to prevent overwriting existing key _FillValue in attrs. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.
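The error message suggests that, while writing, xarray moves an encoding field (here _FillValue) into the variable's attributes and refuses to overwrite a key that is already there. The snippet below is a simplified, hypothetical model of that check in plain Python. The function name move_encoding_to_attrs is made up for illustration and is not xarray's implementation. It shows why a _FillValue left in attrs (as with mask_and_scale=False) would collide with one carried in the encoding, assuming the encoding used for the write also contains _FillValue.

```python
def move_encoding_to_attrs(encoding: dict, attrs: dict, key: str) -> None:
    """Hypothetical, simplified guard (not xarray's code): move an encoding
    field into attrs, refusing to overwrite a key already present in attrs."""
    if key not in encoding:
        # Nothing to move, so no conflict is possible.
        return
    if key in attrs:
        # The attribute is already set, e.g. _FillValue kept in attrs because
        # the dataset was opened with mask_and_scale=False.
        raise ValueError(
            f"failed to prevent overwriting existing key {key} in attrs. "
            "This is probably an encoding field used by xarray to describe "
            "how a variable is serialized. To proceed, remove this key from "
            "the variable's attributes manually."
        )
    attrs[key] = encoding.pop(key)


# Normal case: _FillValue lives only in the encoding and is moved into attrs.
move_encoding_to_attrs({"_FillValue": float("nan")}, {}, "_FillValue")

# Conflicting case: _FillValue is in both places, so the guard raises.
try:
    move_encoding_to_attrs({"_FillValue": 0.0}, {"_FillValue": 0.0}, "_FillValue")
except ValueError as err:
    print(err)
```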

The error can be worked around by deleting the _FillValue attributes of all data variables and coordinates before calling to_zarr. However, any meaningful fill_value metadata already present in the Zarr store is then lost after a round trip through open_zarr (with mask_and_scale=False) and to_zarr.
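The workaround described above can be sketched as a small helper. The name drop_fill_value_attrs is hypothetical and not part of xarray. It works on a shallow copy so the original dataset keeps its attributes, and it returns the removed values so they are at least not lost in memory; it does not write them back to the Zarr metadata.

```python
import xarray as xr


def drop_fill_value_attrs(ds: xr.Dataset) -> tuple[xr.Dataset, dict]:
    """Hypothetical helper (not part of xarray): return a shallow copy of ds
    with _FillValue removed from the attrs of every data variable and
    coordinate, plus a dict of the removed values keyed by variable name."""
    # Shallow copy: array data is shared, but each variable's attrs dict is copied.
    out = ds.copy()
    removed = {}
    # Dataset.variables covers both data variables and coordinates.
    for name, var in out.variables.items():
        if "_FillValue" in var.attrs:
            removed[name] = var.attrs.pop("_FillValue")
    return out, removed


# Usage with the example from this issue (zarr_path as defined there):
# ds = xr.open_zarr(zarr_path, mask_and_scale=False)
# clean, removed = drop_fill_value_attrs(ds)
# clean.to_zarr(zarr_path, mode='a')
```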

The same behavior, though not caused by mask_and_scale=False, has been reported in the open issues #6069 and #6329, neither of which has a solution yet.

What did you expect to happen?

I expect to be able to read from and then write to a Zarr store without having to delete attributes in between or losing the fill_value metadata already in the store. Calling to_zarr with mode='a' should simply write any DataArray's _FillValue attribute to the fill_value field in the Zarr metadata instead of failing with a ValueError.

Minimal Complete Verifiable Example

```python
import xarray as xr

# Create a dataset and write it to a Zarr store
ds = xr.Dataset(dict(A=xr.DataArray([1.0])))
# ds.A.attrs is empty here.

zarr_path = "/path/to/storage.zarr"
ds.to_zarr(zarr_path, mode='a')

# Read the dataset back from Zarr using `mask_and_scale=False`
ds = xr.open_zarr(zarr_path, mask_and_scale=False)
# ds.A.attrs is now {'_FillValue': nan}

# Write the dataset to Zarr again; this raises the ValueError quoted above
ds.to_zarr(zarr_path, mode='a')
```

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 07:53:56) [MSC v.1937 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('de_DE', 'cp1252')
libhdf5: None
libnetcdf: None

xarray: 2024.3.0
pandas: 2.2.0
numpy: 1.26.4
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.18.0
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.4.1
distributed: 2024.4.1
matplotlib: 3.8.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.21.0
sphinx: None
