Description
What happened?
When opening a Zarr dataset with open_zarr and then writing to it using to_zarr, we get a ValueError when the _FillValue attribute of at least one data variable or coordinate is present. This happens for example when opening the dataset with mask_and_scale=False.
ValueError: failed to prevent overwriting existing key _FillValue in attrs. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.
The error can be worked around by deleting the _FillValue attributes of all data variables and coordinates before calling to_zarr. However, if the Zarr metadata contains meaningful fill_value attributes beforehand, they will then be lost after a round-trip of open_zarr (with mask_and_scale=False) and to_zarr.
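The workaround described above can be sketched as a small helper. The function name pop_fill_value_attrs is hypothetical and not part of xarray; it only relies on Dataset.variables and each variable's attrs dict, and it modifies the dataset in place.

```python
def pop_fill_value_attrs(ds, key="_FillValue"):
    """Remove `key` from the attrs of every variable in `ds`, in place.

    Hypothetical helper (not an xarray API). `ds.variables` covers both
    data variables and coordinates. Returns the removed values keyed by
    variable name so they can be inspected or restored later.
    """
    removed = {}
    for name, var in ds.variables.items():
        if key in var.attrs:
            removed[name] = var.attrs.pop(key)
    return removed
```

Calling removed = pop_fill_value_attrs(ds) before ds.to_zarr(zarr_path, mode='a') avoids the ValueError, but as noted above the removed values are then not written back to the Zarr metadata.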
The same behavior, but without the cause being mask_and_scale=False, has been reported in the open issues #6069 and #6329 without any solutions.
What did you expect to happen?
I expect to be able to read from and then write to a Zarr storage without having to delete attributes in between or lose available fill_value metadata from the Zarr storage. Calling to_zarr with mode='a' should just write any DataArray's _FillValue attribute to the fill_value field in the Zarr metadata instead of failing with a ValueError.
Minimal Complete Verifiable Example
import xarray as xr
# Create a dataset and write to Zarr storage
ds = xr.Dataset(dict(A=xr.DataArray([1.0])))
# ds.A.attrs is empty here.
zarr_path = "/path/to/storage.zarr"
ds.to_zarr(zarr_path, mode='a')
# Read the dataset from Zarr again using `mask_and_scale=False`
ds = xr.open_zarr(zarr_path, mask_and_scale=False)
# ds.A.attrs is now {'_FillValue': nan}
# Write the dataset to Zarr again; this call raises the ValueError quoted above
ds.to_zarr(zarr_path, mode='a')
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
No response
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 07:53:56) [MSC v.1937 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('de_DE', 'cp1252')
libhdf5: None
libnetcdf: None
xarray: 2024.3.0
pandas: 2.2.0
numpy: 1.26.4
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.18.0
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.4.1
distributed: 2024.4.1
matplotlib: 3.8.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.21.0
sphinx: None