Stores written by VirtualiZarr handle scaling/offset via a `FixedScaleOffset` filter in dataset encoding, rather than with CF-convention-style attributes, for good reasons. Unfortunately, it looks like this filter is applied before masking with `_FillValue`, when it should be applied afterward (since `_FillValue` should have the same type as the packed data).
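To make the expected order concrete, here's a plain-numpy sketch of CF-style decoding: mask against `_FillValue` in the packed dtype first, apply `scale_factor` second (values chosen to match the MVE below):

```python
import numpy as np

# Packed uint8 data: 1 and 199 are valid samples, 255 is the fill value.
packed = np.array([1, 255, 199], dtype=np.uint8)
fill_value = np.uint8(255)  # same dtype as the packed data, per CF conventions
scale_factor = 0.5

# CF-style decode: mask against _FillValue first, *then* apply scale_factor.
unpacked = np.where(packed == fill_value, np.nan, packed * scale_factor)
print(unpacked)  # [ 0.5  nan 99.5]
```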
It looks like there's been a ton of discussion on here around fill values, encoding, scaling, etc., but I haven't found anybody yet who's run into this exact issue. I'm torn on whether this is a Zarr issue or a VirtualiZarr issue but in the end I decided to post here, as I think it could only ever come up in this context.
As an MVE, let's take some low-precision floats between 0 and 100 that we want to store as `uint8` with a `scale_factor` of 0.5 and a `_FillValue` of 255:
```python
import xarray as xr
import numpy as np

ds = xr.Dataset(
    {
        "x": ("t", np.array([0.55, np.nan, 99.45])),
    },
    coords={
        "t": [0, 1, 2],
    },
)
ds.to_netcdf(
    "mve.nc",
    encoding={"x": {"dtype": "uint8", "scale_factor": 0.5, "_FillValue": 255}},
    mode="w",
)
```
Then our packed data will have values about twice the actual data, potentially ranging from 0 to 200, with 255 replacing NaNs. `xr.decode_cf` handles this as we'd expect, returning our original data (with some loss of precision due to our own choices, of course):
```python
file_ds = xr.open_dataset("mve.nc", mask_and_scale=False)
print(file_ds.x.encoding)
print(file_ds.x.attrs)
print(file_ds.x.values)
print(xr.decode_cf(file_ds).x.values)
```
```
{'dtype': dtype('uint8'), 'zlib': False, 'szip': False, 'zstd': False, 'bzip2': False, 'blosc': False, 'shuffle': False, 'complevel': 0, 'fletcher32': False, 'contiguous': True, 'chunksizes': None, 'source': '/home/charriso/org/work_projects/mve.nc', 'original_shape': (3,)}
{'_FillValue': np.uint8(255), 'scale_factor': np.float64(0.5)}
[  1 255 199]
[ 0.5  nan 99.5]
```
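The encode side that produced those packed values can be sketched the same way (my reading of the CF semantics, not xarray's actual code path): scale first, substitute `_FillValue` for NaN, then cast to the packed dtype:

```python
import numpy as np

# Encode: divide by scale_factor, replace NaN with _FillValue, cast to uint8.
data = np.array([0.55, np.nan, 99.45])
scaled = data / 0.5
packed = np.where(np.isnan(scaled), 255, np.round(scaled)).astype(np.uint8)
print(packed)  # [  1 255 199]
```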
Whereas after round-tripping through virtualizarr, `scale_factor` has been applied before masking the data, so we just get 255/2 in there instead of NaN (I checked that masking isn't simply being skipped: manually changing the `fill_value` in the JSON to 127.5 results in proper masking):
```python
from virtualizarr import open_virtual_dataset

manifest_ds = open_virtual_dataset("mve.nc")
manifest_ds.virtualize.to_kerchunk("vds.json", format="json")
vds = xr.open_dataset(
    "vds.json",
    engine="kerchunk",
    mask_and_scale=False,
)
print(vds.x.encoding)
print(vds.x.attrs)
print(vds.x.values)
print(xr.decode_cf(vds).x.values)
```
```
/home/charriso/micromamba/envs/fillval_mve/lib/python3.12/site-packages/zarr/core/metadata/v2.py:192: UserWarning: Found an empty list of filters in the array metadata document. This is contrary to the Zarr V2 specification, and will cause an error in the future. Use None (or Null in a JSON document) instead of an empty list of filters.
  warnings.warn(msg, UserWarning, stacklevel=1)
{'chunks': (3,), 'preferred_chunks': {'t': 3}, 'compressors': (), 'filters': (FixedScaleOffset(scale=2.0, offset=0, dtype='<f8', astype='|u1'),), 'shards': None, 'dtype': dtype('float64')}
{'_FillValue': np.float64(255.0)}
[  0.5 127.5  99.5]
[  0.5 127.5  99.5]
```
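The failure mode falls out of the filter semantics: `FixedScaleOffset` unpacks the raw bytes with no knowledge of `_FillValue`, so by the time xarray applies the mask, 255 has already become 127.5. A plain-numpy sketch mirroring (to my understanding) the filter's `data / scale + offset` decode:

```python
import numpy as np

packed = np.array([1, 255, 199], dtype=np.uint8)

# FixedScaleOffset decode, as I understand it: data / scale + offset,
# applied with no knowledge of _FillValue.
decoded = packed / 2.0 + 0
print(decoded)  # [  0.5 127.5  99.5]

# Masking afterwards against _FillValue=255.0 can never match anything:
masked = np.where(decoded == 255.0, np.nan, decoded)
print(masked)  # still [  0.5 127.5  99.5] -- no NaN anywhere
```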
I notice the warning about an empty list of filters; I'm not sure whether it's relevant here.
One could work around this by manually editing `.zmetadata` to contain the scaled `fill_value` rather than the original, but a proper fix would be great! I'm happy to make a PR somewhere, but given the extensive existing debate on Zarr/CF/xarray encoding I figured I'd make this post first.
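For anyone hitting this in the meantime, that workaround can be scripted: rewrite the array's `fill_value` in the reference metadata into unpacked units (255 / scale = 127.5). A stdlib-only sketch against an in-memory stand-in for the kerchunk refs dict (the key names follow the single-file kerchunk JSON layout; adapt to your actual reference file):

```python
import json

# Stand-in for the contents of vds.json (single-file kerchunk layout);
# in practice you'd json.load() the real reference file instead.
refs = {
    "version": 1,
    "refs": {
        "x/.zarray": json.dumps({
            "shape": [3], "chunks": [3], "dtype": "<f8",
            "compressor": None, "order": "C", "zarr_format": 2,
            "fill_value": 255,
            "filters": [{"id": "fixedscaleoffset", "scale": 2.0,
                         "offset": 0, "dtype": "<f8", "astype": "|u1"}],
        }),
    },
}

# Workaround: convert fill_value into unpacked units (255 / 2.0 = 127.5).
zarray = json.loads(refs["refs"]["x/.zarray"])
zarray["fill_value"] = zarray["fill_value"] / zarray["filters"][0]["scale"]
refs["refs"]["x/.zarray"] = json.dumps(zarray)

print(json.loads(refs["refs"]["x/.zarray"])["fill_value"])  # 127.5
```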