Skip to content

combine_first() fails when DataArrays are read from netcdf #8931

Open
@mattaubury

Description

@mattaubury

What happened?

I'm trying to run combine_first on two DataArrays (on which a sel has been performed). This works fine if I manipulated the DataArrays directly in memory, but if I read them from disk I get the following error:

$ python bug.py
Traceback (most recent call last):
  File "/physical/gpfs/carp2-home/car_home02/data_files/jcs/maubury/jca/juggernaut/bug.py", line 27, in <module>
    a.combine_first(b)
  File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/dataset.py", line 6737, in combine_first
    out = ops.fillna(self, other, join="outer", dataset_join="outer")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/ops.py", line 148, in fillna
    return apply_ufunc(
           ^^^^^^^^^^^^
  File "site-packages/xarray/core/computation.py", line 1257, in apply_ufunc
    return apply_dataset_vfunc(
           ^^^^^^^^^^^^^^^^^^^^
  File "site-packages/xarray/core/computation.py", line 550, in apply_dataset_vfunc
    out = _fast_dataset(result_vars, coord_vars, indexes=indexes)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "site-packages/xarray/core/computation.py", line 488, in _fast_dataset
    return Dataset._construct_direct(variables, coord_names, indexes=indexes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "site-packages/xarray/core/dataset.py", line 1068, in _construct_direct
    dims = calculate_dimensions(variables)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "site-packages/xarray/core/variable.py", line 2984, in calculate_dimensions
    raise ValueError(
ValueError: conflicting sizes for dimension 'dim1': length 2 on 'dim1' and length 1 on {'dim0': 'q', 'dim1': 'q'}

What did you expect to happen?

The combine_first should succeed. Removing the internal branch which round-trips the arrays to netcdf demonstrates it working.

Minimal Complete Verifiable Example

import xarray

aa = xarray.DataArray(
    [[0, 1]],
    dims=("dim0", "dim1"),
    coords={"dim0": ["a"], "dim1": ["x", "y"]},
    name="q",
)

bb = xarray.DataArray(
    [[2, 3]],
    dims=("dim0", "dim1"),
    coords={"dim0": ["b"], "dim1": ["x", "y"]},
    name="q",
)

if True:  # Works if this is False
    aa.to_netcdf("aa.nc")
    bb.to_netcdf("bb.nc")

    aa = xarray.open_dataset("aa.nc")
    bb = xarray.open_dataset("bb.nc")

a = aa.sel({"dim0": [False]}).load()
b = bb.sel({"dim0": [False]}).load()

a.combine_first(b)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

$ python bug.py
Traceback (most recent call last):
  File "/physical/gpfs/carp2-home/car_home02/data_files/jcs/maubury/jca/juggernaut/bug.py", line 27, in <module>
    a.combine_first(b)
  File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/dataset.py", line 6737, in combine_first
    out = ops.fillna(self, other, join="outer", dataset_join="outer")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/ops.py", line 148, in fillna
    return apply_ufunc(
           ^^^^^^^^^^^^
  File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/computation.py", line 1257, in apply_ufunc
    return apply_dataset_vfunc(
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/computation.py", line 550, in apply_dataset_vfunc
    out = _fast_dataset(result_vars, coord_vars, indexes=indexes)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/computation.py", line 488, in _fast_dataset
    return Dataset._construct_direct(variables, coord_names, indexes=indexes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/dataset.py", line 1068, in _construct_direct
    dims = calculate_dimensions(variables)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/variable.py", line 2984, in calculate_dimensions
    raise ValueError(
ValueError: conflicting sizes for dimension 'dim1': length 2 on 'dim1' and length 1 on {'dim0': 'q', 'dim1': 'q'}

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0] python-bits: 64 OS: Linux OS-release: 4.18.0-372.32.1.el8_6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: ('en_GB', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.3-development

xarray: 2024.3.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.1.1
distributed: 2024.1.1
matplotlib: 3.8.3
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.2.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 24.0
conda: None
pytest: 8.0.0
mypy: 1.9.0
IPython: 8.21.0
sphinx: 7.2.6

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions