Description
What happened?
I'm trying to run combine_first
on two DataArrays (on which a sel
has been performed). This works fine if I manipulated the DataArrays directly in memory, but if I read them from disk I get the following error:
$ python bug.py
Traceback (most recent call last):
File "/physical/gpfs/carp2-home/car_home02/data_files/jcs/maubury/jca/juggernaut/bug.py", line 27, in <module>
a.combine_first(b)
File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/dataset.py", line 6737, in combine_first
out = ops.fillna(self, other, join="outer", dataset_join="outer")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/ops.py", line 148, in fillna
return apply_ufunc(
^^^^^^^^^^^^
File "site-packages/xarray/core/computation.py", line 1257, in apply_ufunc
return apply_dataset_vfunc(
^^^^^^^^^^^^^^^^^^^^
File "site-packages/xarray/core/computation.py", line 550, in apply_dataset_vfunc
out = _fast_dataset(result_vars, coord_vars, indexes=indexes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "site-packages/xarray/core/computation.py", line 488, in _fast_dataset
return Dataset._construct_direct(variables, coord_names, indexes=indexes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "site-packages/xarray/core/dataset.py", line 1068, in _construct_direct
dims = calculate_dimensions(variables)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "site-packages/xarray/core/variable.py", line 2984, in calculate_dimensions
raise ValueError(
ValueError: conflicting sizes for dimension 'dim1': length 2 on 'dim1' and length 1 on {'dim0': 'q', 'dim1': 'q'}
What did you expect to happen?
The combine_first
should succeed. Removing the internal branch which round-trips the arrays to netcdf demonstrates it working.
Minimal Complete Verifiable Example
import xarray
aa = xarray.DataArray(
[[0, 1]],
dims=("dim0", "dim1"),
coords={"dim0": ["a"], "dim1": ["x", "y"]},
name="q",
)
bb = xarray.DataArray(
[[2, 3]],
dims=("dim0", "dim1"),
coords={"dim0": ["b"], "dim1": ["x", "y"]},
name="q",
)
if True: # Works if this is False
aa.to_netcdf("aa.nc")
bb.to_netcdf("bb.nc")
aa = xarray.open_dataset("aa.nc")
bb = xarray.open_dataset("bb.nc")
a = aa.sel({"dim0": [False]}).load()
b = bb.sel({"dim0": [False]}).load()
a.combine_first(b)
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
$ python bug.py
Traceback (most recent call last):
File "/physical/gpfs/carp2-home/car_home02/data_files/jcs/maubury/jca/juggernaut/bug.py", line 27, in <module>
a.combine_first(b)
File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/dataset.py", line 6737, in combine_first
out = ops.fillna(self, other, join="outer", dataset_join="outer")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/ops.py", line 148, in fillna
return apply_ufunc(
^^^^^^^^^^^^
File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/computation.py", line 1257, in apply_ufunc
return apply_dataset_vfunc(
^^^^^^^^^^^^^^^^^^^^
File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/computation.py", line 550, in apply_dataset_vfunc
out = _fast_dataset(result_vars, coord_vars, indexes=indexes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/computation.py", line 488, in _fast_dataset
return Dataset._construct_direct(variables, coord_names, indexes=indexes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/dataset.py", line 1068, in _construct_direct
dims = calculate_dimensions(variables)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maubury/.conda/envs/juggernaut/lib/python3.11/site-packages/xarray/core/variable.py", line 2984, in calculate_dimensions
raise ValueError(
ValueError: conflicting sizes for dimension 'dim1': length 2 on 'dim1' and length 1 on {'dim0': 'q', 'dim1': 'q'}
Anything else we need to know?
No response
Environment
xarray: 2024.3.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.1.1
distributed: 2024.1.1
matplotlib: 3.8.3
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.2.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 24.0
conda: None
pytest: 8.0.0
mypy: 1.9.0
IPython: 8.21.0
sphinx: 7.2.6