apply to dataset #4863

Open
wants to merge 36 commits into
base: main
Changes from 4 commits
36 commits
5655571
add a apply_to_dataset method
keewis Jan 30, 2021
90f8d55
write a test for apply_to_dataset on a DataArray
keewis Jan 30, 2021
fd2c897
also add a test for dataset
keewis Jan 30, 2021
857c783
convert apply_to_dataset to a top-level function
keewis Feb 5, 2021
57e94b6
update whats-new.rst
keewis Feb 5, 2021
cdb0f3d
add the new function to api.rst [skip-ci]
keewis Feb 5, 2021
1d81a49
rephrase the note [skip-ci]
keewis Feb 5, 2021
88fe863
add a see also section [skip-ci]
keewis Feb 5, 2021
0daf42d
add examples [skip-ci]
keewis Feb 5, 2021
ef3f791
Merge branch 'master' into apply-to-dataset
keewis Feb 7, 2021
638d61c
Merge branch 'master' into apply-to-dataset
keewis Feb 11, 2021
559d8ef
rename to call_on_dataset
keewis Mar 15, 2021
0c424bf
preserve the name as much as possible
keewis Mar 15, 2021
8db9e7e
update api.rst
keewis Mar 15, 2021
c902dfe
Merge branch 'master' into apply-to-dataset
keewis Mar 15, 2021
43bf70d
update whats-new.rst
keewis Mar 15, 2021
31645e5
remove the notes
keewis Mar 15, 2021
293d9c1
remove the no-op
keewis Mar 15, 2021
d0de1ca
don't rename to None
keewis Mar 15, 2021
a822232
rename to "<this-array>"
keewis Mar 15, 2021
d278919
rewrite [skip-ci]
keewis Mar 15, 2021
0669da9
Merge branch 'master' into apply-to-dataset
keewis Mar 15, 2021
97d4338
rename back to None
keewis Mar 15, 2021
48109db
Merge branch 'master' into apply-to-dataset
keewis Mar 28, 2021
b15d45e
Merge branch 'master' into apply-to-dataset
keewis Apr 5, 2021
371f509
introduce a mandatory name parameter to use as a name for the data va…
keewis May 10, 2021
8f37872
Merge branch 'master' into apply-to-dataset
keewis May 10, 2021
c9459f7
move to the new section in whats-new.rst
keewis May 10, 2021
021ad36
fix the tests
keewis May 11, 2021
dcb747b
Merge branch 'master' into apply-to-dataset
keewis May 31, 2021
7081e15
update the input and expected values
keewis May 31, 2021
52a39f3
add the missing name for the dataset call
keewis May 31, 2021
fcfaaa5
use DataArray.to_dataset instead
keewis May 31, 2021
f2d2880
only convert if the result is a Dataset
keewis May 31, 2021
b59dd1e
Merge branch 'master' into apply-to-dataset
dcherian Jun 21, 2021
12400cb
Merge branch 'main' into apply-to-dataset
keewis Jul 23, 2021
11 changes: 10 additions & 1 deletion xarray/__init__.py
@@ -18,7 +18,15 @@
from .core.alignment import align, broadcast
from .core.combine import combine_by_coords, combine_nested
from .core.common import ALL_DIMS, full_like, ones_like, zeros_like
from .core.computation import apply_ufunc, corr, cov, dot, polyval, where
from .core.computation import (
    apply_to_dataset,
    apply_ufunc,
    corr,
    cov,
    dot,
    polyval,
    where,
)
from .core.concat import concat
from .core.dataarray import DataArray
from .core.dataset import Dataset
@@ -46,6 +54,7 @@
    # Top-level functions
    "align",
    "apply_ufunc",
    "apply_to_dataset",
    "as_variable",
    "broadcast",
    "cftime_range",
36 changes: 36 additions & 0 deletions xarray/core/computation.py
@@ -1142,6 +1142,42 @@ def earth_mover_distance(first_samples,
return apply_array_ufunc(func, *args, dask=dask)


def apply_to_dataset(func, obj, *args, **kwargs):
    """Apply a function expecting a Dataset to an xarray object.

    Parameters
    ----------
    func : callable
        A function expecting a Dataset as its first parameter.
    obj : DataArray or Dataset
        The object to apply ``func`` to. If a ``DataArray``, convert it to a single
        variable ``Dataset`` first.
    *args, **kwargs
        Additional arguments to ``func``.

    Returns
    -------
    DataArray or Dataset
        The result of ``func(obj, *args, **kwargs)`` with the same type as ``obj``.

    Notes
    -----
    If ``obj`` is a ``DataArray``, the result will have the same name as ``obj``, but
    the single data variable in the temporary ``Dataset`` will always have a generic
    name.
Collaborator

Is this as simple as "DataArrays will retain their name"? If so, maybe we don't need any notes? (very possible I'm missing some of the complexity, as ever)

Collaborator Author (keewis, Feb 5, 2021)

No, I might not have explained that correctly. The temporary Dataset that gets generated always has a <this-array> variable, but the original name will be restored by _from_temp_dataset.

Edit: I rewrote it, is that easier to understand?

Collaborator

Does the name of the variable of the temporary dataset matter to the user though? To what extent is that just an implementation detail?

Collaborator Author

The function will see the Dataset, so it might be important to keep the note. For example, this would need to change the name in the units dict from None to <this-array> to work correctly.

Contributor

in core/parallel.py there's a dataarray_to_dataset (and inverse dataset_to_dataarray) function that preserves name if possible. I think name preservation is a good thing for a user-facing function.
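
To make the point in this thread concrete, here is a minimal sketch of why the temporary variable name is visible to `func`. It uses only the public `DataArray.to_dataset(name=...)` call rather than the private `_to_temp_dataset`; `TEMP_NAME` and `attach_units` are hypothetical names invented for the illustration, and the string `"placeholder"` merely stands in for the generic `<this-array>` name discussed above.

```python
import xarray as xr

# Assumption: a stand-in for the generic temporary name used in the PR.
TEMP_NAME = "placeholder"


def attach_units(ds, units):
    """Hypothetical Dataset-only function: set a "units" attr per variable."""
    out = ds.copy()
    for var_name, unit in units.items():
        # Raises KeyError if var_name is not a variable of the Dataset.
        out[var_name].attrs["units"] = unit
    return out


da = xr.DataArray([1.0, 2.0], dims="x", name="speed")
ds = da.to_dataset(name=TEMP_NAME)

# A mapping keyed by the temporary name works ...
with_units = attach_units(ds, {TEMP_NAME: "m/s"})

# ... while one keyed by the original DataArray name does not, because
# the function only ever sees the temporary Dataset.
try:
    attach_units(ds, {"speed": "m/s"})
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

Under these assumptions, any function that addresses variables by name has to know the temporary name, which is the argument for documenting it or for letting the caller choose it.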

    """
    from .dataarray import DataArray

    ds = obj._to_temp_dataset() if isinstance(obj, DataArray) else obj

    result = func(ds, *args, **kwargs)

    return (
        obj._from_temp_dataset(result, name=obj.name)
        if isinstance(obj, DataArray)
        else result
    )
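
The round trip performed by the function above (wrap a `DataArray` in a one-variable `Dataset`, call `func`, unwrap, restore the name) can be sketched with public API only. This is not the code in the diff: `call_on_dataset_sketch` and its `name` keyword are hypothetical, and the sketch assumes `func` returns a `Dataset` that still contains the temporary variable.

```python
import xarray as xr


def call_on_dataset_sketch(func, obj, *args, name="placeholder", **kwargs):
    """Hypothetical stand-in for the function in this diff (public API only)."""
    if not isinstance(obj, xr.DataArray):
        # Datasets are passed to func unchanged.
        return func(obj, *args, **kwargs)

    # Wrap the array in a single-variable Dataset under a temporary name.
    ds = obj.to_dataset(name=name)
    result = func(ds, *args, **kwargs)

    # Unwrap the temporary variable and restore the original name.
    out = result[name]
    out.name = obj.name
    return out


da = xr.DataArray([1.0, 2.0, 3.0], dims="x", name="temperature")
doubled = call_on_dataset_sketch(lambda ds: ds * 2, da)
```

Here `doubled` is again a `DataArray` named `"temperature"`; the temporary name only exists while `func` runs.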


def cov(da_a, da_b, dim=None, ddof=1):
    """
    Compute covariance between two DataArray objects along a shared dimension.
43 changes: 43 additions & 0 deletions xarray/tests/test_computation.py
@@ -468,6 +468,49 @@ def test_apply_groupby_add():
add(data_array.groupby("y"), data_array.groupby("x"))


@pytest.mark.parametrize(
    ["obj", "expected"],
    (
        pytest.param(
            xr.DataArray(
                [0, 1],
                coords={
                    "x": ("x", [-1, 1], {"a": 1, "b": 2}),
                    "u": ("x", [2, 3], {"c": 3}),
                },
                dims="x",
                attrs={"d": 4, "e": 5},
            ),
            xr.DataArray([0, 1], coords={"x": [-1, 1], "u": ("x", [2, 3])}, dims="x"),
            id="DataArray",
        ),
        pytest.param(
            xr.Dataset(
                {"a": ("x", [1, 2], {"a": 1, "b": 2}), "b": ("x", [0, 1], {"c": 3})},
                coords={
                    "x": ("x", [-1, 1], {"d": 4, "e": 5}),
                    "u": ("x", [2, 3], {"f": 6}),
                },
            ),
            xr.Dataset(
                {"a": ("x", [1, 2]), "b": ("x", [0, 1])},
                coords={"x": [-1, 1], "u": ("x", [2, 3])},
            ),
            id="Dataset",
        ),
    ),
)
def test_apply_to_dataset(obj, expected):
    def clear_all_attrs(ds):
        new_ds = ds.copy()
        for var in new_ds.variables.values():
            var.attrs.clear()
        new_ds.attrs.clear()
        return new_ds

    assert_identical(expected, xr.apply_to_dataset(clear_all_attrs, obj))
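
The helper used in this test can also be exercised on its own, without the new function. The following is a small standalone sketch that runs the same attribute-clearing logic directly on a `Dataset`; the sample data and attribute values are made up for illustration.

```python
import xarray as xr


def clear_all_attrs(ds):
    """Return a copy of ``ds`` with the attrs of every variable and of the
    Dataset itself removed (same logic as the helper in the test)."""
    new_ds = ds.copy()
    for var in new_ds.variables.values():
        var.attrs.clear()
    new_ds.attrs.clear()
    return new_ds


# Illustrative input: attrs on a data variable, a coordinate and the Dataset.
ds = xr.Dataset(
    {"a": ("x", [1, 2], {"units": "m"})},
    coords={"x": ("x", [-1, 1], {"axis": "X"})},
    attrs={"title": "demo"},
)
expected = xr.Dataset({"a": ("x", [1, 2])}, coords={"x": [-1, 1]})

cleaned = clear_all_attrs(ds)
# assert_identical also compares names and attrs, not just values.
xr.testing.assert_identical(expected, cleaned)
```

Because `assert_identical` checks attributes as well as values, it is a reasonable way to confirm that a `Dataset`-level function was applied to every variable, coordinates included.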


def test_unified_dim_sizes():
    assert unified_dim_sizes([xr.Variable((), 0)]) == {}
    assert unified_dim_sizes([xr.Variable("x", [1]), xr.Variable("x", [1])]) == {"x": 1}