Document that Coarsen accepts coord func as callable #7981

Open
wants to merge 5 commits into
base: main

19 changes: 13 additions & 6 deletions doc/user-guide/computation.rst
@@ -460,9 +460,9 @@ and ``mean``, ``std`` and ``var`` return ``NaN``:
Coarsen large arrays
====================

-:py:class:`DataArray` and :py:class:`Dataset` objects include a
+:py:class:`DataArray` and :py:class:`Dataset` objects include
:py:meth:`~xarray.DataArray.coarsen` and :py:meth:`~xarray.Dataset.coarsen`
-methods. This supports block aggregation along multiple dimensions,
+method. This supports block aggregation along multiple dimensions.

.. ipython:: python

@@ -475,8 +475,8 @@ methods. This supports block aggregation along multiple dimensions,
)
da

-In order to take a block mean for every 7 days along ``time`` dimension and
-every 2 points along ``x`` dimension,
+In order to take a block mean for every 7 days along the ``time`` dimension and
+every 2 points along the ``x`` dimension,

.. ipython:: python

@@ -491,13 +491,20 @@ the excess entries or padding ``nan`` to insufficient entries,

da.coarsen(time=30, x=2, boundary="trim").mean()

-If you want to apply a specific function to coordinate, you can pass the
-function or method name to ``coord_func`` option,
+By default the coordinates will be replaced with the mean of the coordinate values in block.
+If instead you want to apply a specific reduction function to the coordinate values, you can pass the
+function or method name as a string via the ``coord_func`` keyword argument,

.. ipython:: python

da.coarsen(time=7, x=2, coord_func={"time": "min"}).mean()

+Or you can pass any valid reduction function as a callable
+
+.. ipython:: python
+
+da.coarsen(time=7, x=2, coord_func={"time": np.ptp}).count()
+
You can also :ref:`use coarsen to reshape<reshape.coarsen>` without applying a computation.

.. _compute.using_coordinates:
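
As a rough illustration of the behaviour documented above, the sketch below coarsens a small array twice, once passing a string name and once passing a callable via `coord_func`. The array, its dimension names and its values are made up for the example, and it assumes an xarray version that accepts callables as this PR describes.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: 6 steps along "t", 2 points along "x".
da = xr.DataArray(
    np.arange(12.0).reshape(6, 2),
    dims=("t", "x"),
    coords={"t": np.arange(6.0), "x": [10.0, 20.0]},
)

# String name: each block of 3 is labelled with the minimum "t" in the block.
by_name = da.coarsen(t=3, coord_func={"t": "min"}).mean()

# Callable: each block of 3 is labelled with the maximum "t" in the block.
by_callable = da.coarsen(t=3, coord_func={"t": np.max}).mean()

print(by_name["t"].values)
print(by_callable["t"].values)
```

The data values are identical in both results; only the labels assigned to the coarsened `t` coordinate differ.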
2 changes: 2 additions & 0 deletions doc/whats-new.rst
@@ -51,6 +51,8 @@ Documentation
- Added examples to docstrings of :py:meth:`Dataset.isel`, :py:meth:`Dataset.reduce`, :py:meth:`Dataset.argmin`,
:py:meth:`Dataset.argmax` (:issue:`6793`, :pull:`7881`)
By `Harshitha <https://github.com/harshitha1201>`_ .
+- Documents that :py:meth:`DataArray.coarsen` accepts a callable as the reduction function.
+(:pull:`7981`) By `Tom Nicholas <https://github.com/TomNicholas>`_.


Internal Changes
17 changes: 12 additions & 5 deletions xarray/core/rolling.py
@@ -786,12 +786,13 @@ def construct(


class Coarsen(CoarsenArithmetic, Generic[T_Xarray]):
-"""A object that implements the coarsen.
+"""An object that implements the coarsen operation.

See Also
--------
Dataset.coarsen
DataArray.coarsen
+Variable.coarsen
"""

__slots__ = (
@@ -814,21 +815,27 @@ def __init__(
coord_func: str | Callable | Mapping[Any, str | Callable],
) -> None:
"""
-Moving window object.
+Coarsening object.

Parameters
----------
obj : Dataset or DataArray
Object to window.
windows : mapping of hashable to int
-A mapping from the name of the dimension to create the rolling
-exponential window along (e.g. `time`) to the size of the moving window.
+A mapping from the name of the dimension to create the coarsened block along (e.g. `time`) to the size of
+the coarsened block.
boundary : {"exact", "trim", "pad"}
If 'exact', a ValueError will be raised if dimension size is not a
multiple of window size. If 'trim', the excess indexes are trimmed.
If 'pad', NA will be padded.
side : 'left' or 'right' or mapping from dimension to 'left' or 'right'
-coord_func : function (name) or mapping from coordinate name to function (name).
+coord_func : function, str name of function, or mapping from coordinate name to function or str name of func.
Collaborator

Suggested change
-coord_func : function, str name of function, or mapping from coordinate name to function or str name of func.
+coord_func : Callable, str name of function, or mapping from coordinate name to Callable or str name of func.

It is also probably more readable that way, because "function" no longer appears four times in one sentence.

Contributor

I think we should add an explicit list of function names

+Function to use to reduce the coordinate values of one block down to a single new label.
+
+Can be specified as a custom function, either by passing a callable (e.g. ``np.max``) or passing a string
+name of a reduction function supplied by xarray (e.g. ``'min'``). If passed as a callable it should be a
+valid argument to xarray's ``.reduce`` method. The advantage of specifying as a string is automatic handling
+of NaNs and non-numpy array types. Default is to use "mean" for all coarsened dimensions.

Returns
-------
40 changes: 34 additions & 6 deletions xarray/core/variable.py
@@ -61,14 +61,17 @@
if TYPE_CHECKING:
from xarray.core.parallelcompat import ChunkManagerEntrypoint
from xarray.core.types import (
+CoarsenBoundaryOptions,
Dims,
ErrorOptionsWithWarn,
PadModeOptions,
PadReflectOptions,
QuantileMethods,
+SideOptions,
T_Variable,
)


NON_NANOSECOND_WARNING = (
"Converting non-nanosecond precision {case} values to nanosecond precision. "
"This behavior can eventually be relaxed in xarray, as it is an artifact from "
@@ -2494,10 +2497,29 @@ def rolling_window(
)

     def coarsen(
-        self, windows, func, boundary="exact", side="left", keep_attrs=None, **kwargs
+        self,
+        windows: Mapping[Any, int],
+        func: str | Callable,
Contributor

Suggested change
-func: str | Callable,
+func: Literal["max", "min", ...] | Callable,

+        boundary: CoarsenBoundaryOptions = "exact",
+        side: SideOptions | Mapping[Any, SideOptions] = "left",
+        keep_attrs=None,
+        **kwargs,
     ):
Collaborator

While at it you could add a return type.

"""
Apply reduction function.

+Parameters
+----------
+windows : mapping of hashable to int
+A mapping from the name of the dimension to create the coarsened block along (e.g. `time`) to the size of
+the coarsened block.
+func : function or str name of function
Collaborator

Suggested change
-func : function or str name of function
+func : Callable or str name of function

+Function to use to reduce the values of one block down along one or more axes.
Collaborator

Maybe mention the signature of the function.
I have already run into the problem of not knowing exactly what you have to supply.

Also, I never know from which "pool" you can choose the str names, maybe this can be documented better? (Same for the other docstrings)
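
Judging from the call `callable_func(reshaped, axis=axes, **kwargs)` in this diff, a custom callable apparently needs to accept an array plus an `axis` keyword holding a tuple of ints, and return an array with those axes reduced away. A minimal NumPy-only sketch of that contract follows; `block_range` and the manual reshape are illustrative stand-ins, not xarray code.

```python
import numpy as np


def block_range(values, axis=None, **kwargs):
    """Hypothetical reduction: max minus min over the given axes."""
    return np.max(values, axis=axis, **kwargs) - np.min(values, axis=axis, **kwargs)


# Mimic coarsening a length-6 dimension with a window of 3:
# reshape to (n_blocks, window), then reduce over the window axis.
data = np.array([1.0, 4.0, 2.0, 10.0, 7.0, 8.0])
reshaped = data.reshape(2, 3)
axes = (1,)  # tuple of axes to reduce, as in the diff

result = block_range(reshaped, axis=axes)
print(result)
```

Any function with this shape (most NumPy reductions qualify) should be usable wherever the docstring says a callable is accepted.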

Member Author

@TomNicholas TomNicholas Jul 12, 2023

> Also, I never know from which "pool" you can choose the str names, maybe this can be documented better? (Same for the other docstrings)

Yeah this seems like an issue. It's currently literally just any function in the internal duck_array_ops module, which (a) is hacky AF, (b) is not documented or even very easy to document, and (c) could fail (there are fn's in duck_array_ops that aren't reductions, for example). I'm not aware of anywhere else in xarray's codebase where we just implicitly allow attempting to get any function from duck_array_ops like this.

Collaborator

Maybe we could add a public variable in the duck_array_ops module that collects valid reduction names and add a reference to it?
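
One way the suggestion above might look, sketched outside xarray: a public tuple of accepted names plus a resolver that rejects anything else. `VALID_REDUCTIONS`, `resolve_reduction` and the use of NumPy's NaN-aware functions as a stand-in namespace are all hypothetical; none of them is existing xarray API.

```python
from typing import Callable, Union

import numpy as np

# Hypothetical public list of accepted reduction names.
VALID_REDUCTIONS = ("max", "min", "mean", "median", "std", "sum", "var")


def resolve_reduction(func: Union[str, Callable]) -> Callable:
    """Return a callable for ``func``, checking string names against the list."""
    if callable(func):
        return func
    if func not in VALID_REDUCTIONS:
        raise NameError(
            f"{func!r} is not a valid reduction; expected one of {VALID_REDUCTIONS}"
        )
    # Stand-in lookup: NumPy's NaN-aware functions play the role of duck_array_ops.
    return getattr(np, f"nan{func}")
```

With an explicit list, a non-reduction name such as `"concatenate"` is rejected up front instead of being looked up and failing later, and the same tuple could be referenced from the docstrings.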

+boundary : {"exact", "trim", "pad"}
+If 'exact', a ValueError will be raised if dimension size is not a
+multiple of window size. If 'trim', the excess indexes are trimmed.
+If 'pad', NA will be padded.
+side : 'left' or 'right' or mapping from dimension to 'left' or 'right'
Collaborator

Add **kwargs as being passed to func.
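
To illustrate what such a docstring entry would describe: extra keyword arguments given to `Variable.coarsen` appear to be forwarded unchanged to the reduction, going by the `callable_func(reshaped, axis=axes, **kwargs)` call in this diff. A small sketch, assuming `xr.Variable` behaves as that code suggests; the data and the choice of `ddof` are made up for the example.

```python
import numpy as np
import xarray as xr

# Hypothetical 1-D variable with four values along "x".
var = xr.Variable(("x",), np.array([1.0, 3.0, 2.0, 6.0]))

# ddof=1 is not a coarsen option; it is passed through to np.std.
out = var.coarsen({"x": 2}, np.std, ddof=1)
print(out.values)
```

Each block of two values is reduced with the sample standard deviation rather than NumPy's default population one.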

"""
windows = {k: v for k, v in windows.items() if k in self.dims}

@@ -2514,12 +2536,18 @@ def coarsen(

         reshaped, axes = self.coarsen_reshape(windows, boundary, side)
         if isinstance(func, str):
-            name = func
-            func = getattr(duck_array_ops, name, None)
-            if func is None:
-                raise NameError(f"{name} is not a valid method.")
+            try:
+                callable_func = getattr(duck_array_ops, func)
+            except AttributeError:
+                raise NameError(
+                    f"{func} is not a valid xarray reduction method, so cannot be used to coarsen"
+                )
+        else:
+            callable_func = func

-        return self._replace(data=func(reshaped, axis=axes, **kwargs), attrs=_attrs)
+        return self._replace(
+            data=callable_func(reshaped, axis=axes, **kwargs), attrs=_attrs
+        )

def coarsen_reshape(self, windows, boundary, side):
"""