* draft updates
* discuss array API standard
* fix sparse examples so they run
* Deepak's suggestions
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* link to duck arrays user guide from internals page
* fix various links
* itemized list
* mention dispatching on functions not in the array API standard
* examples of duckarrays
* add intended audience to xarray internals section
* move paragraph on why its called a duck array upwards
* delete section on numpy ufuncs
* explain difference between .values and to_numpy
* strongly prefer to_numpy over values
* recommend to_numpy instead of values in the how do I? page
* clearer about using to_numpy
* merge section on missing features
* remove todense from examples
* whatsnew
* double that
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* numpy array class clarification
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* Remove sentence about xarray's internals
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* array API standard
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* proper link for sparse.COO type
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* links to docstrings of array types
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* don't put variable in parentheses
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* double backquote formatting
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* better bracketing
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* fix list formatting
* add links to glue packages, dask, and cubed
* link to todense method
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* link to numpy-like arrays page
* link to numpy ufunc docs
* add example of using .to_numpy
* show example of .values failing
* move whatsnew entry to unreleased version
* fix warning in docs build
* trigger CI
---------
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
doc/internals/duck-arrays-integration.rst
+38 -6 (38 additions & 6 deletions)
@@ -1,23 +1,55 @@
1
1
2
-
.. _internals.duck_arrays:
2
+
.. _internals.duckarrays:
3
3
4
4
Integrating with duck arrays
5
5
=============================
6
6
7
7
.. warning::
8
8
9
-
This is a experimental feature.
9
+
This is an experimental feature. Please report any bugs or other difficulties on `xarray's issue tracker <https://github.com/pydata/xarray/issues>`_.
10
10
11
-
Xarray can wrap custom :term:`duck array` objects as long as they define numpy's
12
-
``shape``, ``dtype`` and ``ndim`` properties and the ``__array__``,
13
-
``__array_ufunc__`` and ``__array_function__`` methods.
11
+
Xarray can wrap custom numpy-like arrays (":term:`duck array`\s") - see the :ref:`user guide documentation <userguide.duckarrays>`.
12
+
This page is intended for developers who are interested in wrapping a new custom array type with xarray.
13
+
14
+
Duck array requirements
15
+
~~~~~~~~~~~~~~~~~~~~~~~
16
+
17
+
Xarray does not explicitly check that required methods are defined by the underlying duck array object before
18
+
attempting to wrap the given array. However, a wrapped array type should at a minimum define these attributes:
19
+
20
+
* ``shape`` property,
21
+
* ``dtype`` property,
22
+
* ``ndim`` property,
23
+
* ``__array__`` method,
24
+
* ``__array_ufunc__`` method,
25
+
* ``__array_function__`` method.
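
Because xarray does not perform this check itself, a developer may want to verify a new array class against the list above before wrapping it. The following is a minimal sketch of such a check; ``REQUIRED_ATTRS``, ``missing_duck_array_attrs`` and the two toy classes are hypothetical names introduced for illustration and are not part of xarray. It only tests that the names exist, not that they behave consistently with numpy.

```python
# Names taken from the list of minimum requirements above.
REQUIRED_ATTRS = (
    "shape",
    "dtype",
    "ndim",
    "__array__",
    "__array_ufunc__",
    "__array_function__",
)


def missing_duck_array_attrs(obj):
    """Return the required attribute names that ``obj`` does not define."""
    return [name for name in REQUIRED_ATTRS if not hasattr(obj, name)]


class Incomplete:
    """Hypothetical array type that only defines the three properties."""

    shape = (2, 2)
    dtype = "float64"
    ndim = 2


class Complete(Incomplete):
    """Hypothetical array type with placeholder protocol methods."""

    def __array__(self, dtype=None, copy=None):
        raise NotImplementedError

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return NotImplemented

    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented


print(missing_duck_array_attrs(Incomplete()))
print(missing_duck_array_attrs(Complete()))
```

An empty list means every name is present; whether the methods do something sensible still has to be tested separately.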
26
+
27
+
These need to be defined consistently with :py:class:`numpy.ndarray`. For example, the array ``shape``
28
+
property needs to obey `numpy's broadcasting rules <https://numpy.org/doc/stable/user/basics.broadcasting.html>`_
29
+
(see also the `Python Array API standard's explanation <https://data-apis.org/array-api/latest/API_specification/broadcasting.html>`_
30
+
of these same rules).
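
To make the broadcasting requirement concrete, here is a small pure-Python sketch of how the broadcast result shape is derived from the input shapes. The function name ``broadcast_shape`` is hypothetical and chosen for this example; it is meant to illustrate the rule, not to reproduce numpy's implementation.

```python
from itertools import zip_longest


def broadcast_shape(*shapes):
    """Return the shape obtained by broadcasting ``shapes`` together."""
    result = []
    # Compare dimensions from the trailing end; missing leading dims act as 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            # Two different sizes, neither of them 1, are incompatible.
            raise ValueError(f"shapes {shapes} cannot be broadcast together")
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))


print(broadcast_shape((4, 4), (4, 1)))
print(broadcast_shape((3, 1), (1, 5)))
```

A duck array's binary operations should produce results whose ``shape`` agrees with this rule for every pair of compatible inputs.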
31
+
32
+
Python Array API standard support
33
+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
34
+
35
+
As an integration library, xarray benefits greatly from the standardization of duck-array libraries' APIs, and so is a
36
+
big supporter of the `Python Array API Standard <https://data-apis.org/array-api/latest/>`_.
37
+
38
+
We aim to support any array libraries that follow the Array API standard out-of-the-box. However, xarray does occasionally
39
+
call some numpy functions which are not (yet) part of the standard (e.g. :py:meth:`xarray.DataArray.pad` calls :py:func:`numpy.pad`).
40
+
See `xarray issue #7848 <https://github.com/pydata/xarray/issues/7848>`_ for a list of such functions. We can still support dispatching on these functions through
41
+
the array protocols above; it just means that if you exclusively implement the methods in the Python Array API standard
42
+
then some features in xarray will not work.
43
+
44
+
Custom inline reprs
45
+
~~~~~~~~~~~~~~~~~~~
14
46
15
47
In certain situations (e.g. when printing the collapsed preview of
16
48
variables of a ``Dataset``), xarray will display the repr of a :term:`duck array`
17
49
in a single line, truncating it to a certain number of characters. If that
18
50
would drop too much information, the :term:`duck array` may define a
19
51
``_repr_inline_`` method that takes ``max_width`` (number of characters) as an
doc/internals/index.rst
+6 -0 (6 additions & 0 deletions)
@@ -8,6 +8,12 @@ stack, NumPy and pandas. It is written in pure Python (no C or Cython
8
8
extensions), which makes it easy to develop and extend. Instead, we push
9
9
compiled code to :ref:`optional dependencies<installing>`.
10
10
11
+
The pages in this section are intended for:
12
+
13
+
* Contributors to xarray who wish to better understand some of the internals,
14
+
* Developers who wish to extend xarray with domain-specific logic, perhaps to support a new scientific community of users,
15
+
* Developers who wish to interface xarray with their existing tooling, e.g. by creating a plugin for reading a new file format, or wrapping a custom array type.
NumPy-like arrays (often known as :term:`duck array`\s) are drop-in replacements for the :py:class:`numpy.ndarray`
9
+
class but with different features, such as propagating physical units or a different layout in memory.
10
+
Xarray can often wrap these array types, allowing you to use labelled dimensions and indexes whilst benefiting from the
11
+
additional features of these array libraries.
12
+
13
+
Some numpy-like array types that xarray already has some support for:
14
+
15
+
* `Cupy <https://cupy.dev/>`_ - GPU support (see `cupy-xarray <https://cupy-xarray.readthedocs.io>`_),
16
+
* `Sparse <https://sparse.pydata.org/en/stable/>`_ - for performant arrays with many zero elements,
17
+
* `Pint <https://pint.readthedocs.io/en/latest/>`_ - for tracking the physical units of your data (see `pint-xarray <https://pint-xarray.readthedocs.io>`_),
18
+
* `Dask <https://docs.dask.org/en/stable/>`_ - parallel computing on larger-than-memory arrays (see :ref:`using dask with xarray <dask>`),
19
+
* `Cubed <https://github.com/tomwhite/cubed/tree/main/cubed>`_ - another parallel computing framework that emphasises reliability (see `cubed-xarray <https://github.com/cubed-xarray>`_).
20
+
6
21
.. warning::
7
22
8
-
This feature should be considered experimental. Please report any bug you may find on
9
-
xarray’s github repository.
23
+
This feature should be considered somewhat experimental. Please report any bugs you find on
For information on wrapping dask arrays see :ref:`dask`. Whilst xarray wraps dask arrays in a similar way to that
29
+
described on this page, chunked array types like :py:class:`dask.array.Array` implement additional methods that require
30
+
slightly different user code (e.g. calling ``.chunk`` or ``.compute``).
31
+
32
+
Why "duck"?
33
+
-----------
34
+
35
+
Why is it also called a "duck" array? This comes from a common statement of object-oriented programming -
36
+
"If it walks like a duck, and quacks like a duck, treat it like a duck". In other words, a library like xarray that
37
+
is capable of using multiple different types of arrays does not have to explicitly check that each one it encounters is
38
+
permitted (e.g. ``if dask``, ``if numpy``, ``if sparse`` etc.). Instead xarray can take the more permissive approach of simply
39
+
treating the wrapped array as valid, attempting to call the relevant methods (e.g. ``.mean()``) and only raising an
40
+
error if a problem occurs (e.g. the method is not found on the wrapped class). This is much more flexible, and allows
41
+
objects and classes from different libraries to work together more easily.
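
The idea can be sketched in a few lines of plain Python. The classes ``ListBackedArray`` and ``NoMean`` and the function ``mean_of`` are hypothetical names used only for this illustration; they are not xarray code.

```python
class ListBackedArray:
    """Hypothetical toy container that happens to provide ``.mean()``."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


class NoMean:
    """Hypothetical container that lacks a ``.mean()`` method."""


def mean_of(array):
    # No isinstance checks: attempt the call and let Python raise
    # AttributeError if the object does not provide the method.
    return array.mean()


print(mean_of(ListBackedArray([1, 2, 3])))

try:
    mean_of(NoMean())
except AttributeError as err:
    print(f"failed only at call time: {err}")
```

``mean_of`` never asks what type it was given, so any class with a compatible ``.mean()`` works without changes to the calling code.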
42
+
43
+
What is a numpy-like array?
44
+
---------------------------
45
+
46
+
A "numpy-like array" (also known as a "duck array") is a class that contains array-like data, and implements key
47
+
numpy-like functionality such as indexing, broadcasting, and computation methods.
48
+
49
+
For example, the `sparse <https://sparse.pydata.org/en/stable/>`_ library provides a sparse array type which is useful for representing nD array objects like sparse matrices
50
+
in a memory-efficient manner. We can create a sparse array object (of the :py:class:`sparse.COO` type) from a numpy array like this:
51
+
52
+
.. ipython:: python
53
+
54
+
from sparse import COO
55
+
56
+
x = np.eye(4, dtype=np.uint8) # create diagonal identity matrix
57
+
s = COO.from_numpy(x)
58
+
s
10
59
11
-
NumPy-like arrays (:term:`duck array`) extend the :py:class:`numpy.ndarray` with
12
-
additional features, like propagating physical units or a different layout in memory.
60
+
This sparse object does not attempt to explicitly store every element in the array, only the non-zero elements.
61
+
This approach is much more efficient for large arrays with only a few non-zero elements (such as tri-diagonal matrices).
62
+
Sparse array objects can be converted back to a "dense" numpy array by calling :py:meth:`sparse.COO.todense`.
13
63
14
-
:py:class:`DataArray` and :py:class:`Dataset` objects can wrap these duck arrays, as
15
-
long as they satisfy certain conditions (see :ref:`internals.duck_arrays`).
64
+
Just like :py:class:`numpy.ndarray` objects, :py:class:`sparse.COO` arrays support indexing
65
+
66
+
.. ipython:: python
67
+
68
+
s[1, 1] # diagonal elements should be ones
69
+
s[2, 3] # off-diagonal elements should be zero
70
+
71
+
broadcasting,
72
+
73
+
.. ipython:: python
74
+
75
+
x2 = np.zeros(
76
+
(4, 1), dtype=np.uint8
77
+
) # create second sparse array of different shape
78
+
s2 = COO.from_numpy(x2)
79
+
(s * s2) # multiplication requires broadcasting
80
+
81
+
and various computation methods
82
+
83
+
.. ipython:: python
84
+
85
+
s.sum(axis=1)
86
+
87
+
This numpy-like array also supports calling so-called `numpy ufuncs <https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs>`_
88
+
("universal functions") on it directly:
89
+
90
+
.. ipython:: python
91
+
92
+
np.sum(s, axis=1)
93
+
94
+
95
+
Notice that in each case the API for calling the operation on the sparse array is identical to that of calling it on the
96
+
equivalent numpy array - this is the sense in which the sparse array is "numpy-like".
16
97
17
98
.. note::
18
99
19
-
For ``dask`` support see :ref:`dask`.
100
+
For discussion on exactly which methods a class needs to implement to be considered "numpy-like", see :ref:`internals.duckarrays`.
101
+
102
+
Wrapping numpy-like arrays in xarray
103
+
------------------------------------
104
+
105
+
:py:class:`DataArray`, :py:class:`Dataset`, and :py:class:`Variable` objects can wrap these numpy-like arrays.
20
106
107
+
Constructing xarray objects which wrap numpy-like arrays
Most of the API does support :term:`duck array` objects, but there are a few areas where
25
-
the code will still cast to ``numpy`` arrays:
110
+
The primary way to create an xarray object which wraps a numpy-like array is to pass that numpy-like array instance directly
111
+
to the constructor of the xarray class. The :ref:`page on xarray data structures <data structures>` shows how :py:class:`DataArray` and :py:class:`Dataset`
112
+
both accept data in various forms through their ``data`` argument, but in fact this data can also be any wrappable numpy-like array.
26
113
27
-
- dimension coordinates, and thus all indexing operations:
114
+
For example, we can wrap the sparse array we created earlier inside a new DataArray object:
115
+
116
+
.. ipython:: python
117
+
118
+
s_da = xr.DataArray(s, dims=["i", "j"])
119
+
s_da
120
+
121
+
We can see what's inside - the printable representation of our xarray object (the repr) automatically uses the printable
122
+
representation of the underlying wrapped array.
123
+
124
+
Of course our sparse array object is still there underneath - it's stored under the ``.data`` attribute of the ``DataArray``:
125
+
126
+
.. ipython:: python
127
+
128
+
s_da.data
129
+
130
+
Array methods
131
+
~~~~~~~~~~~~~
132
+
133
+
We saw above that numpy-like arrays provide numpy methods. Xarray automatically uses these when you call the corresponding xarray method:
134
+
135
+
.. ipython:: python
136
+
137
+
s_da.sum(dim="j")
138
+
139
+
Converting wrapped types
140
+
~~~~~~~~~~~~~~~~~~~~~~~~
141
+
142
+
If you want to change the type inside your xarray object you can use :py:meth:`DataArray.as_numpy`:
143
+
144
+
.. ipython:: python
145
+
146
+
s_da.as_numpy()
147
+
148
+
This returns a new :py:class:`DataArray` object, but now wrapping a normal numpy array.
149
+
150
+
If instead you want to convert to numpy and return that numpy array you can use either :py:meth:`DataArray.to_numpy` or
151
+
:py:attr:`DataArray.values`, where the former is strongly preferred. The difference is in the way they coerce to numpy - :py:attr:`~DataArray.values`
152
+
always uses :py:func:`numpy.asarray` which will fail for some array types (e.g. ``cupy``), whereas :py:meth:`~DataArray.to_numpy`
153
+
uses the correct method depending on the array type.
154
+
155
+
.. ipython:: python
156
+
157
+
s_da.to_numpy()
158
+
159
+
.. ipython:: python
160
+
:okexcept:
161
+
162
+
s_da.values
163
+
164
+
This illustrates the difference between :py:attr:`~DataArray.data` and :py:attr:`~DataArray.values`,
165
+
which is sometimes a point of confusion for new xarray users.
166
+
Explicitly: :py:attr:`DataArray.data` returns the underlying numpy-like array, regardless of type, whereas
167
+
:py:attr:`DataArray.values` converts the underlying array to a numpy array before returning it.
168
+
(This is another reason to use :py:meth:`~DataArray.to_numpy` over :py:attr:`~DataArray.values` - the intention is clearer.)
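
As a side-by-side summary of what each accessor returns, the sketch below wraps a plain numpy array. That choice is an assumption made to keep the example free of optional dependencies such as ``sparse``; with a non-numpy wrapped array, ``.data`` would return that array type instead.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.eye(4, dtype=np.uint8), dims=["i", "j"])

wrapped = da.data          # the wrapped array itself, whatever its type
as_np = da.to_numpy()      # a numpy.ndarray, converted if necessary
rewrapped = da.as_numpy()  # a new DataArray that wraps a numpy.ndarray

for name, obj in [("data", wrapped), ("to_numpy()", as_np), ("as_numpy()", rewrapped)]:
    print(f"{name:12s} -> {type(obj).__name__}")
```

In short, ``.data`` and ``.to_numpy()`` hand back bare arrays, while ``.as_numpy()`` keeps the xarray wrapper.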
169
+
170
+
Conversion to numpy as a fallback
171
+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
172
+
173
+
If a wrapped array does not implement the corresponding array method then xarray will often attempt to convert the
174
+
underlying array to a numpy array so that the operation can be performed. You may want to watch out for this behavior,
175
+
and report any instances in which it causes problems.
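
One way to notice such silent conversions while developing a new array type is to instrument its ``__array__`` method. The sketch below uses only numpy; the class ``LoggingArray`` and its ``coercions`` counter are hypothetical names for illustration, and the class is far from a complete duck array.

```python
import numpy as np


class LoggingArray:
    """Hypothetical wrapper that counts how often it is coerced to numpy."""

    def __init__(self, data):
        self._data = np.asarray(data)
        self.coercions = 0

    @property
    def shape(self):
        return self._data.shape

    @property
    def dtype(self):
        return self._data.dtype

    @property
    def ndim(self):
        return self._data.ndim

    def __array__(self, dtype=None, copy=None):
        # Called by numpy whenever this object must become an ndarray.
        # The ``copy`` argument is accepted but ignored in this sketch.
        self.coercions += 1
        return self._data if dtype is None else self._data.astype(dtype)


arr = LoggingArray([[1, 2], [3, 4]])
dense = np.asarray(arr)  # explicit coercion, goes through __array__
print(f"coerced to numpy {arr.coercions} time(s)")
```

A counter that is non-zero after an operation expected to stay in the wrapped type suggests that a fallback conversion took place somewhere along the way.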
176
+
177
+
Most of xarray's API does support using :term:`duck array` objects, but there are a few areas where
178
+
the code will still convert to ``numpy`` arrays:
179
+
180
+
- Dimension coordinates, and thus all indexing operations:
28
181
29
182
* :py:meth:`Dataset.sel` and :py:meth:`DataArray.sel`
30
183
* :py:meth:`Dataset.loc` and :py:meth:`DataArray.loc`
@@ -33,7 +186,7 @@ the code will still cast to ``numpy`` arrays:
33
186
:py:meth:`DataArray.reindex` and :py:meth:`DataArray.reindex_like`: duck arrays in
34
187
data variables and non-dimension coordinates won't be cast
35
188
36
-
- functions and methods that depend on external libraries or features of ``numpy`` not
189
+
- Functions and methods that depend on external libraries or features of ``numpy`` not
37
190
covered by ``__array_function__`` / ``__array_ufunc__``:
38
191
39
192
* :py:meth:`Dataset.ffill` and :py:meth:`DataArray.ffill` (uses ``bottleneck``)
@@ -49,17 +202,25 @@ the code will still cast to ``numpy`` arrays:
49
202
:py:class:`numpy.vectorize`)
50
203
* :py:func:`apply_ufunc` with ``vectorize=True`` (uses :py:class:`numpy.vectorize`)
51
204
52
-
- incompatibilities between different :term:`duck array` libraries:
205
+
- Incompatibilities between different :term:`duck array` libraries:
53
206
54
207
* :py:meth:`Dataset.chunk` and :py:meth:`DataArray.chunk`: this fails if the data was
55
208
not already chunked and the :term:`duck array` (e.g. a ``pint`` quantity) should
56
-
wrap the new ``dask`` array; changing the chunk sizes works.
57
-
209
+
wrap the new ``dask`` array; changing the chunk sizes works, however.
58
210
59
211
Extensions using duck arrays
60
212
----------------------------
61
-
Here's a list of libraries extending ``xarray`` to make working with wrapped duck arrays
62
-
easier:
213
+
214
+
Whilst the features above allow many numpy-like array libraries to be used pretty seamlessly with xarray, it often also
215
+
makes sense to use an interfacing package to make certain tasks easier.
216
+
217
+
For example the `pint-xarray package <https://pint-xarray.readthedocs.io>`_ offers a custom ``.pint`` accessor (see :ref:`internals.accessors`) which provides
218
+
convenient access to information stored within the wrapped array (e.g. ``.units`` and ``.magnitude``), and makes
219
+
creating wrapped pint arrays (and especially xarray-wrapping-pint-wrapping-dask arrays) simpler for the user.
220
+
221
+
We maintain a list of libraries extending ``xarray`` to make working with particular wrapped duck arrays
222
+
easier. If you know of more that aren't on this list please raise an issue to add them!