From 0217fe37acdea556c431d765bb70bc3f1330888a Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 12 Jun 2023 15:01:57 -0400 Subject: [PATCH 01/63] draft updates --- doc/user-guide/duckarrays.rst | 179 +++++++++++++++++++++++++++++++--- 1 file changed, 165 insertions(+), 14 deletions(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 78c7d1e572a..5b2b0330a54 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -3,28 +3,171 @@ Working with numpy-like arrays ============================== +NumPy-like arrays (often known as :term:`duck array`s) are drop-in replacements for the :py:class:`numpy.ndarray` +class but with different features, such as propagating physical units or a different layout in memory. +Xarray can often wrap these array types, allowing you to use labelled dimensions and indexes whilst benefiting from the +additional features of these array libraries. + .. warning:: - This feature should be considered experimental. Please report any bug you may find on + This feature should be considered somewhat experimental. Please report any bugs you find on xarray’s github repository. -NumPy-like arrays (:term:`duck array`) extend the :py:class:`numpy.ndarray` with -additional features, like propagating physical units or a different layout in memory. +.. note:: + + For information on wrapping dask arrays see :ref:`dask`. Whilst xarray wraps dask arrays in a similar way to that + described on this page, chunked array types like `dask.array.Array` implement additional methods that require + slightly different user code (e.g. calling ``.chunk`` or ``.compute``). + +What is a numpy-like array? +--------------------------- + +A "numpy-like array" (also known as a "duck array") is a class that contains array-like data, and implements key +numpy-like functionality such as indexing, broadcasting, and computation methods. 
+ +For example, the ``sparse`` library provides a sparse array type which is useful for representing ``sparse matrices`` +in a memory-efficient manner. We can create a sparse array object (of the ``sparse.COO`` type) from a numpy array like this: + +.. ipython:: python + + import sparse + + x = np.eye(4, dtype=np.uint8) + s = COO.from_numpy(x) + s + +This sparse object does not attempt to explicitly store every element in the array, only the non-zero elements. +This approach is much more efficient for large arrays with only a few non-zero elements (such as tri-diagonal matrices). +It does mean that in order to clearly see what is stored in our sparse array object we have to convert it back to a +"dense" array using ``.todense``: + +.. ipython:: python + + s.todense() + +Just like `numpy.ndarray` objects, `sparse.COO` arrays support indexing + +.. ipython:: python + + s[2, 3] = 5 + s -:py:class:`DataArray` and :py:class:`Dataset` objects can wrap these duck arrays, as -long as they satisfy certain conditions (see :ref:`internals.duck_arrays`). +broadcasting, + +.. ipython:: python + + x3 = np.zeros((4, 1), dtype=np.uint8) + x3[2, 0] = 1 + s3 = COO.from_numpy(x3) + (s * s3).todense() + +and various computation methods + +.. ipython:: python + + s.sum(axis=1).todense() + +This numpy-like array also supports calling so-called numpy ufuncs (link to numpy docs) on it directly: + +.. ipython:: python + + np.sum(s, axis=1).todense() + + +Notice that in each case the API for calling the operation on the sparse array is identical to that of calling it on the +equivalent numpy array - this is the sense in which the sparse array is "numpy-like". + +Why is it also called a "duck" array, you might ask? This comes from a common statement in object-oriented programming - +"If it walks like a duck, and quacks like a duck, treat it like a duck". 
In other words, a library like xarray that +is capable of using multiple different types of arrays does not have to explicitly check that each one it encounters is +permitted (e.g. `if dask`, `if numpy`, `if sparse` etc.). Instead xarray can take the more permissive approach of simply +treating the wrapped array as valid, attempting to call the relevant methods (e.g. `.mean()`) and only raising an +error if a problem occurs (e.g. the method is not found on the wrapped class). This is much more flexible, and allows +objects and classes from different libraries to work together more easily. .. note:: - For ``dask`` support see :ref:`dask`. + For discussion on exactly which methods a class needs to implement to be considered "numpy-like", see :ref:`internals.duck_arrays`. + +Wrapping numpy-like arrays in xarray +------------------------------------ + +:py:class:`DataArray` and :py:class:`Dataset` (and :py:class:`Variable`) objects can wrap these numpy-like arrays. + +Constructing xarray objects which wrap numpy-like arrays +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The primary way to create an xarray object which wraps a numpy-like array is to pass that numpy-like array instance directly +to the constructor of the xarray class. The page on xarray data structures shows how :py:class:`DataArray` and :py:class:`Dataset` +both accept data in various forms through their ``data`` argument, but in fact this data can also be any wrappable numpy-like array. + +For example, we can wrap the sparse array we created earlier inside a new DataArray object: + +.. ipython:: python + + s_da = xr.DataArray(s2, dims=["x", "y"]) + s_da + +We can see what's inside - the printable representation of our xarray object (the repr) automatically uses the printable +representation of the underlying wrapped array. + +Of course our sparse array object is still there underneath - it's stored under the `.data` attribute of the dataarray: + +.. 
ipython:: python + + s_da.data + +Array methods +~~~~~~~~~~~~~ + +We saw above that numpy-like arrays provide numpy methods. Xarray automatically uses these when you call the corresponding xarray method: + +.. ipython:: python + + s_da.sum(dim="y") + +Numpy ufuncs +~~~~~~~~~~~~ + +Xarray objects support calling numpy functions direction on the xarray objects, e.g. ``np.func(da)``. +This also works when wrapping numpy-like arrays: + +.. ipython:: python + + np.sum(s_da, axis=1) + +Converting wrapped types +~~~~~~~~~~~~~~~~~~~~~~~~ + +If you want to change the type inside your xarray object you can use :py:meth:`DataArray.as_numpy`: + +.. ipython:: python + + s_da.as_numpy() + +This returns a new :py:class:`DataArray` object, but now wrapping a normal numpy array. + +If instead you want to convert to numpy and return that numpy array you can use either :py:meth:`DataArray.to_numpy` or +:py:meth:`DataArray.values` (what is the difference here?). + +This illustrates the difference between `.values` and `.data`, which is sometimes a point of confusion for new xarray users. +:py:meth:`DataArray.data` returns the underlying numpy-like array, regardless of type, whereas :py:meth:`DataArray.values` +converts the underlying array to a numpy array before returning it. + +Conversion to numpy as a fallback +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If a wrapped array does not implement the corresponding array method then xarray will often attempt to convert the +underlying array to a numpy array so that the operation can be performed. You may want to watch out for this behavior, +and report any instances in which it causes problems. 
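The fallback behaviour described above can be pictured with a small sketch. It is an illustration only and does not reproduce xarray's internal code: ``TinyDuckArray`` and ``reduce_with_fallback`` are hypothetical names invented for this example, and the only real API relied on is ``numpy.asarray``, which obtains a plain array through the object's ``__array__`` method.

```python
import numpy as np


class TinyDuckArray:
    """Hypothetical wrapper used only for illustration; not a real library type."""

    def __init__(self, values):
        self._values = np.asarray(values)

    @property
    def shape(self):
        return self._values.shape

    @property
    def dtype(self):
        return self._values.dtype

    @property
    def ndim(self):
        return self._values.ndim

    def __array__(self, dtype=None, copy=None):
        # np.asarray() calls this to obtain a plain numpy array.
        return np.asarray(self._values, dtype=dtype)

    def sum(self, axis=None):
        # A method this duck array does implement; the result stays wrapped.
        return TinyDuckArray(self._values.sum(axis=axis))


def reduce_with_fallback(array, name, **kwargs):
    """Call ``array.<name>`` if it exists, otherwise convert to numpy first.

    Assumption: this mirrors the idea of the fallback, not xarray's real logic.
    """
    method = getattr(array, name, None)
    if method is not None:
        return method(**kwargs)
    # The wrapped type lacks the method, so coerce to numpy and try again.
    return getattr(np.asarray(array), name)(**kwargs)


duck = TinyDuckArray([[1, 2], [3, 4]])
wrapped_result = reduce_with_fallback(duck, "sum", axis=0)   # stays a TinyDuckArray
coerced_result = reduce_with_fallback(duck, "mean", axis=0)  # becomes a numpy array
```

The second call is the case to watch out for: the result silently loses whatever extra behaviour the wrapped type provided, because the computation happened on a plain numpy array.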
Missing features ---------------- -Most of the API does support :term:`duck array` objects, but there are a few areas where -the code will still cast to ``numpy`` arrays: -- dimension coordinates, and thus all indexing operations: +Most of xarray's API does support using :term:`duck array` objects, but there are a few areas where +the code will still convert to ``numpy`` arrays: + +- Dimension coordinates, and thus all indexing operations: * :py:meth:`Dataset.sel` and :py:meth:`DataArray.sel` * :py:meth:`Dataset.loc` and :py:meth:`DataArray.loc` @@ -33,7 +176,7 @@ the code will still cast to ``numpy`` arrays: :py:meth:`DataArray.reindex` and :py:meth:`DataArray.reindex_like`: duck arrays in data variables and non-dimension coordinates won't be casted -- functions and methods that depend on external libraries or features of ``numpy`` not +- Functions and methods that depend on external libraries or features of ``numpy`` not covered by ``__array_function__`` / ``__array_ufunc__``: * :py:meth:`Dataset.ffill` and :py:meth:`DataArray.ffill` (uses ``bottleneck``) @@ -49,17 +192,25 @@ the code will still cast to ``numpy`` arrays: :py:class:`numpy.vectorize`) * :py:func:`apply_ufunc` with ``vectorize=True`` (uses :py:class:`numpy.vectorize`) -- incompatibilities between different :term:`duck array` libraries: +- Incompatibilities between different :term:`duck array` libraries: * :py:meth:`Dataset.chunk` and :py:meth:`DataArray.chunk`: this fails if the data was not already chunked and the :term:`duck array` (e.g. a ``pint`` quantity) should wrap the new ``dask`` array; changing the chunk sizes works. - Extensions using duck arrays ---------------------------- -Here's a list of libraries extending ``xarray`` to make working with wrapped duck arrays -easier: + +Whilst the features above allow many numpy-like array libraries to be used pretty seamlessly with xarray, it often also +makes sense to use an interfacing package to make certain tasks easier. 
+ +For example the ``pint-xarray`` package offers a custom `.pint` accessor (link to accessors docs) which provides +convenient access to information stored within the wrapped array (e.g. `.units` and `.magnitude`), and makes makes +creating wrapped pint arrays (and especially xarray-wrapping-pint-wrapping-dask arrays) simpler for the user. + +We maintain a list of libraries extending ``xarray`` to make working with particular wrapped duck arrays +easier. If you know of more that aren't on this list please raise an issue to add them! - `pint-xarray `_ - `cupy-xarray `_ +- `cubed-xarray `_ From 5a221bbd2b793457c1fde88d778b4553ce113187 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 12 Jun 2023 17:34:33 -0400 Subject: [PATCH 02/63] discuss array API standard --- doc/internals/duck-arrays-integration.rst | 30 +++++++++++++++++++---- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst index d403328aa2f..39fd2b33943 100644 --- a/doc/internals/duck-arrays-integration.rst +++ b/doc/internals/duck-arrays-integration.rst @@ -6,18 +6,38 @@ Integrating with duck arrays .. warning:: - This is a experimental feature. + This is a experimental feature. Please report any bugs or other difficulties on xarray's issue tracker. -Xarray can wrap custom :term:`duck array` objects as long as they define numpy's -``shape``, ``dtype`` and ``ndim`` properties and the ``__array__``, -``__array_ufunc__`` and ``__array_function__`` methods. +Xarray can wrap custom numpy-like arrays (":term:`duck array`s") - see the user guide documentation. + +Duck array requirements +~~~~~~~~~~~~~~~~~~~~~~~ + +Xarray does not explicitly check that that required methods are defined by the underlying duck array object before +attempting to wrap the given array. 
However, a wrapped array type should at a minimum support numpy's ``shape``, +``dtype`` and ``ndim`` properties, as well as the ``__array__``, ``__array_ufunc__`` and ``__array_function__`` methods. +The array ``shape`` property needs to obey numpy's broadcasting rules. + +Python Array API standard support +================================= + +As an integration library xarray benefits greatly from the standardization of duck-array libraries' APIs, and so is a +big supporter of the python Array API Standard (link). In fact the crystallization of different array libraries' APIs towards +the standard has already helped xarray remove a lot of internal adapter code. + +As such, we aim to support any array librarie that follows the standard out-of-the-box. However, xarray does occasionally +call some numpy functions which are not (yet) part of the standard (e.g. :py:class:`DataArray.pad` calls `np.pad`, +). (link to issue) + +Custom inline reprs +~~~~~~~~~~~~~~~~~~~ In certain situations (e.g. when printing the collapsed preview of variables of a ``Dataset``), xarray will display the repr of a :term:`duck array` in a single line, truncating it to a certain number of characters. If that would drop too much information, the :term:`duck array` may define a ``_repr_inline_`` method that takes ``max_width`` (number of characters) as an -argument: +argument .. code:: python From 1971da4a052eb1041ab8b8d7e2a3801806c8fd24 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 13 Jun 2023 12:49:25 -0400 Subject: [PATCH 03/63] fix sparse examples so they run --- doc/user-guide/duckarrays.rst | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 5b2b0330a54..85d96eacebd 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -30,9 +30,9 @@ in a memory-efficient manner. We can create a sparse array object (of the ``spar .. 
ipython:: python - import sparse + from sparse import COO - x = np.eye(4, dtype=np.uint8) + x = np.eye(4, dtype=np.uint8) # create diagonal identity matrix s = COO.from_numpy(x) s @@ -49,17 +49,18 @@ Just like `numpy.ndarray` objects, `sparse.COO` arrays support indexing .. ipython:: python - s[2, 3] = 5 - s + s[1, 1] # diagonal elements should be ones + s[2, 3] # off-diagonal elements should be zero broadcasting, .. ipython:: python - x3 = np.zeros((4, 1), dtype=np.uint8) - x3[2, 0] = 1 - s3 = COO.from_numpy(x3) - (s * s3).todense() + x2 = np.zeros( + (4, 1), dtype=np.uint8 + ) # create second sparse array of different shape + s2 = COO.from_numpy(x2) + (s * s2).todense() # multiplication requires broadcasting and various computation methods @@ -105,7 +106,7 @@ For example, we can wrap the sparse array we created earlier inside a new DataAr .. ipython:: python - s_da = xr.DataArray(s2, dims=["x", "y"]) + s_da = xr.DataArray(s, dims=["i", "j"]) s_da We can see what's inside - the printable representation of our xarray object (the repr) automatically uses the printable @@ -124,7 +125,7 @@ We saw above that numpy-like arrays provide numpy methods. Xarray automatically .. ipython:: python - s_da.sum(dim="y") + s_da.sum(dim="j") Numpy ufuncs ~~~~~~~~~~~~ From fa58fff9d13010e4cb829543eefdd7411c637035 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 14 Jun 2023 12:10:59 -0400 Subject: [PATCH 04/63] Deepak's suggestions Co-authored-by: Deepak Cherian --- doc/internals/duck-arrays-integration.rst | 4 ++-- doc/user-guide/duckarrays.rst | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst index 39fd2b33943..aa2193ba2c7 100644 --- a/doc/internals/duck-arrays-integration.rst +++ b/doc/internals/duck-arrays-integration.rst @@ -6,7 +6,7 @@ Integrating with duck arrays .. warning:: - This is a experimental feature. 
Please report any bugs or other difficulties on xarray's issue tracker. + This is an experimental feature. Please report any bugs or other difficulties on `xarray's issue tracker `_. Xarray can wrap custom numpy-like arrays (":term:`duck array`s") - see the user guide documentation. @@ -25,7 +25,7 @@ As an integration library xarray benefits greatly from the standardization of du big supporter of the python Array API Standard (link). In fact the crystallization of different array libraries' APIs towards the standard has already helped xarray remove a lot of internal adapter code. -As such, we aim to support any array librarie that follows the standard out-of-the-box. However, xarray does occasionally +We aim to support any array libraries that follows the standard out-of-the-box. However, xarray does occasionally call some numpy functions which are not (yet) part of the standard (e.g. :py:class:`DataArray.pad` calls `np.pad`, ). (link to issue) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 85d96eacebd..ea03bbae23a 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -16,7 +16,7 @@ additional features of these array libraries. .. note:: For information on wrapping dask arrays see :ref:`dask`. Whilst xarray wraps dask arrays in a similar way to that - described on this page, chunked array types like `dask.array.Array` implement additional methods that require + described on this page, chunked array types like :py:class:`dask.array.Array` implement additional methods that require slightly different user code (e.g. calling ``.chunk`` or ``.compute``). What is a numpy-like array? @@ -25,7 +25,7 @@ What is a numpy-like array? A "numpy-like array" (also known as a "duck array") is a class that contains array-like data, and implements key numpy-like functionality such as indexing, broadcasting, and computation methods. 
-For example, the ``sparse`` library provides a sparse array type which is useful for representing ``sparse matrices`` +For example, the `sparse `_ library provides a sparse array type which is useful for representing nD array objects like sparse matrices in a memory-efficient manner. We can create a sparse array object (of the ``sparse.COO`` type) from a numpy array like this: .. ipython:: python @@ -205,7 +205,7 @@ Extensions using duck arrays Whilst the features above allow many numpy-like array libraries to be used pretty seamlessly with xarray, it often also makes sense to use an interfacing package to make certain tasks easier. -For example the ``pint-xarray`` package offers a custom `.pint` accessor (link to accessors docs) which provides +For example the ``pint-xarray`` package offers a custom ``.pint`` accessor (link to accessors docs) which provides convenient access to information stored within the wrapped array (e.g. `.units` and `.magnitude`), and makes makes creating wrapped pint arrays (and especially xarray-wrapping-pint-wrapping-dask arrays) simpler for the user. From 258dd54ecacf091f6a90b1831e7f52f8f48579b9 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 14 Jun 2023 15:46:32 -0400 Subject: [PATCH 05/63] link to duck arrays user guide from internals page --- doc/internals/duck-arrays-integration.rst | 9 +++++---- doc/user-guide/duckarrays.rst | 2 ++ 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst index aa2193ba2c7..cd3d24281ef 100644 --- a/doc/internals/duck-arrays-integration.rst +++ b/doc/internals/duck-arrays-integration.rst @@ -1,5 +1,5 @@ -.. _internals.duck_arrays: +.. _internals.duckarrays: Integrating with duck arrays ============================= @@ -8,7 +8,8 @@ Integrating with duck arrays This is an experimental feature. Please report any bugs or other difficulties on `xarray's issue tracker `_. 
-Xarray can wrap custom numpy-like arrays (":term:`duck array`s") - see the user guide documentation. +Xarray can wrap custom numpy-like arrays (":term:`duck array`\s") - see the :ref:`user guide documentation `. +This page is intended for developers who are interested in wrapping a custom array type with xarray. Duck array requirements ~~~~~~~~~~~~~~~~~~~~~~~ @@ -19,13 +20,13 @@ attempting to wrap the given array. However, a wrapped array type should at a mi The array ``shape`` property needs to obey numpy's broadcasting rules. Python Array API standard support -================================= +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ As an integration library xarray benefits greatly from the standardization of duck-array libraries' APIs, and so is a big supporter of the python Array API Standard (link). In fact the crystallization of different array libraries' APIs towards the standard has already helped xarray remove a lot of internal adapter code. -We aim to support any array libraries that follows the standard out-of-the-box. However, xarray does occasionally +We aim to support any array libraries that follow the standard out-of-the-box. However, xarray does occasionally call some numpy functions which are not (yet) part of the standard (e.g. :py:class:`DataArray.pad` calls `np.pad`, ). (link to issue) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index ea03bbae23a..fdc625d18f1 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -1,5 +1,7 @@ .. currentmodule:: xarray +.. 
_userguide.duckarrays: + Working with numpy-like arrays ============================== From b26e7ac5af040a7ad4e27d2a8061c3f4874779b2 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 15 Jun 2023 09:18:09 -0400 Subject: [PATCH 06/63] fix various links --- doc/internals/duck-arrays-integration.rst | 4 ++-- doc/internals/extending-xarray.rst | 2 ++ doc/user-guide/duckarrays.rst | 10 +++++----- 3 files changed, 9 insertions(+), 7 deletions(-) diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst index cd3d24281ef..5dd99ed2ef7 100644 --- a/doc/internals/duck-arrays-integration.rst +++ b/doc/internals/duck-arrays-integration.rst @@ -9,7 +9,7 @@ Integrating with duck arrays This is an experimental feature. Please report any bugs or other difficulties on `xarray's issue tracker `_. Xarray can wrap custom numpy-like arrays (":term:`duck array`\s") - see the :ref:`user guide documentation `. -This page is intended for developers who are interested in wrapping a custom array type with xarray. +This page is intended for developers who are interested in wrapping a new custom array type with xarray. Duck array requirements ~~~~~~~~~~~~~~~~~~~~~~~ @@ -23,7 +23,7 @@ Python Array API standard support ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ As an integration library xarray benefits greatly from the standardization of duck-array libraries' APIs, and so is a -big supporter of the python Array API Standard (link). In fact the crystallization of different array libraries' APIs towards +big supporter of the `Python Array API Standard `_. In fact the crystallization of different array libraries' APIs towards the standard has already helped xarray remove a lot of internal adapter code. We aim to support any array libraries that follow the standard out-of-the-box. 
However, xarray does occasionally diff --git a/doc/internals/extending-xarray.rst b/doc/internals/extending-xarray.rst index 56aeb8fa462..a180b85044f 100644 --- a/doc/internals/extending-xarray.rst +++ b/doc/internals/extending-xarray.rst @@ -1,4 +1,6 @@ +.. _internals.accessors: + Extending xarray using accessors ================================ diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index fdc625d18f1..5dce80d3072 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -5,7 +5,7 @@ Working with numpy-like arrays ============================== -NumPy-like arrays (often known as :term:`duck array`s) are drop-in replacements for the :py:class:`numpy.ndarray` +NumPy-like arrays (often known as :term:`duck array`\s) are drop-in replacements for the :py:class:`numpy.ndarray` class but with different features, such as propagating physical units or a different layout in memory. Xarray can often wrap these array types, allowing you to use labelled dimensions and indexes whilst benefiting from the additional features of these array libraries. @@ -90,7 +90,7 @@ objects and classes from different libraries to work together more easily. .. note:: - For discussion on exactly which methods a class needs to implement to be considered "numpy-like", see :ref:`internals.duck_arrays`. + For discussion on exactly which methods a class needs to implement to be considered "numpy-like", see :ref:`internals.duckarrays`. Wrapping numpy-like arrays in xarray ------------------------------------ @@ -101,7 +101,7 @@ Constructing xarray objects which wrap numpy-like arrays ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The primary way to create an xarray object which wraps a numpy-like array is to pass that numpy-like array instance directly -to the constructor of the xarray class. The page on xarray data structures shows how :py:class:`DataArray` and :py:class:`Dataset` +to the constructor of the xarray class. 
The :ref:`page on xarray data structures ` shows how :py:class:`DataArray` and :py:class:`Dataset` both accept data in various forms through their ``data`` argument, but in fact this data can also be any wrappable numpy-like array. For example, we can wrap the sparse array we created earlier inside a new DataArray object: @@ -207,8 +207,8 @@ Extensions using duck arrays Whilst the features above allow many numpy-like array libraries to be used pretty seamlessly with xarray, it often also makes sense to use an interfacing package to make certain tasks easier. -For example the ``pint-xarray`` package offers a custom ``.pint`` accessor (link to accessors docs) which provides -convenient access to information stored within the wrapped array (e.g. `.units` and `.magnitude`), and makes makes +For example the `pint-xarray package `_ offers a custom ``.pint`` accessor (see :ref:`internals.accessors`) which provides +convenient access to information stored within the wrapped array (e.g. ``.units`` and ``.magnitude``), and makes makes creating wrapped pint arrays (and especially xarray-wrapping-pint-wrapping-dask arrays) simpler for the user. We maintain a list of libraries extending ``xarray`` to make working with particular wrapped duck arrays From ad818115e847ac81bf840563deefdef26e580b5a Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 15 Jun 2023 09:34:52 -0400 Subject: [PATCH 07/63] itemized list --- doc/internals/duck-arrays-integration.rst | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst index 5dd99ed2ef7..6ee7d3c22b4 100644 --- a/doc/internals/duck-arrays-integration.rst +++ b/doc/internals/duck-arrays-integration.rst @@ -15,9 +15,19 @@ Duck array requirements ~~~~~~~~~~~~~~~~~~~~~~~ Xarray does not explicitly check that that required methods are defined by the underlying duck array object before -attempting to wrap the given array. 
However, a wrapped array type should at a minimum support numpy's ``shape``, -``dtype`` and ``ndim`` properties, as well as the ``__array__``, ``__array_ufunc__`` and ``__array_function__`` methods. -The array ``shape`` property needs to obey numpy's broadcasting rules. +attempting to wrap the given array. However, a wrapped array type should at a minimum define these attributes: + +* ``shape`` property, +* ``dtype`` property, +* ``ndim`` property, +* ``__array__`` method, +* ``__array_ufunc__`` method, +* ``__array_function__`` method. + +These need to be defined consistently with numpy :py:class:`numpy.ndarray`, for example the array ``shape`` +property needs to obey `numpy's broadcasting rules `_ +(see also the `Python Array API standard's explanation `_ +of these same rules). Python Array API standard support ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 99394a3417f302459f9d2151bdf6660eb2db2c81 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 15 Jun 2023 09:44:49 -0400 Subject: [PATCH 08/63] mention dispatching on functions not in the array API standard --- doc/internals/duck-arrays-integration.rst | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst index 6ee7d3c22b4..f7c4e9a2c04 100644 --- a/doc/internals/duck-arrays-integration.rst +++ b/doc/internals/duck-arrays-integration.rst @@ -37,8 +37,10 @@ big supporter of the `Python Array API Standard `_ for a list of such functions. We can still support dispatching on these functions through +the array protocols above, it just means that if you exclusively implement the methods in the Python Array API standard +then some features in xarray will not work. 
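Since xarray does not perform this check itself, a developer wrapping a new array type may find it useful to test for the minimum attributes listed earlier on this page. The helper below is a hypothetical sketch (``missing_duck_array_attributes`` is not part of xarray or numpy); it only tests that the names exist, not that they behave consistently with :py:class:`numpy.ndarray`.

```python
import numpy as np

# The minimum attribute names listed on this page.
REQUIRED_ATTRIBUTES = (
    "shape",
    "dtype",
    "ndim",
    "__array__",
    "__array_ufunc__",
    "__array_function__",
)


def missing_duck_array_attributes(obj):
    """Return the required attribute names that ``obj`` does not define.

    Hypothetical helper for illustration; xarray performs no such check.
    """
    return [name for name in REQUIRED_ATTRIBUTES if not hasattr(obj, name)]


# A numpy array defines everything; a plain Python list defines none of it.
missing_for_ndarray = missing_duck_array_attributes(np.arange(3))
missing_for_list = missing_duck_array_attributes([1, 2, 3])
```

An empty result is necessary but not sufficient: an array type can define every name and still break numpy's broadcasting rules, which only shows up when operations are actually run.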
Custom inline reprs ~~~~~~~~~~~~~~~~~~~ From c93f14368f1c4cc5b4cd20f107476fb94ec908f4 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 21 Jun 2023 13:13:56 -0400 Subject: [PATCH 09/63] examples of duckarrays --- doc/user-guide/duckarrays.rst | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 5dce80d3072..a6143e4b675 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -10,10 +10,16 @@ class but with different features, such as propagating physical units or a diffe Xarray can often wrap these array types, allowing you to use labelled dimensions and indexes whilst benefiting from the additional features of these array libraries. +Some numpy-like array types that xarray already has some support for: + +* `Cupy `_ - GPU support, +* `Sparse `_ - for performant arrays with many zero elements, +* `Pint `_ - for tracking the physical units of your data. + .. warning:: This feature should be considered somewhat experimental. Please report any bugs you find on - xarray’s github repository. + `xarray’s issue tracker `_. .. note:: From b6279fdb6091d3b9b220435644e3bf6e853aeb69 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 21 Jun 2023 13:15:09 -0400 Subject: [PATCH 10/63] add intended audience to xarray internals section --- doc/internals/index.rst | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/doc/internals/index.rst b/doc/internals/index.rst index e4ca9779dd7..25671877297 100644 --- a/doc/internals/index.rst +++ b/doc/internals/index.rst @@ -8,6 +8,11 @@ stack, NumPy and pandas. It is written in pure Python (no C or Cython extensions), which makes it easy to develop and extend. Instead, we push compiled code to :ref:`optional dependencies`. 
+The pages in this section are intended for: +- Contributors to xarray who wish to better understand some of the internals, +- Developers who wish to extend xarray with domain-specific logic, perhaps to support a new scientific community of users, +- Developers who wish to interface xarray with their existing tooling, e.g. by creating a plugin for reading a new file format, or wrapping a custom array type. + .. toctree:: :maxdepth: 2 From e0bd0497095e794eac554852d562e2b6e4f91278 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 21 Jun 2023 13:17:14 -0400 Subject: [PATCH 11/63] draft page on chunked arrays --- doc/internals/chunked-arrays.rst | 56 ++++++++++++++++++++++++++++++++ doc/internals/index.rst | 1 + 2 files changed, 57 insertions(+) create mode 100644 doc/internals/chunked-arrays.rst diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst new file mode 100644 index 00000000000..9005a0135ea --- /dev/null +++ b/doc/internals/chunked-arrays.rst @@ -0,0 +1,56 @@ + +.. _internals.chunkedarrays: + +Alternative chunked array types +=============================== + +.. warning:: + + This is a *highly* experimental feature. Please report any bugs or other difficulties on `xarray's issue tracker `_. + In particular see discussion on `xarray issue #6807 `_ + +Xarray can wrap chunked dask arrays (see :ref:`dask`), but can also wrap any other chunked array type that exposes the correct interface. +This allows us to support using other frameworks for distributed and out-of-core processing, with user code still written as xarray commands. + +The basic idea is that by wrapping an array that has an explicit notion of ``chunks``, xarray can expose control over +the choice of chunking scheme to users via methods like :py:meth:`DataArray.chunk` whilst the wrapped array actually +implements the handling of processing all of the chunks. 
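As a rough picture of what "handling the processing of all of the chunks" means, the sketch below splits a one-dimensional numpy array into blocks, reduces each block separately, and combines the partial results. The function names are hypothetical and nothing here is xarray or dask API; a real chunked array library would also schedule the per-chunk work lazily or in parallel.

```python
import numpy as np


def split_into_chunks(array, chunk_size):
    """Split a 1D array into consecutive blocks of at most ``chunk_size`` items."""
    return [
        array[start : start + chunk_size]
        for start in range(0, len(array), chunk_size)
    ]


def chunked_sum(array, chunk_size):
    """Sum an array by reducing each chunk separately, then combining."""
    partial_sums = [chunk.sum() for chunk in split_into_chunks(array, chunk_size)]
    return sum(partial_sums)


data = np.arange(10)
chunk_lengths = [len(chunk) for chunk in split_into_chunks(data, 4)]
total = chunked_sum(data, 4)
```

The choice of ``chunk_size`` does not change the answer, only how the work is divided, which is why the chunking scheme can be left to the user while the wrapped array does the processing.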
+ + +Chunked array requirements +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A chunked array needs to meet all the :ref:`requirements for normal duck arrays `, but should also define: + +- ``.chunk`` +- ``.rechunk`` +- ``.compute`` + + +Chunked operations as function primitives +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The full list is defined in the :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` class (link to that API documentation) + + +ChunkManagerEntrypoint subclassing +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. autosummary:: + :toctree: generated/ + + xarray.core.parallelcompat.list_chunkmanagers + xarray.core.parallelcompat.ChunkManagerEntrypoint + + +User interface +~~~~~~~~~~~~~~ + +``chunked_array_type`` kwarg +``from_array_kwargs`` dict + + +Parallel processing without chunks +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Don't necessarily need all this diff --git a/doc/internals/index.rst b/doc/internals/index.rst index 25671877297..49c19ccbd47 100644 --- a/doc/internals/index.rst +++ b/doc/internals/index.rst @@ -20,6 +20,7 @@ The pages in this section are intended for: variable-objects duck-arrays-integration + chunked-arrays extending-xarray zarr-encoding-spec how-to-add-new-backend From 0eea00bf5e52a8915781f673ef5b1a469362070f Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 27 Jun 2023 14:35:21 -0400 Subject: [PATCH 12/63] move paragraph on why its called a duck array upwards --- doc/user-guide/duckarrays.rst | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index a6143e4b675..c3d354cd98d 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -27,6 +27,17 @@ Some numpy-like array types that xarray already has some support for: described on this page, chunked array types like :py:class:`dask.array.Array` implement additional methods that require slightly different user code (e.g. calling ``.chunk`` or ``.compute``). 
+Why "duck"? +----------- + +Why is it also called a "duck" array? This comes from a common statement of object-oriented programming - +"If it walks like a duck, and quacks like a duck, treat it like a duck". In other words, a library like xarray that +is capable of using multiple different types of arrays does not have to explicitly check that each one it encounters is +permitted (e.g. ``if dask``, ``if numpy``, ``if sparse`` etc.). Instead xarray can take the more permissive approach of simply +treating the wrapped array as valid, attempting to call the relevant methods (e.g. ``.mean()``) and only raising an +error if a problem occurs (e.g. the method is not found on the wrapped class). This is much more flexible, and allows +objects and classes from different libraries to work together more easily. + What is a numpy-like array? --------------------------- @@ -86,14 +97,6 @@ This numpy-like array also supports calling so-called numpy ufuncs (link to nump Notice that in each case the API for calling the operation on the sparse array is identical to that of calling it on the equivalent numpy array - this is the sense in which the sparse array is "numpy-like". -Why is it also called a "duck" array, you might ask? This comes from a common statement in object-oriented programming - -"If it walks like a duck, and quacks like a duck, treat it like a duck". In other words, a library like xarray that -is capable of using multiple different types of arrays does not have to explicitly check that each one it encounters is -permitted (e.g. `if dask`, `if numpy`, `if sparse` etc.). Instead xarray can take the more permissive approach of simply -treating the wrapped array as valid, attempting to call the relevant methods (e.g. `.mean()`) and only raising an -error if a problem occurs (e.g. the method is not found on the wrapped class). This is much more flexible, and allows -objects and classes from different libraries to work together more easily. - .. 
note:: For discussion on exactly which methods a class needs to implement to be considered "numpy-like", see :ref:`internals.duckarrays`. From cc4fac03adfa387cf61b4fbcf48936dfab502fa0 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 27 Jun 2023 14:36:03 -0400 Subject: [PATCH 13/63] delete section on numpy ufuncs --- doc/user-guide/duckarrays.rst | 10 ---------- 1 file changed, 10 deletions(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index c3d354cd98d..ba15eda163c 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -138,16 +138,6 @@ We saw above that numpy-like arrays provide numpy methods. Xarray automatically s_da.sum(dim="j") -Numpy ufuncs -~~~~~~~~~~~~ - -Xarray objects support calling numpy functions direction on the xarray objects, e.g. ``np.func(da)``. -This also works when wrapping numpy-like arrays: - -.. ipython:: python - - np.sum(s_da, axis=1) - Converting wrapped types ~~~~~~~~~~~~~~~~~~~~~~~~ From 5e8015f6373a77e0cefea8c90e2592436fd8aaf1 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 27 Jun 2023 14:39:48 -0400 Subject: [PATCH 14/63] explain difference between .values and to_numpy --- doc/user-guide/duckarrays.rst | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index ba15eda163c..0b7a5baa46c 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -150,11 +150,13 @@ If you want to change the type inside your xarray object you can use :py:meth:`D This returns a new :py:class:`DataArray` object, but now wrapping a normal numpy array. If instead you want to convert to numpy and return that numpy array you can use either :py:meth:`DataArray.to_numpy` or -:py:meth:`DataArray.values` (what is the difference here?). +:py:meth:`DataArray.values`, where the former is preferred. 
(The difference is in the way they coerce to numpy - `.values` +always uses `np.asarray` which will fail for some array types (e.g. ``cupy``, whereas `to_numpy` uses the correct method +depending on the array type.) This illustrates the difference between `.values` and `.data`, which is sometimes a point of confusion for new xarray users. -:py:meth:`DataArray.data` returns the underlying numpy-like array, regardless of type, whereas :py:meth:`DataArray.values` -converts the underlying array to a numpy array before returning it. +Explicitly: :py:meth:`DataArray.data` returns the underlying numpy-like array, regardless of type, whereas +:py:meth:`DataArray.values` converts the underlying array to a numpy array before returning it. Conversion to numpy as a fallback ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 70bfda5d423af093a786dd6d0c7ccd311ced0d81 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 27 Jun 2023 14:40:48 -0400 Subject: [PATCH 15/63] strongly prefer to_numpy over values --- doc/user-guide/duckarrays.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 0b7a5baa46c..cd24a348c57 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -150,13 +150,13 @@ If you want to change the type inside your xarray object you can use :py:meth:`D This returns a new :py:class:`DataArray` object, but now wrapping a normal numpy array. If instead you want to convert to numpy and return that numpy array you can use either :py:meth:`DataArray.to_numpy` or -:py:meth:`DataArray.values`, where the former is preferred. (The difference is in the way they coerce to numpy - `.values` +:py:meth:`DataArray.values`, where the former is strongly preferred. (The difference is in the way they coerce to numpy - `.values` always uses `np.asarray` which will fail for some array types (e.g. ``cupy``, whereas `to_numpy` uses the correct method depending on the array type.) 
This illustrates the difference between `.values` and `.data`, which is sometimes a point of confusion for new xarray users. Explicitly: :py:meth:`DataArray.data` returns the underlying numpy-like array, regardless of type, whereas -:py:meth:`DataArray.values` converts the underlying array to a numpy array before returning it. +:py:meth:`DataArray.to_numpy` converts the underlying array to a numpy array before returning it. Conversion to numpy as a fallback ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 5fdb7e3263d0180d87d59b26278dd04f5570a248 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 27 Jun 2023 14:42:17 -0400 Subject: [PATCH 16/63] recommend to_numpy instead of values in the how do I? page --- doc/howdoi.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/howdoi.rst b/doc/howdoi.rst index b6374cc5100..8cc4e9939f2 100644 --- a/doc/howdoi.rst +++ b/doc/howdoi.rst @@ -42,7 +42,7 @@ How do I ... * - extract the underlying array (e.g. NumPy or Dask arrays) - :py:attr:`DataArray.data` * - convert to and extract the underlying NumPy array - - :py:attr:`DataArray.values` + - :py:attr:`DataArray.to_numpy` * - convert to a pandas DataFrame - :py:attr:`Dataset.to_dataframe` * - sort values From 68315f84d372d4c1c5a716db786ced454d4f560c Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 27 Jun 2023 14:45:16 -0400 Subject: [PATCH 17/63] clearer about using to_numpy --- doc/user-guide/duckarrays.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index cd24a348c57..78be8907cb8 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -154,9 +154,10 @@ If instead you want to convert to numpy and return that numpy array you can use always uses `np.asarray` which will fail for some array types (e.g. ``cupy``, whereas `to_numpy` uses the correct method depending on the array type.) 
-This illustrates the difference between `.values` and `.data`, which is sometimes a point of confusion for new xarray users. +This illustrates the difference between ``.data`` and ``.values``, which is sometimes a point of confusion for new xarray users. Explicitly: :py:meth:`DataArray.data` returns the underlying numpy-like array, regardless of type, whereas -:py:meth:`DataArray.to_numpy` converts the underlying array to a numpy array before returning it. +:py:meth:`DataArray.values` converts the underlying array to a numpy array before returning it. +(This is another reason to use ``.to_numpy`` over ``.values`` - the intention is clearer.) Conversion to numpy as a fallback ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 2931b86d33b83b3808964e45fedc47d53c98ab58 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 27 Jun 2023 14:47:48 -0400 Subject: [PATCH 18/63] merge section on missing features --- doc/user-guide/duckarrays.rst | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 78be8907cb8..33d52fe66ee 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -166,9 +166,6 @@ If a wrapped array does not implement the corresponding array method then xarray underlying array to a numpy array so that the operation can be performed. You may want to watch out for this behavior, and report any instances in which it causes problems. -Missing features ----------------- - Most of xarray's API does support using :term:`duck array` objects, but there are a few areas where the code will still convert to ``numpy`` arrays: @@ -201,7 +198,7 @@ the code will still convert to ``numpy`` arrays: * :py:meth:`Dataset.chunk` and :py:meth:`DataArray.chunk`: this fails if the data was not already chunked and the :term:`duck array` (e.g. a ``pint`` quantity) should - wrap the new ``dask`` array; changing the chunk sizes works. 
+ wrap the new ``dask`` array; changing the chunk sizes works however. Extensions using duck arrays ---------------------------- From 9f21b00598216269d7494607edb053ba24016e27 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 27 Jun 2023 14:51:18 -0400 Subject: [PATCH 19/63] remove todense from examples --- doc/user-guide/duckarrays.rst | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 33d52fe66ee..03fa82cdaf3 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -57,12 +57,7 @@ in a memory-efficient manner. We can create a sparse array object (of the ``spar This sparse object does not attempt to explicitly store every element in the array, only the non-zero elements. This approach is much more efficient for large arrays with only a few non-zero elements (such as tri-diagonal matrices). -It does mean that in order to clearly see what is stored in our sparse array object we have to convert it back to a -"dense" array using ``.todense``: - -.. ipython:: python - - s.todense() +Sparse array objects can be converted back to a "dense" numpy array by calling ``.todense``. Just like `numpy.ndarray` objects, `sparse.COO` arrays support indexing @@ -79,19 +74,19 @@ broadcasting, (4, 1), dtype=np.uint8 ) # create second sparse array of different shape s2 = COO.from_numpy(x2) - (s * s2).todense() # multiplication requires broadcasting + (s * s2) # multiplication requires broadcasting and various computation methods .. ipython:: python - s.sum(axis=1).todense() + s.sum(axis=1) This numpy-like array also supports calling so-called numpy ufuncs (link to numpy docs) on it directly: .. 
ipython:: python - np.sum(s, axis=1).todense() + np.sum(s, axis=1) Notice that in each case the API for calling the operation on the sparse array is identical to that of calling it on the From 2bb65d584bc7fbd709056089bc1428de1253e8d7 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 27 Jun 2023 14:53:04 -0400 Subject: [PATCH 20/63] whatsnew --- doc/whats-new.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/doc/whats-new.rst b/doc/whats-new.rst index 5c0d3c3c843..21f381e1901 100644 --- a/doc/whats-new.rst +++ b/doc/whats-new.rst @@ -50,6 +50,8 @@ Bug fixes Documentation ~~~~~~~~~~~~~ +- Expanded the page on wrapping numpy-like "duck" arrays. + (:pull:`7911`) By `Tom Nicholas `_. Internal Changes ~~~~~~~~~~~~~~~~ From 0b405a14cd4bd09a3fc7f7dc6bcd5a1a39fdd84e Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 28 Jun 2023 17:13:53 -0400 Subject: [PATCH 21/63] double that Co-authored-by: Deepak Cherian --- doc/internals/duck-arrays-integration.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst index f7c4e9a2c04..95f31bc3922 100644 --- a/doc/internals/duck-arrays-integration.rst +++ b/doc/internals/duck-arrays-integration.rst @@ -14,7 +14,7 @@ This page is intended for developers who are interested in wrapping a new custom Duck array requirements ~~~~~~~~~~~~~~~~~~~~~~~ -Xarray does not explicitly check that that required methods are defined by the underlying duck array object before +Xarray does not explicitly check that required methods are defined by the underlying duck array object before attempting to wrap the given array. 
However, a wrapped array type should at a minimum define these attributes: * ``shape`` property, From ed6195c4a87307e8a4ebd14fa316bc62a2255cc8 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 28 Jun 2023 17:14:17 -0400 Subject: [PATCH 22/63] numpy array class clarification Co-authored-by: Deepak Cherian --- doc/internals/duck-arrays-integration.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst index 95f31bc3922..2b7ca0644b7 100644 --- a/doc/internals/duck-arrays-integration.rst +++ b/doc/internals/duck-arrays-integration.rst @@ -24,7 +24,7 @@ attempting to wrap the given array. However, a wrapped array type should at a mi * ``__array_ufunc__`` method, * ``__array_function__`` method. -These need to be defined consistently with numpy :py:class:`numpy.ndarray`, for example the array ``shape`` +These need to be defined consistently with :py:class:`numpy.ndarray`, for example the array ``shape`` property needs to obey `numpy's broadcasting rules `_ (see also the `Python Array API standard's explanation `_ of these same rules). From 40eb53b95adb67710e37a060f4e562a19bd3e8ac Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 28 Jun 2023 17:14:45 -0400 Subject: [PATCH 23/63] Remove sentence about xarray's internals Co-authored-by: Deepak Cherian --- doc/internals/duck-arrays-integration.rst | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst index 2b7ca0644b7..9a34090a870 100644 --- a/doc/internals/duck-arrays-integration.rst +++ b/doc/internals/duck-arrays-integration.rst @@ -33,8 +33,7 @@ Python Array API standard support ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ As an integration library xarray benefits greatly from the standardization of duck-array libraries' APIs, and so is a -big supporter of the `Python Array API Standard `_. 
In fact the crystallization of different array libraries' APIs towards -the standard has already helped xarray remove a lot of internal adapter code. +big supporter of the `Python Array API Standard `_. We aim to support any array libraries that follow the standard out-of-the-box. However, xarray does occasionally call some numpy functions which are not (yet) part of the standard (e.g. :py:meth:`xarray.DataArray.pad` calls :py:func:`numpy.pad`). From a567aa4bb8bb04192c5ee6f0d73f400d11c3751f Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 28 Jun 2023 17:14:57 -0400 Subject: [PATCH 24/63] array API standard Co-authored-by: Deepak Cherian --- doc/internals/duck-arrays-integration.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst index 9a34090a870..3b6313dbf2f 100644 --- a/doc/internals/duck-arrays-integration.rst +++ b/doc/internals/duck-arrays-integration.rst @@ -35,7 +35,7 @@ Python Array API standard support As an integration library xarray benefits greatly from the standardization of duck-array libraries' APIs, and so is a big supporter of the `Python Array API Standard `_. -We aim to support any array libraries that follow the standard out-of-the-box. However, xarray does occasionally +We aim to support any array libraries that follow the Array API standard out-of-the-box. However, xarray does occasionally call some numpy functions which are not (yet) part of the standard (e.g. :py:meth:`xarray.DataArray.pad` calls :py:func:`numpy.pad`). See `xarray issue #7848 `_ for a list of such functions.
We can still support dispatching on these functions through the array protocols above, it just means that if you exclusively implement the methods in the Python Array API standard From 76237a9f52f9c71e977e1f4ffdcd13e097bc19be Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 28 Jun 2023 17:15:43 -0400 Subject: [PATCH 25/63] proper link for sparse.COO type Co-authored-by: Deepak Cherian --- doc/user-guide/duckarrays.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 03fa82cdaf3..73c2cdc57d9 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -45,7 +45,7 @@ A "numpy-like array" (also known as a "duck array") is a class that contains arr numpy-like functionality such as indexing, broadcasting, and computation methods. For example, the `sparse `_ library provides a sparse array type which is useful for representing nD array objects like sparse matrices -in a memory-efficient manner. We can create a sparse array object (of the ``sparse.COO`` type) from a numpy array like this: +in a memory-efficient manner. We can create a sparse array object (of the :py:class:`sparse.COO` type) from a numpy array like this: .. ipython:: python From 1923d4b20e97cb0f173e821cf0edc459f0602916 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 28 Jun 2023 17:16:19 -0400 Subject: [PATCH 26/63] links to docstrings of array types Co-authored-by: Deepak Cherian --- doc/user-guide/duckarrays.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 73c2cdc57d9..734633f36c1 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -59,7 +59,7 @@ This sparse object does not attempt to explicitly store every element in the arr This approach is much more efficient for large arrays with only a few non-zero elements (such as tri-diagonal matrices). 
Sparse array objects can be converted back to a "dense" numpy array by calling ``.todense``. -Just like `numpy.ndarray` objects, `sparse.COO` arrays support indexing +Just like :py:class:`numpy.ndarray` objects, :py:class:`sparse.COO` arrays support indexing .. ipython:: python From b26cbd83ceb3748c06c392933f03e23bf59eb6b8 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 28 Jun 2023 17:16:41 -0400 Subject: [PATCH 27/63] don't put variable in parentheses Co-authored-by: Deepak Cherian --- doc/user-guide/duckarrays.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 734633f36c1..288c4e897ef 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -99,7 +99,7 @@ equivalent numpy array - this is the sense in which the sparse array is "numpy-l Wrapping numpy-like arrays in xarray ------------------------------------ -:py:class:`DataArray` and :py:class:`Dataset` (and :py:class:`Variable`) objects can wrap these numpy-like arrays. +:py:class:`DataArray`, :py:class:`Dataset`, and :py:class:`Variable` objects can wrap these numpy-like arrays. 
Constructing xarray objects which wrap numpy-like arrays ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From f62b4a9ac60e998cd3f8f7862768c05d6943bd7f Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 28 Jun 2023 17:17:18 -0400 Subject: [PATCH 28/63] double backquote formatting Co-authored-by: Deepak Cherian --- doc/user-guide/duckarrays.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 288c4e897ef..261bc91dc0e 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -118,7 +118,7 @@ For example, we can wrap the sparse array we created earlier inside a new DataAr We can see what's inside - the printable representation of our xarray object (the repr) automatically uses the printable representation of the underlying wrapped array. -Of course our sparse array object is still there underneath - it's stored under the `.data` attribute of the dataarray: +Of course our sparse array object is still there underneath - it's stored under the ``.data`` attribute of the dataarray: .. ipython:: python From 8d4bd3fd5ecfb584d69b77f43dc8ca0f3505d132 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 28 Jun 2023 17:17:38 -0400 Subject: [PATCH 29/63] better bracketing Co-authored-by: Deepak Cherian --- doc/user-guide/duckarrays.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 261bc91dc0e..7c470750bb4 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -145,9 +145,9 @@ If you want to change the type inside your xarray object you can use :py:meth:`D This returns a new :py:class:`DataArray` object, but now wrapping a normal numpy array. If instead you want to convert to numpy and return that numpy array you can use either :py:meth:`DataArray.to_numpy` or -:py:meth:`DataArray.values`, where the former is strongly preferred. 
(The difference is in the way they coerce to numpy - `.values` -always uses `np.asarray` which will fail for some array types (e.g. ``cupy``, whereas `to_numpy` uses the correct method -depending on the array type.) +:py:meth:`DataArray.values`, where the former is strongly preferred. The difference is in the way they coerce to numpy - `.values` +always uses `np.asarray` which will fail for some array types (e.g. ``cupy``), whereas `to_numpy` uses the correct method +depending on the array type. This illustrates the difference between ``.data`` and ``.values``, which is sometimes a point of confusion for new xarray users. Explicitly: :py:meth:`DataArray.data` returns the underlying numpy-like array, regardless of type, whereas From e9287dec19dd12c90cb81ca72c3c31ad16e7d5d6 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 17:24:04 -0400 Subject: [PATCH 30/63] fix list formatting --- doc/internals/index.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/doc/internals/index.rst b/doc/internals/index.rst index 25671877297..132f6c40ede 100644 --- a/doc/internals/index.rst +++ b/doc/internals/index.rst @@ -9,9 +9,10 @@ extensions), which makes it easy to develop and extend. Instead, we push compiled code to :ref:`optional dependencies`. The pages in this section are intended for: -- Contributors to xarray who wish to better understand some of the internals, -- Developers who wish to extend xarray with domain-specific logic, perhaps to support a new scientific community of users, -- Developers who wish to interface xarray with their existing tooling, e.g. by creating a plugin for reading a new file format, or wrapping a custom array type. + +* Contributors to xarray who wish to better understand some of the internals, +* Developers who wish to extend xarray with domain-specific logic, perhaps to support a new scientific community of users, +* Developers who wish to interface xarray with their existing tooling, e.g. 
by creating a plugin for reading a new file format, or wrapping a custom array type. .. toctree:: From d1e9b8fb0873c0ceefcbef3c6da10a777f7ae0fa Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 17:46:32 -0400 Subject: [PATCH 31/63] add links to glue packages, dask, and cubed --- doc/user-guide/duckarrays.rst | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 03fa82cdaf3..587e0e2b88c 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -12,9 +12,11 @@ additional features of these array libraries. Some numpy-like array types that xarray already has some support for: -* `Cupy `_ - GPU support, +* `Cupy `_ - GPU support (see `cupy-xarray `_), * `Sparse `_ - for performant arrays with many zero elements, -* `Pint `_ - for tracking the physical units of your data. +* `Pint `_ - for tracking the physical units of your data (see `pint-xarray `_), +* `Dask `_ - parallel computing on larger-than-memory arrays (see :ref:`using dask with xarray `), +* `Cubed `_ - another parallel computing framework that emphasises reliability (see `cubed-xarray `_). .. warning:: From 1ea207827fe2b6c78126377992c5d50afd0cf45a Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 28 Jun 2023 17:47:56 -0400 Subject: [PATCH 32/63] link to todense method Co-authored-by: Deepak Cherian --- doc/user-guide/duckarrays.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 7be7e96d3eb..3b2d94170a3 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -59,7 +59,7 @@ in a memory-efficient manner. We can create a sparse array object (of the :py:cl This sparse object does not attempt to explicitly store every element in the array, only the non-zero elements. 
This approach is much more efficient for large arrays with only a few non-zero elements (such as tri-diagonal matrices). -Sparse array objects can be converted back to a "dense" numpy array by calling ``.todense``. +Sparse array objects can be converted back to a "dense" numpy array by calling :py:meth:`sparse.COO.todense`. Just like :py:class:`numpy.ndarray` objects, :py:class:`sparse.COO` arrays support indexing From be919b684354e564b62b985612e676c9dc3bdbe8 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 17:51:24 -0400 Subject: [PATCH 33/63] link to numpy-like arrays page --- doc/user-guide/data-structures.rst | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index e0fd4bd0d25..64e7b3625ac 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -19,7 +19,8 @@ DataArray :py:class:`xarray.DataArray` is xarray's implementation of a labeled, multi-dimensional array. It has several key properties: -- ``values``: a :py:class:`numpy.ndarray` holding the array's values +- ``values``: a :py:class:`numpy.ndarray` or + :ref:`numpy-like array ` holding the array's values - ``dims``: dimension names for each axis (e.g., ``('x', 'y', 'z')``) - ``coords``: a dict-like container of arrays (*coordinates*) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or @@ -46,7 +47,8 @@ Creating a DataArray The :py:class:`~xarray.DataArray` constructor takes: - ``data``: a multi-dimensional array of values (e.g., a numpy ndarray, - :py:class:`~pandas.Series`, :py:class:`~pandas.DataFrame` or ``pandas.Panel``) + a :ref:`numpy-like array `, :py:class:`~pandas.Series`, + :py:class:`~pandas.DataFrame` or ``pandas.Panel``) - ``coords``: a list or dictionary of coordinates. 
If a list, it should be a list of tuples where the first element is the dimension name and the second element is the corresponding coordinate array_like object. From d03e12597c9a3490d0b6d5ee9093d88eab1e692d Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 17:52:44 -0400 Subject: [PATCH 34/63] link to numpy ufunc docs --- doc/user-guide/duckarrays.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 3b2d94170a3..0ec1b6f80cd 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -84,7 +84,8 @@ and various computation methods s.sum(axis=1) -This numpy-like array also supports calling so-called numpy ufuncs (link to numpy docs) on it directly: +This numpy-like array also supports calling so-called `numpy ufuncs `_ +("universal functions") on it directly: .. ipython:: python From c50f5317790c8ff0ebdc13a33ecf1472f9d223b7 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 18:39:02 -0400 Subject: [PATCH 35/63] more text about chunkmanagers --- doc/internals/chunked-arrays.rst | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index 9005a0135ea..2f629a14241 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -20,22 +20,33 @@ implements the handling of processing all of the chunks. 
Chunked array requirements ~~~~~~~~~~~~~~~~~~~~~~~~~~ -A chunked array Needs to meet all the :ref:`requirements for normal duck arrays `, but should also +A chunked array needs to meet all the :ref:`requirements for normal duck arrays `, but should also implement these methods: - ``.chunk`` - ``.rechunk`` - ``.compute`` - Chunked operations as function primitives ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Actual full list is defined in the :py:class:``~xarray.core.parallelcompat.ChunkManagerEntryPoint`` class (link to that API documentation) +Xarray dispatches chunk-aware computations across arrays using function "primitives" that accept one or more arrays. +Examples include ``map_blocks``, ``blockwise``, and ``apply_gufunc``. +These primitives are generalizations of functions first implemented in :py:class:`dask.array`. +The implementation of these functions is specific to the type of arrays passed to them: :py:class:`dask.array.Array` objects +must be processed by :py:func:`dask.array.map_blocks`, whereas :py:class:`cubed.Array` objects must be processed by :py:func:`cubed.map_blocks`. +In order to use the correct function primitive for the array type encountered, xarray dispatches to the corresponding subclass of :py:class:``~xarray.core.parallelcompat.ChunkManagerEntryPoint``, +also known as a "Chunk Manager". Therefore a full list of the primitive functions that need to be defined is set by the API of the +:py:class:``~xarray.core.parallelcompat.ChunkManagerEntryPoint`` abstract base class. ChunkManagerEntrypoint subclassing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Rather than hard-coding various chunk managers to deal with specific chunked array implementations, xarray uses an entrypoint +system to allow developers of new chunked array implementations to register a subclass of +:py:class:``~xarray.core.parallelcompat.ChunkManagerEntryPoint`` + + .. 
autosummary:: :toctree: generated/ From 90a8bcb57da9441c8c96d4df0b2d74f6f89f58b1 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 18:39:24 -0400 Subject: [PATCH 36/63] add example of using .to_numpy --- doc/user-guide/duckarrays.rst | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 0ec1b6f80cd..90ee8ee63aa 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -148,14 +148,21 @@ If you want to change the type inside your xarray object you can use :py:meth:`D This returns a new :py:class:`DataArray` object, but now wrapping a normal numpy array. If instead you want to convert to numpy and return that numpy array you can use either :py:meth:`DataArray.to_numpy` or -:py:meth:`DataArray.values`, where the former is strongly preferred. The difference is in the way they coerce to numpy - `.values` -always uses `np.asarray` which will fail for some array types (e.g. ``cupy``), whereas `to_numpy` uses the correct method -depending on the array type. +:py:meth:`DataArray.values`, where the former is strongly preferred. The difference is in the way they coerce to numpy - :py:meth:`~DataArray.values` +always uses :py:func:`numpy.asarray` which will fail for some array types (e.g. ``cupy``), whereas :py:meth:`~DataArray.to_numpy` +uses the correct method depending on the array type. -This illustrates the difference between ``.data`` and ``.values``, which is sometimes a point of confusion for new xarray users. +.. ipython:: python + + s_da.to_numpy() + + s_da.values + +This illustrates the difference between :py:meth:`~DataArray.data` and :py:meth:`~DataArray.values`, +which is sometimes a point of confusion for new xarray users. Explicitly: :py:meth:`DataArray.data` returns the underlying numpy-like array, regardless of type, whereas :py:meth:`DataArray.values` converts the underlying array to a numpy array before returning it. 
-(This is another reason to use ``.to_numpy`` over ``.values`` - the intention is clearer.) +(This is another reason to use :py:meth:`~DataArray.to_numpy` over :py:meth:`~DataArray.values` - the intention is clearer.) Conversion to numpy as a fallback ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 9a086c512a72f05b7f28de68b47127ba7cc251e6 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 18:43:38 -0400 Subject: [PATCH 37/63] note on ideally not having an entrypoint system --- doc/internals/chunked-arrays.rst | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index 2f629a14241..b274dea6664 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -37,15 +37,20 @@ must be processed by :py:func:`dask.array.map_blocks`, whereas :py:class:`cubed. In order to use the correct function primitive for the array type encountered, xarray dispatches to the corresponding subclass of :py:class:``~xarray.core.parallelcompat.ChunkManagerEntryPoint``, also known as a "Chunk Manager". Therefore a full list of the primitive functions that need to be defined is set by the API of the -:py:class:``~xarray.core.parallelcompat.ChunkManagerEntryPoint`` abstract base class. +:py:class:``~xarray.core.parallelcompat.ChunkManagerEntrypoint`` abstract base class. + +:: note: + + The :py:class:``~xarray.core.parallelcompat.ChunkManagerEntrypoint`` abstract base class is mostly just acting as a + namespace for containing the chunked-aware function primitives. Ideally in the future we would have an API standard + for chunked array types which codified this structure, making the entrypoint system unnecessary. 
ChunkManagerEntrypoint subclassing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Rather than hard-coding various chunk managers to deal with specific chunked array implementations, xarray uses an entrypoint -system to allow developers of new chunked array implementations to register a subclass of -:py:class:``~xarray.core.parallelcompat.ChunkManagerEntryPoint`` - +Rather than hard-coding various chunk managers to deal with specific chunked array implementations, xarray uses an +entrypoint system to allow developers of new chunked array implementations to register their corresponding subclass of +:py:class:``~xarray.core.parallelcompat.ChunkManagerEntrypoint``. .. autosummary:: :toctree: generated/ From cfd1396851e3037a5bb836ce9dc503921bf7bd41 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 18:46:02 -0400 Subject: [PATCH 38/63] parallel processing without chunks --- doc/internals/chunked-arrays.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index b274dea6664..3c146e6dc92 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -69,4 +69,6 @@ User interface Parallel processing without chunks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Don't necessarily need all this +To use a parallel array type that does not expose a concept of chunks explicitly, none of the information on this page +is theoretically required. Such an array type could be wrapped using xarray's existing +support for `numpy-like "duck" arrays `. 
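The dispatch that these documentation patches describe (xarray picks the chunk manager whose array type matches the data, then calls that manager's implementation of a core operation) can be illustrated with a toy sketch. This is only an illustration under stated assumptions, not xarray's implementation: ``ToyChunkManager``, ``ListOfBlocks``, ``ListManager`` and ``dispatch_map_blocks`` are hypothetical names, and the "chunked array" here is just a list of blocks.

```python
from abc import ABC, abstractmethod


class ToyChunkManager(ABC):
    """Hypothetical stand-in for a chunk manager base class."""

    # The array type this manager knows how to handle.
    array_cls: type

    def is_chunked_array(self, data) -> bool:
        # Default check: compare against the stored array type.
        return isinstance(data, self.array_cls)

    @abstractmethod
    def map_blocks(self, func, data):
        """Apply ``func`` to every block of ``data``."""


class ListOfBlocks(list):
    """Toy chunked array: each element is one block (a list of numbers)."""


class ListManager(ToyChunkManager):
    array_cls = ListOfBlocks

    def map_blocks(self, func, data):
        # Framework-specific implementation of the core operation.
        return ListOfBlocks(func(block) for block in data)


def dispatch_map_blocks(managers, func, data):
    """Find the manager that recognises ``data`` and delegate to it."""
    for manager in managers.values():
        if manager.is_chunked_array(data):
            return manager.map_blocks(func, data)
    raise TypeError(f"no registered manager handles {type(data).__name__}")


doubled = dispatch_map_blocks(
    {"toy": ListManager()},
    lambda block: [2 * x for x in block],
    ListOfBlocks([[1, 2], [3]]),
)
```

The base class does little more than group the per-framework implementations, which matches the note above that the abstract base class mostly acts as a namespace.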
From 9dc63c045d4637ada35089e4245360a3c8ab3a62 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 18:52:39 -0400 Subject: [PATCH 39/63] explain the user interface --- doc/internals/chunked-arrays.rst | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index 3c146e6dc92..7717c632737 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -62,9 +62,17 @@ entrypoint system to allow developers of new chunked array implementations to re User interface ~~~~~~~~~~~~~~ -``chunked_array_type`` kwarg -``from_array_kwargs`` dict +Once the chunkmanager subclass has been registered, xarray objects wrapping the desired array type can be created in 3 ways: +#. By manually passing the array type to the :py:class:`~DataArray` constructor, see the examples for `numpy-like arrays `, + +#. Calling :py:meth:`DataArray.chunk`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``, + +#. Calling :py:func:`open_dataset`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``. + +The latter two methods ultimately call the chunkmanager's implementation of ``.from_array``, to which they pass the ``from_array_kwargs`` dict. +The ``chunked_array_type`` kwarg selects which registered chunkmanager subclass to dispatch to. It defaults to `'dask'` if found, +otherwise to whichever chunkmanager is registered if only one is registered, else it will raise an error. 
Parallel processing without chunks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 7bdd9761ebf63edad8c312c30ee2368740fe967d Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 18:55:46 -0400 Subject: [PATCH 40/63] how to register the chunkmanager --- doc/internals/chunked-arrays.rst | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index 7717c632737..c1e729998c4 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -52,12 +52,21 @@ Rather than hard-coding various chunk managers to deal with specific chunked arr entrypoint system to allow developers of new chunked array implementations to register their corresponding subclass of :py:class:``~xarray.core.parallelcompat.ChunkManagerEntrypoint``. +The key internal API is: + .. autosummary:: :toctree: generated/ xarray.core.parallelcompat.list_chunkmanagers xarray.core.parallelcompat.ChunkManagerEntrypoint +To register a new entrypoint you need to add an entry to the ``setup.cfg`` like this:: + + [options.entry_points] + xarray.chunkmanagers = + dask = xarray.core.daskmanager:DaskManager + +See also `cubed-xarray `_ for another example. User interface ~~~~~~~~~~~~~~ From 14057b95ff7adb6faf5a6bd14e9000873c74efdb Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 18:57:25 -0400 Subject: [PATCH 41/63] show example of .values failing --- doc/user-guide/duckarrays.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index 90ee8ee63aa..dc1d2d1cb8a 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -156,6 +156,9 @@ uses the correct method depending on the array type. s_da.to_numpy() +.. 
ipython:: python + :okexcept: + s_da.values This illustrates the difference between :py:meth:`~DataArray.data` and :py:meth:`~DataArray.values`, From 098e152e61f0ec0361695ed08a619376778ca119 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 18:59:25 -0400 Subject: [PATCH 42/63] link from duck arrays page --- doc/user-guide/duckarrays.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst index dc1d2d1cb8a..f0650ac61b5 100644 --- a/doc/user-guide/duckarrays.rst +++ b/doc/user-guide/duckarrays.rst @@ -27,7 +27,7 @@ Some numpy-like array types that xarray already has some support for: For information on wrapping dask arrays see :ref:`dask`. Whilst xarray wraps dask arrays in a similar way to that described on this page, chunked array types like :py:class:`dask.array.Array` implement additional methods that require - slightly different user code (e.g. calling ``.chunk`` or ``.compute``). + slightly different user code (e.g. calling ``.chunk`` or ``.compute``). See the docs on :ref:`wrapping chunked arrays `. Why "duck"? ----------- From c7b686a4a3cea06d9e10b006a91764b317723019 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 19:02:07 -0400 Subject: [PATCH 43/63] whatsnew --- doc/whats-new.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/doc/whats-new.rst b/doc/whats-new.rst index 3c206e298bc..fc07b3fe4e6 100644 --- a/doc/whats-new.rst +++ b/doc/whats-new.rst @@ -38,6 +38,8 @@ Bug fixes Documentation ~~~~~~~~~~~~~ +- Added page on wrapping chunked numpy-like arrays as alternatives to dask arrays. + (:pull:`7951`) By `Tom Nicholas `_. 
Internal Changes ~~~~~~~~~~~~~~~~ From 45000e426eafd53c5a2d541089bfba253a1f1c03 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 19:02:51 -0400 Subject: [PATCH 44/63] move whatsnew entry to unreleased version --- doc/whats-new.rst | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/doc/whats-new.rst b/doc/whats-new.rst index 3c206e298bc..9e0ae04f775 100644 --- a/doc/whats-new.rst +++ b/doc/whats-new.rst @@ -38,6 +38,8 @@ Bug fixes Documentation ~~~~~~~~~~~~~ +- Expanded the page on wrapping numpy-like "duck" arrays. + (:pull:`7911`) By `Tom Nicholas `_. Internal Changes ~~~~~~~~~~~~~~~~ @@ -96,10 +98,7 @@ Bug fixes By `Juniper Tyree `_. Documentation -~~~~~~~~~~~~~ - -- Expanded the page on wrapping numpy-like "duck" arrays. - (:pull:`7911`) By `Tom Nicholas `_. +~~~~~~~~~~~~ Internal Changes ~~~~~~~~~~~~~~~~ From 80e9fa4812c67a58f1b31fe31660461dd237196b Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 19:13:18 -0400 Subject: [PATCH 45/63] capitalization --- doc/internals/chunked-arrays.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index c1e729998c4..cc2e2bb161a 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -35,7 +35,7 @@ These primitives are generalizations of functions first implemented in :py:class The implementation of these functions is specific to the type of arrays passed to them: :py:class:`dask.array.Array` objects must be processed by :py:func:`dask.array.map_blocks`, whereas :py:class:`cubed.Array` objects must be processed by :py:func:`cubed.map_blocks`. 
-In order to use the correct function primitive for the array type encountered, xarray dispatches to the corresponding subclass of :py:class:``~xarray.core.parallelcompat.ChunkManagerEntryPoint``, +In order to use the correct function primitive for the array type encountered, xarray dispatches to the corresponding subclass of :py:class:``~xarray.core.parallelcompat.ChunkManagerEntrypoint``, also known as a "Chunk Manager". Therefore a full list of the primitive functions that need to be defined is set by the API of the :py:class:``~xarray.core.parallelcompat.ChunkManagerEntrypoint`` abstract base class. From da8719d6add5c5de68cb5d098b6c4d8df6c4d170 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 28 Jun 2023 19:28:42 -0400 Subject: [PATCH 46/63] fix warning in docs build --- doc/whats-new.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/whats-new.rst b/doc/whats-new.rst index 9e0ae04f775..ce2c0a698ac 100644 --- a/doc/whats-new.rst +++ b/doc/whats-new.rst @@ -98,7 +98,7 @@ Bug fixes By `Juniper Tyree `_. Documentation -~~~~~~~~~~~~ +~~~~~~~~~~~~~ Internal Changes ~~~~~~~~~~~~~~~~ From 6c1a4224cfb39b921e63b5414cf1c9ff903765be Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 30 Jun 2023 20:16:14 -0400 Subject: [PATCH 47/63] fix a bunch of links --- doc/internals/chunked-arrays.rst | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index cc2e2bb161a..408c245ce7d 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -1,3 +1,4 @@ +.. currentmodule:: xarray .. _internals.chunkedarrays: @@ -12,11 +13,10 @@ Alternative chunked array types Xarray can wrap chunked dask arrays (see :ref:`dask`), but can also wrap any other chunked array type that exposes the correct interface. 
This allows us to support using other frameworks for distributed and out-of-core processing, with user code still written as xarray commands. -The basic idea is that by wrapping an array that has an explicit notion of ``chunks``, xarray can expose control over +The basic idea is that by wrapping an array that has an explicit notion of ``.chunks``, xarray can expose control over the choice of chunking scheme to users via methods like :py:meth:`DataArray.chunk` whilst the wrapped array actually implements the handling of processing all of the chunks. - Chunked array requirements ~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -26,22 +26,23 @@ A chunked array needs to meet all the :ref:`requirements for normal duck arrays - ``.rechunk`` - ``.compute`` -Chunked operations as function primitives -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Chunked operations as "core operations" +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Xarray dispatches chunk-aware computations across arrays using function "primitives" that accept one or more arrays. +Xarray dispatches chunk-aware computations across arrays using "core operations" that accept one or more arrays. Examples include ``map_blocks``, ``blockwise``, and ``apply_gufunc``. -These primitives are generalizations of functions first implemented in :py:class:`dask.array`. +These core operations are generalizations of functions first implemented in :py:class:`dask.array`. The implementation of these functions is specific to the type of arrays passed to them: :py:class:`dask.array.Array` objects must be processed by :py:func:`dask.array.map_blocks`, whereas :py:class:`cubed.Array` objects must be processed by :py:func:`cubed.map_blocks`. 
-In order to use the correct function primitive for the array type encountered, xarray dispatches to the corresponding subclass of :py:class:``~xarray.core.parallelcompat.ChunkManagerEntrypoint``, +In order to use the correct implementation of a core operation for the array type encountered, xarray dispatches to the +corresponding subclass of :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`, also known as a "Chunk Manager". Therefore a full list of the primitive functions that need to be defined is set by the API of the -:py:class:``~xarray.core.parallelcompat.ChunkManagerEntrypoint`` abstract base class. +:py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` abstract base class. -:: note: +.. note:: - The :py:class:``~xarray.core.parallelcompat.ChunkManagerEntrypoint`` abstract base class is mostly just acting as a + The :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` abstract base class is mostly just acting as a namespace for containing the chunked-aware function primitives. Ideally in the future we would have an API standard for chunked array types which codified this structure, making the entrypoint system unnecessary. @@ -50,12 +51,11 @@ ChunkManagerEntrypoint subclassing Rather than hard-coding various chunk managers to deal with specific chunked array implementations, xarray uses an entrypoint system to allow developers of new chunked array implementations to register their corresponding subclass of -:py:class:``~xarray.core.parallelcompat.ChunkManagerEntrypoint``. +:py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`. The key internal API is: .. autosummary:: - :toctree: generated/ xarray.core.parallelcompat.list_chunkmanagers xarray.core.parallelcompat.ChunkManagerEntrypoint @@ -73,14 +73,14 @@ User interface Once the chunkmanager subclass has been registered, xarray objects wrapping the desired array type can be created in 3 ways: -#. 
By manually passing the array type to the :py:class:`~DataArray` constructor, see the examples for `numpy-like arrays `, +#. By manually passing the array type to the :py:class:`DataArray` constructor, see the examples for :ref:`numpy-like arrays `, #. Calling :py:meth:`DataArray.chunk`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``, #. Calling :py:func:`open_dataset`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``. The latter two methods ultimately call the chunkmanager's implementation of ``.from_array``, to which they pass the ``from_array_kwargs`` dict. -The ``chunked_array_type`` kwarg selects which registered chunkmanager subclass to dispatch to. It defaults to `'dask'` if found, +The ``chunked_array_type`` kwarg selects which registered chunkmanager subclass to dispatch to. It defaults to ``'dask'`` if found, otherwise to whichever chunkmanager is registered if only one is registered, else it will raise an error. Parallel processing without chunks @@ -88,4 +88,4 @@ Parallel processing without chunks To use a parallel array type that does not expose a concept of chunks explicitly, none of the information on this page is theoretically required. Such an array type could be wrapped using xarray's existing -support for `numpy-like "duck" arrays `. +support for :ref:`numpy-like "duck" arrays `. 
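The default rule documented above for ``chunked_array_type`` (use ``'dask'`` if it is registered, otherwise use the only registered chunkmanager, otherwise raise) can be sketched in plain Python. This is a minimal sketch of the documented behaviour, not xarray's actual code: the function name ``pick_default_chunkmanager``, the plain ``dict`` registry and the choice of ``ValueError`` are assumptions made for illustration.

```python
from __future__ import annotations


def pick_default_chunkmanager(registered: dict[str, object]) -> str:
    """Return the name of the chunkmanager to use when none is requested.

    Hypothetical helper mirroring the documented rule; not xarray's code.
    """
    if "dask" in registered:
        # 'dask' is preferred whenever it is registered.
        return "dask"
    if len(registered) == 1:
        # Exactly one manager is registered, so the choice is unambiguous.
        return next(iter(registered))
    # Zero or several non-dask managers: there is no sensible default.
    raise ValueError(
        f"cannot pick a default chunkmanager from {sorted(registered)}; "
        "pass chunked_array_type explicitly"
    )
```

With both ``dask`` and another manager registered the sketch returns ``"dask"``; with only one non-dask manager it returns that manager's name.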
From 2a4076302df6819de60d46ddf4213f59b1747c23 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 4 Jul 2023 01:19:49 -0400 Subject: [PATCH 48/63] display API of ChunkManagerEntrypoint class attributes and methods --- doc/internals/chunked-arrays.rst | 51 +++++++++++++++++--------------- 1 file changed, 27 insertions(+), 24 deletions(-) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index 408c245ce7d..74188921359 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -17,28 +17,26 @@ The basic idea is that by wrapping an array that has an explicit notion of ``.ch the choice of chunking scheme to users via methods like :py:meth:`DataArray.chunk` whilst the wrapped array actually implements the handling of processing all of the chunks. -Chunked array requirements -~~~~~~~~~~~~~~~~~~~~~~~~~~ +Chunked array methods and "core operations" +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A chunked array needs to meet all the :ref:`requirements for normal duck arrays `, but should also implement these methods: +A chunked array needs to meet all the :ref:`requirements for normal duck arrays `, but must also +implement additional features. -- ``.chunk`` -- ``.rechunk`` -- ``.compute`` +Chunked arrays have additional attributes and methods, such as ``.chunks`` and ``.rechunk``. +Furthermore, Xarray dispatches chunk-aware computations across one or more chunked arrays using special functions known +as "core operations". Examples include ``map_blocks``, ``blockwise``, and ``apply_gufunc``. -Chunked operations as "core operations" -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Xarray dispatches chunk-aware computations across arrays using "core operations" that accept one or more arrays. -Examples include ``map_blocks``, ``blockwise``, and ``apply_gufunc``. -These core operations are generalizations of functions first implemented in :py:class:`dask.array`. 
-The implementation of these functions is specific to the type of arrays passed to them: :py:class:`dask.array.Array` objects -must be processed by :py:func:`dask.array.map_blocks`, whereas :py:class:`cubed.Array` objects must be processed by :py:func:`cubed.map_blocks`. +The core operations are generalizations of functions first implemented in :py:class:`dask.array`. +The implementation of these functions is specific to the type of arrays passed to them. For example, when applying the +``map_blocks`` core operation, :py:class:`dask.array.Array` objects must be processed by :py:func:`dask.array.map_blocks`, +whereas :py:class:`cubed.Array` objects must be processed by :py:func:`cubed.map_blocks`. In order to use the correct implementation of a core operation for the array type encountered, xarray dispatches to the corresponding subclass of :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`, -also known as a "Chunk Manager". Therefore a full list of the primitive functions that need to be defined is set by the API of the -:py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` abstract base class. +also known as a "Chunk Manager". Therefore **a full list of the operations that need to be defined is set by the +API of the :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` abstract base class**. Note that chunked array +methods are also currently dispatched using this class. .. note:: @@ -46,19 +44,18 @@ also known as a "Chunk Manager". Therefore a full list of the primitive function namespace for containing the chunked-aware function primitives. Ideally in the future we would have an API standard for chunked array types which codified this structure, making the entrypoint system unnecessary. -ChunkManagerEntrypoint subclassing -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. currentmodule:: xarray.core.parallelcompat + +.. 
autoclass:: ChunkManagerEntrypoint + :members: + +Registering a new ChunkManagerEntrypoint subclass +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rather than hard-coding various chunk managers to deal with specific chunked array implementations, xarray uses an entrypoint system to allow developers of new chunked array implementations to register their corresponding subclass of :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`. -The key internal API is: - -.. autosummary:: - - xarray.core.parallelcompat.list_chunkmanagers - xarray.core.parallelcompat.ChunkManagerEntrypoint To register a new entrypoint you need to add an entry to the ``setup.cfg`` like this:: @@ -68,6 +65,12 @@ To register a new entrypoint you need to add an entry to the ``setup.cfg`` like See also `cubed-xarray `_ for another example. +To check that the entrypoint has worked correctly, you may find it useful to display the available chunkmanagers using +the internal function :py:func:`~xarray.core.parallelcompat.list_chunkmanagers`. + +.. autofunction:: list_chunkmanagers + + User interface ~~~~~~~~~~~~~~ From 15505fa56ed670af3710b825fc228709a2d0c461 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 4 Jul 2023 01:20:16 -0400 Subject: [PATCH 49/63] improve docstrings in ABC --- xarray/core/parallelcompat.py | 126 ++++++++++++++++++++++++++++++---- 1 file changed, 113 insertions(+), 13 deletions(-) diff --git a/xarray/core/parallelcompat.py b/xarray/core/parallelcompat.py index 4df4ff235c6..8e07da55619 100644 --- a/xarray/core/parallelcompat.py +++ b/xarray/core/parallelcompat.py @@ -31,7 +31,13 @@ @functools.lru_cache(maxsize=1) def list_chunkmanagers() -> dict[str, ChunkManagerEntrypoint]: """ - Return a dictionary of available chunk managers and their ChunkManagerEntrypoint objects. + Return a dictionary of available chunk managers and their ChunkManagerEntrypoint subclass objects. 
+ + Returns + ------- + chunnkmanagers : dict + Dictionary whose values are registered ChunkManagerEntrypoint subclass instances, and whose values + are the strings under which they are registered. Notes ----- @@ -143,7 +149,13 @@ def get_chunked_array_type(*args) -> ChunkManagerEntrypoint: class ChunkManagerEntrypoint(ABC, Generic[T_ChunkedArray]): """ - Adapter between a particular parallel computing framework and xarray. + Interface between a particular parallel computing framework and xarray. + + This abstract base class must be subclassed by libraries implementing chunked array types, and + registered via the ``chunkmanagers`` entrypoint. + + Abstract methods on this class must be implemented, whereas non-abstract methods are only required in order to + enable a subset of xarray functionality, and by default will raise a ``NotImplementedError`` if called. Attributes ---------- @@ -151,7 +163,7 @@ class ChunkManagerEntrypoint(ABC, Generic[T_ChunkedArray]): Type of the array class this parallel computing framework provides. Parallel frameworks need to provide an array class that supports the array API standard. - Used for type checking. + This attribute is used for array instance type checking at runtime. """ array_cls: type[T_ChunkedArray] @@ -159,13 +171,28 @@ class ChunkManagerEntrypoint(ABC, Generic[T_ChunkedArray]): @abstractmethod def __init__(self) -> None: + """Used to set the array_cls attribute at import time.""" raise NotImplementedError() def is_chunked_array(self, data: Any) -> bool: + """ + Check if the given object is an instance of this type of chunked array. + + Compares against the type stored in the array_cls attribute by default. + """ return isinstance(data, self.array_cls) @abstractmethod def chunks(self, data: T_ChunkedArray) -> T_NormalizedChunks: + """ + Return the current chunks of the given array. + + Used internally by xarray objects' .chunks and .chunksizes properties. 
+ + See Also + -------- + dask.array.Array.chunks + """ raise NotImplementedError() @abstractmethod @@ -177,14 +204,30 @@ def normalize_chunks( dtype: np.dtype | None = None, previous_chunks: T_NormalizedChunks | None = None, ) -> T_NormalizedChunks: - """Called by open_dataset""" + """ + Called internally by xarray.open_dataset. + + See Also + -------- + dask.array.normalize_chunks + """ raise NotImplementedError() @abstractmethod def from_array( self, data: np.ndarray, chunks: T_Chunks, **kwargs ) -> T_ChunkedArray: - """Called when .chunk is called on an xarray object that is not already chunked.""" + """ + Creates a chunked array from a non-chunked numpy-like array. + + Called when the .chunk method is called on an xarray object that is not already chunked. + Also called within open_dataset (when chunks is not None) to create a chunked array from + an xarray lazily indexed array. + + See Also + -------- + dask.Array.array.from_array + """ raise NotImplementedError() def rechunk( @@ -193,17 +236,40 @@ def rechunk( chunks: T_NormalizedChunks | tuple[int, ...] | T_Chunks, **kwargs, ) -> T_ChunkedArray: - """Called when .chunk is called on an xarray object that is already chunked.""" + """ + Changes the chunking pattern of the given array. + + Called when the .chunk method is called on an xarray object that is already chunked. + + See Also + -------- + dask.array.Array.rechunk + """ return data.rechunk(chunks, **kwargs) # type: ignore[attr-defined] @abstractmethod def compute(self, *data: T_ChunkedArray, **kwargs) -> tuple[np.ndarray, ...]: - """Used anytime something needs to computed, including multiple arrays at once.""" + """ + Computes one or more chunked arrays, returning them as eager numpy arrays. + + Called anytime something needs to computed, including multiple arrays at once. + Used by `.compute`, `.persist`, `.values`. 
+ + See Also + -------- + dask.array.compute + """ raise NotImplementedError() @property def array_api(self) -> Any: - """Return the array_api namespace following the python array API standard.""" + """ + Return the array_api namespace following the python array API standard. + + See Also + -------- + dask.array + """ raise NotImplementedError() def reduction( @@ -216,7 +282,13 @@ def reduction( dtype: np.dtype | None = None, keepdims: bool = False, ) -> T_ChunkedArray: - """Used in some reductions like nanfirst, which is used by groupby.first""" + """ + Used in some reductions like nanfirst, which is used by groupby.first. + + See Also + -------- + dask.array.reduction + """ raise NotImplementedError() @abstractmethod @@ -233,6 +305,10 @@ def apply_gufunc( ): """ Called inside xarray.apply_ufunc, so must be supplied for vast majority of xarray computations to be supported. + + See Also + -------- + dask.array.apply_gufunc """ raise NotImplementedError() @@ -246,7 +322,13 @@ def map_blocks( new_axis: int | Sequence[int] | None = None, **kwargs, ): - """Called in elementwise operations, but notably not called in xarray.map_blocks.""" + """ + Called in elementwise operations, but notably not called in xarray.map_blocks. + + See Also + -------- + dask.array.map_blocks + """ raise NotImplementedError() def blockwise( @@ -259,7 +341,13 @@ def blockwise( align_arrays: bool = True, **kwargs, ): - """Called by some niche functions in xarray.""" + """ + Called by some niche functions in xarray. + + See Also + -------- + dask.array.blockwise + """ raise NotImplementedError() def unify_chunks( @@ -267,7 +355,13 @@ def unify_chunks( *args: Any, # can't type this as mypy assumes args are all same type, but dask unify_chunks args alternate types **kwargs, ) -> tuple[dict[str, T_NormalizedChunks], list[T_ChunkedArray]]: - """Called by xr.unify_chunks.""" + """ + Called by xarray.unify_chunks. 
+ + See Also + -------- + dask.array.unify_chunks + """ raise NotImplementedError() def store( @@ -276,5 +370,11 @@ def store( targets: Any, **kwargs: dict[str, Any], ): - """Used when writing to any backend.""" + """ + Used when writing to any backend. + + See Also + -------- + dask.array.store + """ raise NotImplementedError() From 1bf55d373969c13cf0bb1dc2229714ff720f0c6c Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 4 Jul 2023 18:25:08 -0400 Subject: [PATCH 50/63] add cubed to intersphinx mapping --- doc/conf.py | 1 + 1 file changed, 1 insertion(+) diff --git a/doc/conf.py b/doc/conf.py index f201af859b9..6c6efb47f6b 100644 --- a/doc/conf.py +++ b/doc/conf.py @@ -323,6 +323,7 @@ "dask": ("https://docs.dask.org/en/latest", None), "cftime": ("https://unidata.github.io/cftime", None), "sparse": ("https://sparse.pydata.org/en/latest/", None), + "cubed": ("https://tom-e-white.com/cubed/", None), } From 04a7a8eaad938f422f3700d08d93b8d719ffd947 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 4 Jul 2023 18:27:58 -0400 Subject: [PATCH 51/63] link to dask.array as module not class --- doc/internals/chunked-arrays.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index 74188921359..7605ea620f6 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -27,12 +27,12 @@ Chunked arrays have additional attributes and methods, such as ``.chunks`` and ` Furthermore, Xarray dispatches chunk-aware computations across one or more chunked arrays using special functions known as "core operations". Examples include ``map_blocks``, ``blockwise``, and ``apply_gufunc``. -The core operations are generalizations of functions first implemented in :py:class:`dask.array`. +The core operations are generalizations of functions first implemented in :py:module:`dask.array`. 
The implementation of these functions is specific to the type of arrays passed to them. For example, when applying the ``map_blocks`` core operation, :py:class:`dask.array.Array` objects must be processed by :py:func:`dask.array.map_blocks`, whereas :py:class:`cubed.Array` objects must be processed by :py:func:`cubed.map_blocks`. -In order to use the correct implementation of a core operation for the array type encountered, xarray dispatches to the +In order to use the correct implementation of a coclassre operation for the array type encountered, xarray dispatches to the corresponding subclass of :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`, also known as a "Chunk Manager". Therefore **a full list of the operations that need to be defined is set by the API of the :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` abstract base class**. Note that chunked array From 8b74e893f439c296c8d5c48a625f474df3b7febc Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 5 Jul 2023 09:37:04 -0400 Subject: [PATCH 52/63] typo --- doc/internals/chunked-arrays.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index 7605ea620f6..006443221a7 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -32,7 +32,7 @@ The implementation of these functions is specific to the type of arrays passed t ``map_blocks`` core operation, :py:class:`dask.array.Array` objects must be processed by :py:func:`dask.array.map_blocks`, whereas :py:class:`cubed.Array` objects must be processed by :py:func:`cubed.map_blocks`. 
-In order to use the correct implementation of a coclassre operation for the array type encountered, xarray dispatches to the +In order to use the correct implementation of a core operation for the array type encountered, xarray dispatches to the corresponding subclass of :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`, also known as a "Chunk Manager". Therefore **a full list of the operations that need to be defined is set by the API of the :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` abstract base class**. Note that chunked array From cab76bff52dbea56fb56badf7f4cdc6ebe44fe6f Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 5 Jul 2023 11:06:00 -0400 Subject: [PATCH 53/63] fix bold formatting --- doc/internals/chunked-arrays.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index 006443221a7..0ae90da992f 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -35,7 +35,7 @@ whereas :py:class:`cubed.Array` objects must be processed by :py:func:`cubed.map In order to use the correct implementation of a core operation for the array type encountered, xarray dispatches to the corresponding subclass of :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`, also known as a "Chunk Manager". Therefore **a full list of the operations that need to be defined is set by the -API of the :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` abstract base class**. Note that chunked array +API of the** :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` **abstract base class**. Note that chunked array methods are also currently dispatched using this class. .. note:: @@ -46,7 +46,7 @@ methods are also currently dispatched using this class. .. currentmodule:: xarray.core.parallelcompat -.. autoclass:: ChunkManagerEntrypoint +.. 
autoclass:: xarray.core.parallelcompat.ChunkManagerEntrypoint :members: Registering a new ChunkManagerEntrypoint subclass From d120fab9182169e4bcb0918e0f516c36c8760df2 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 5 Jul 2023 11:07:09 -0400 Subject: [PATCH 54/63] proper docstrings --- xarray/core/parallelcompat.py | 268 +++++++++++++++++++++++++++++++++- 1 file changed, 260 insertions(+), 8 deletions(-) diff --git a/xarray/core/parallelcompat.py b/xarray/core/parallelcompat.py index 8e07da55619..e2dc2db63f0 100644 --- a/xarray/core/parallelcompat.py +++ b/xarray/core/parallelcompat.py @@ -179,6 +179,18 @@ def is_chunked_array(self, data: Any) -> bool: Check if the given object is an instance of this type of chunked array. Compares against the type stored in the array_cls attribute by default. + + Parameters + ---------- + data : Any + + Returns + ------- + is_chunked : bool + + See Also + -------- + dask.is_dask_collection """ return isinstance(data, self.array_cls) @@ -187,8 +199,18 @@ def chunks(self, data: T_ChunkedArray) -> T_NormalizedChunks: """ Return the current chunks of the given array. + Returns chunks explicitly as a tuple of tuple of ints. + Used internally by xarray objects' .chunks and .chunksizes properties. + Parameters + ---------- + data : chunked array + + Returns + ------- + chunks : tuple[tuple[int, ...], ...] + See Also -------- dask.array.Array.chunks @@ -205,8 +227,27 @@ def normalize_chunks( previous_chunks: T_NormalizedChunks | None = None, ) -> T_NormalizedChunks: """ + Normalize given chunking pattern into an explicit tuple of tuples representation. + + Exposed primarily because different chunking backends may want to make different decisions about how to + automatically chunk along dimensions not given explicitly in the input chunks. + Called internally by xarray.open_dataset. + Parameters + ---------- + chunks : tuple, int, dict, or string + The chunks to be normalized. 
+ shape : Tuple[int] + The shape of the array + limit : int (optional) + The maximum block size to target in bytes, + if freedom is given to choose + dtype : np.dtype + previous_chunks : Tuple[Tuple[int]], optional + Chunks from a previous array that we should use for inspiration when + rechunking dimensions automatically. + See Also -------- dask.array.normalize_chunks @@ -218,12 +259,20 @@ def from_array( self, data: np.ndarray, chunks: T_Chunks, **kwargs ) -> T_ChunkedArray: """ - Creates a chunked array from a non-chunked numpy-like array. + Create a chunked array from a non-chunked numpy-like array. + + Generally input should have a ``.shape``, ``.ndim``, ``.dtype`` and support numpy-style slicing. Called when the .chunk method is called on an xarray object that is not already chunked. Also called within open_dataset (when chunks is not None) to create a chunked array from an xarray lazily indexed array. + Parameters + ---------- + data : array_like + chunks : int, tuple + How to chunk the array. + See Also -------- dask.Array.array.from_array @@ -241,6 +290,19 @@ def rechunk( Called when the .chunk method is called on an xarray object that is already chunked. + Parameters + ---------- + data : dask array + Array to be rechunked. + chunks : int, tuple, dict or str, optional + The new block dimensions to create. -1 indicates the full size of the + corresponding dimension. Default is "auto" which automatically + determines chunk sizes. + + Returns + ------- + chunked array + See Also -------- dask.array.Array.rechunk @@ -248,16 +310,28 @@ def rechunk( return data.rechunk(chunks, **kwargs) # type: ignore[attr-defined] @abstractmethod - def compute(self, *data: T_ChunkedArray, **kwargs) -> tuple[np.ndarray, ...]: + def compute(self, *data: T_ChunkedArray | Any, **kwargs) -> tuple[np.ndarray, ...]: """ Computes one or more chunked arrays, returning them as eager numpy arrays. Called anytime something needs to computed, including multiple arrays at once. 
Used by `.compute`, `.persist`, `.values`. + Parameters + ---------- + *data : object + Any number of objects. If an object is an instance of the chunked array type, it is computed + and the in-memory result returned as a numpy array. All other types should be passed through unchanged. + + Returns + ------- + objs + The input, but with all chunked arrays now computed. + See Also -------- - dask.array.compute + dask.compute + dask.array.Array.compute """ raise NotImplementedError() @@ -266,6 +340,10 @@ def array_api(self) -> Any: """ Return the array_api namespace following the python array API standard. + See https://data-apis.org/array-api/latest/ . Currently used to access the array API function + ``full_like``, which is called within the xarray constructors ``xarray.full_like``, ``xarray.ones_like``, + ``xarray.zeros_like``, etc. + See Also -------- dask.array @@ -283,7 +361,36 @@ def reduction( keepdims: bool = False, ) -> T_ChunkedArray: """ - Used in some reductions like nanfirst, which is used by groupby.first. + A general version of array reductions along one or more axes. + + Used inside some reductions like nanfirst, which is used by ``groupby.first``. + + Parameters + ---------- + arr : chunked array + Data to be reduced along one or more axes. + func : Callable(x_chunk, axis, keepdims) + First function to be executed when resolving the dask graph. + This function is applied in parallel to all original chunks of x. + See below for function parameters. + combine_func : Callable(x_chunk, axis, keepdims), optional + Function used for intermediate recursive aggregation (see + split_every below). If omitted, it defaults to aggregate_func. + aggregate_func : Callable(x_chunk, axis, keepdims) + Last function to be executed, producing the final output. It is always invoked, even when the reduced + Array counts a single chunk along the reduced axes. + axis : int or sequence of ints, optional + Axis or axes to aggregate upon. If omitted, aggregate along all axes. 
+ dtype : np.dtype + data type of output. This argument was previously optional, but + leaving as ``None`` will now raise an exception. + keepdims : boolean, optional + Whether the reduction function should preserve the reduced axes, + leaving them at size ``output_size``, or remove them. + + Returns + ------- + chunked array See Also -------- @@ -304,7 +411,67 @@ def apply_gufunc( **kwargs, ): """ - Called inside xarray.apply_ufunc, so must be supplied for vast majority of xarray computations to be supported. + Apply a generalized ufunc or similar python function to arrays. + + ``signature`` determines if the function consumes or produces core + dimensions. The remaining dimensions in given input arrays (``*args``) + are considered loop dimensions and are required to broadcast + naturally against each other. + + In other terms, this function is like ``np.vectorize``, but for + the blocks of chunked arrays. If the function itself shall also + be vectorized use ``vectorize=True`` for convenience. + + Called inside ``xarray.apply_ufunc``, which is called internally for most xarray operations. + Therefore this method must be implemented for the vast majority of xarray computations to be supported. + + Parameters + ---------- + func : callable + Function to call like ``func(*args, **kwargs)`` on input arrays + (``*args``) that returns an array or tuple of arrays. If multiple + arguments with non-matching dimensions are supplied, this function is + expected to vectorize (broadcast) over axes of positional arguments in + the style of NumPy universal functions [1]_ (if this is not the case, + set ``vectorize=True``). If this function returns multiple outputs, + ``output_core_dims`` has to be set as well. + signature: string + Specifies what core dimensions are consumed and produced by ``func``. + According to the specification of numpy.gufunc signature [2]_ + *args : numeric + Input arrays or scalars to the callable function. 
+ axes: List of tuples, optional, keyword only + A list of tuples with indices of axes a generalized ufunc should operate on. + For instance, for a signature of ``"(i,j),(j,k)->(i,k)"`` appropriate for + matrix multiplication, the base elements are two-dimensional matrices + and these are taken to be stored in the two last axes of each argument. The + corresponding axes keyword would be ``[(-2, -1), (-2, -1), (-2, -1)]``. + For simplicity, for generalized ufuncs that operate on 1-dimensional arrays + (vectors), a single integer is accepted instead of a single-element tuple, + and for generalized ufuncs for which all outputs are scalars, the output + tuples can be omitted. + keepdims: bool, optional, keyword only + If this is set to True, axes which are reduced over will be left in the result as + a dimension with size one, so that the result will broadcast correctly against the + inputs. This option can only be used for generalized ufuncs that operate on inputs + that all have the same number of core dimensions and with outputs that have no core + dimensions , i.e., with signatures like ``"(i),(i)->()"`` or ``"(m,m)->()"``. + If used, the location of the dimensions in the output can be controlled with axes + and axis. + output_dtypes : Optional, dtype or list of dtypes, keyword only + Valid numpy dtype specification or list thereof. + If not given, a call of ``func`` with a small set of data + is performed in order to try to automatically determine the + output dtypes. + vectorize: bool, keyword only + If set to ``True``, ``np.vectorize`` is applied to ``func`` for + convenience. Defaults to ``False``. + **kwargs : dict + Extra keyword arguments to pass to `func` + + Returns + ------- + Single chunked array or tuple of chunked arrays See Also -------- @@ -323,11 +490,41 @@ def map_blocks( **kwargs, ): """ - Called in elementwise operations, but notably not called in xarray.map_blocks. + Map a function across all blocks of a chunked array. 
+ + Called in elementwise operations, but notably not (currently) called within xarray.map_blocks. + + Parameters + ---------- + func : callable + Function to apply to every block in the array. + If ``func`` accepts ``block_info=`` or ``block_id=`` + as keyword arguments, these will be passed dictionaries + containing information about input and output chunks/arrays + during computation. See examples for details. + args : dask arrays or other objects + dtype : np.dtype, optional + The ``dtype`` of the output array. It is recommended to provide this. + If not provided, will be inferred by applying the function to a small + set of fake data. + chunks : tuple, optional + Chunk shape of resulting blocks if the function does not preserve + shape. If not provided, the resulting array is assumed to have the same + block structure as the first input array. + drop_axis : number or iterable, optional + Dimensions lost by the function. + new_axis : number or iterable, optional + New dimensions created by the function. Note that these are applied + after ``drop_axis`` (if present). + **kwargs : + Other keyword arguments to pass to function. Values must be constants + (not dask.arrays) See Also -------- dask.array.map_blocks + dask.array.blockwise : Generalized operation with control over block alignment. + dask.array.map_overlap : Generalized operation with overlap between neighbors. """ raise NotImplementedError() @@ -342,7 +539,38 @@ def blockwise( **kwargs, ): """ - Called by some niche functions in xarray. + Tensor operation: Generalized inner and outer products. + + A broad class of blocked algorithms and patterns can be specified with a + concise multi-index notation. The ``blockwise`` function applies an in-memory + function across multiple blocks of multiple inputs in a variety of ways. + Many chunked array operations are special cases of blockwise including + elementwise, broadcasting, reductions, tensordot, and transpose. 
+ + Currently only called explicitly in xarray when performing multidimensional interpolation. + + Parameters + ---------- + func : callable + Function to apply to individual tuples of blocks + out_ind : iterable + Block pattern of the output, something like 'ijk' or (1, 2, 3) + *args : sequence of Array, index pairs + You may also pass literal arguments, accompanied by None index + e.g. (x, 'ij', y, 'jk', z, 'i', some_literal, None) + **kwargs : dict + Extra keyword arguments to pass to function + adjust_chunks : dict + Dictionary mapping index to function to be applied to chunk sizes + new_axes : dict, keyword only + New indexes and their dimension lengths + align_arrays: bool + Whether or not to align chunks along equally sized dimensions when + multiple arrays are provided. This allows for larger chunks in some + arrays to be broken into smaller ones that match chunk sizes in other + arrays such that they are compatible for block function mapping. If + this is false, then an error will be thrown if arrays do not already + have the same number of blocks in each dimension. See Also -------- @@ -356,8 +584,15 @@ def unify_chunks( **kwargs, ) -> tuple[dict[str, T_NormalizedChunks], list[T_ChunkedArray]]: """ + Unify chunks across a sequence of arrays. + Called by xarray.unify_chunks. + Parameters + ---------- + *args: sequence of Array, index pairs + Sequence like (x, 'ij', y, 'jk', z, 'i') + See Also -------- dask.array.unify_chunks @@ -371,7 +606,24 @@ def store( **kwargs: dict[str, Any], ): """ - Used when writing to any backend. + Store chunked arrays in array-like objects, overwriting data in target. + + This stores chunked arrays into object that supports numpy-style setitem + indexing (e.g. a Zarr Store). Allows storing values chunk by chunk so that it does not have to + fill up memory. For best performance you likely want to align the block size of + the storage target with the block size of your array. 
+ + Used when writing to any registered xarray I/O backend. + + Parameters + ---------- + sources: Array or collection of Arrays + targets: array-like or collection of array-likes + These should support setitem syntax ``target[10:20] = ...``. + If sources is a single item, targets must be a single item; if sources is a + collection of arrays, targets must be a matching collection. + kwargs: + Parameters passed to compute/persist (only used if compute=True) See Also -------- From 039b9d78976c701fd10d21131cf6d11493eb65f6 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 5 Jul 2023 11:22:02 -0400 Subject: [PATCH 55/63] mention from_array specifically and link to requirements section of duck array internals page --- doc/internals/chunked-arrays.rst | 6 +++++- doc/internals/duck-arrays-integration.rst | 2 ++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index 0ae90da992f..45250dbd22c 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -20,7 +20,7 @@ implements the handling of processing all of the chunks. Chunked array methods and "core operations" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A chunked array needs to meet all the :ref:`requirements for normal duck arrays `, but must also +A chunked array needs to meet all the :ref:`requirements for normal duck arrays `, but must also implement additional features. Chunked arrays have additional attributes and methods, such as ``.chunks`` and ``.rechunk``. @@ -38,6 +38,10 @@ also known as a "Chunk Manager". Therefore **a full list of the operations that API of the** :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` **abstract base class**. Note that chunked array methods are also currently dispatched using this class. +Chunked array creation is also handled by this class. 
As chunked array objects have a one-to-one correspondence with +in-memory numpy arrays, it should be possible to create a chunked array from a numpy array by passing the desired +chunking pattern to an implementation of :py:meth:`~xarray.core.parallelcompat.ChunkManagerEntrypoint.from_array`. + .. note:: The :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` abstract base class is mostly just acting as a diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst index 3b6313dbf2f..1f1f57974df 100644 --- a/doc/internals/duck-arrays-integration.rst +++ b/doc/internals/duck-arrays-integration.rst @@ -11,6 +11,8 @@ Integrating with duck arrays Xarray can wrap custom numpy-like arrays (":term:`duck array`\s") - see the :ref:`user guide documentation `. This page is intended for developers who are interested in wrapping a new custom array type with xarray. +.. _internals.duckarrays.requirements: + Duck array requirements ~~~~~~~~~~~~~~~~~~~~~~~ From d6c3cba061e61a587dad4213f8d83adb7987ff97 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 5 Jul 2023 11:27:29 -0400 Subject: [PATCH 56/63] add explicit link to cubed --- doc/internals/chunked-arrays.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index 45250dbd22c..8eec59cb936 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -12,6 +12,8 @@ Alternative chunked array types Xarray can wrap chunked dask arrays (see :ref:`dask`), but can also wrap any other chunked array type that exposes the correct interface. This allows us to support using other frameworks for distributed and out-of-core processing, with user code still written as xarray commands. +In particular xarray can now also supports wrapping :py:class:`cubed.Array` objects +(see `Cubed's documentation `_ and the `cubed-xarray package `_).
The basic idea is that by wrapping an array that has an explicit notion of ``.chunks``, xarray can expose control over the choice of chunking scheme to users via methods like :py:meth:`DataArray.chunk` whilst the wrapped array actually From 7386f6692fe90b3c88de42a11c254be6b9f4d436 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 5 Jul 2023 11:29:07 -0400 Subject: [PATCH 57/63] mention ramba and arkouda --- doc/internals/chunked-arrays.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index 8eec59cb936..4c6bcfb02b1 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -96,5 +96,6 @@ Parallel processing without chunks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To use a parallel array type that does not expose a concept of chunks explicitly, none of the information on this page -is theoretically required. Such an array type could be wrapped using xarray's existing -support for :ref:`numpy-like "duck" arrays `. +is theoretically required. Such an array type (e.g. `Ramba `_ or +`Arkouda `_) could be wrapped using xarray's existing support for +:ref:`numpy-like "duck" arrays `. From 865f4c7424bdff8349212f2a5b02bfdd1d42ee26 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed, 5 Jul 2023 15:45:49 +0000 Subject: [PATCH 58/63] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- xarray/core/dataset.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/core/dataset.py b/xarray/core/dataset.py index 0cabe803f7d..8898cff789a 100644 --- a/xarray/core/dataset.py +++ b/xarray/core/dataset.py @@ -8648,7 +8648,7 @@ def argmax(self: T_Dataset, dim: Hashable | None = None, **kwargs) -> T_Dataset: ... 
) # Indices of the maximum values along the 'student' dimension are calculated - + >>> argmax_indices = dataset.argmax(dim="test") >>> argmax_indices From c8c1c20614c9081a6f1b956bcc6d5a131de3ce44 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 5 Jul 2023 14:10:44 -0400 Subject: [PATCH 59/63] py:mod --- doc/internals/chunked-arrays.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index 4c6bcfb02b1..8560b1ce0c3 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -29,7 +29,7 @@ Chunked arrays have additional attributes and methods, such as ``.chunks`` and ` Furthermore, Xarray dispatches chunk-aware computations across one or more chunked arrays using special functions known as "core operations". Examples include ``map_blocks``, ``blockwise``, and ``apply_gufunc``. -The core operations are generalizations of functions first implemented in :py:module:`dask.array`. +The core operations are generalizations of functions first implemented in :py:mod:`dask.array`. The implementation of these functions is specific to the type of arrays passed to them. For example, when applying the ``map_blocks`` core operation, :py:class:`dask.array.Array` objects must be processed by :py:func:`dask.array.map_blocks`, whereas :py:class:`cubed.Array` objects must be processed by :py:func:`cubed.map_blocks`. 
From 6d7b68bb84eda29d84901db850ce689387771f98 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 5 Jul 2023 14:11:49 -0400 Subject: [PATCH 60/63] Present tense regarding wrapping cubed Co-authored-by: Deepak Cherian --- doc/internals/chunked-arrays.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index 4c6bcfb02b1..2628e56be30 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -12,7 +12,7 @@ Alternative chunked array types Xarray can wrap chunked dask arrays (see :ref:`dask`), but can also wrap any other chunked array type that exposes the correct interface. This allows us to support using other frameworks for distributed and out-of-core processing, with user code still written as xarray commands. -In particular xarray can now also supports wrapping :py:class:`cubed.Array` objects +In particular xarray also supports wrapping :py:class:`cubed.Array` objects (see `Cubed's documentation `_ and the `cubed-xarray package `_). 
The basic idea is that by wrapping an array that has an explicit notion of ``.chunks``, xarray can expose control over From 317a35fbf9013f1ad480469e5957efc21ca62368 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 5 Jul 2023 14:32:50 -0400 Subject: [PATCH 61/63] add links to cubed --- xarray/core/parallelcompat.py | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/xarray/core/parallelcompat.py b/xarray/core/parallelcompat.py index e2dc2db63f0..1c50e89abdb 100644 --- a/xarray/core/parallelcompat.py +++ b/xarray/core/parallelcompat.py @@ -214,6 +214,7 @@ def chunks(self, data: T_ChunkedArray) -> T_NormalizedChunks: See Also -------- dask.array.Array.chunks + cubed.Array.chunks """ raise NotImplementedError() @@ -275,7 +276,8 @@ def from_array( See Also -------- - dask.Array.array.from_array + dask.array.from_array + cubed.from_array """ raise NotImplementedError() @@ -306,6 +308,7 @@ def rechunk( See Also -------- dask.array.Array.rechunk + cubed.Array.rechunk """ return data.rechunk(chunks, **kwargs) # type: ignore[attr-defined] @@ -331,7 +334,7 @@ def compute(self, *data: T_ChunkedArray | Any, **kwargs) -> tuple[np.ndarray, .. See Also -------- dask.compute - dask.array.Array.compute + cubed.compute """ raise NotImplementedError() @@ -347,6 +350,7 @@ def array_api(self) -> Any: See Also -------- dask.array + cubed.array_api """ raise NotImplementedError() @@ -395,6 +399,7 @@ def reduction( See Also -------- dask.array.reduction + cubed.core.reduction """ raise NotImplementedError() @@ -476,6 +481,7 @@ def apply_gufunc( See Also -------- dask.array.apply_gufunc + cubed.apply_gufunc """ raise NotImplementedError() @@ -523,8 +529,7 @@ def map_blocks( See Also -------- dask.array.map_blocks - dask.array.blockwise : Generalized operation with control over block alignment. - dask.array.map_overlap : Generalized operation with overlap between neighbors. 
+ cubed.map_blocks """ raise NotImplementedError() @@ -575,6 +580,7 @@ def blockwise( See Also -------- dask.array.blockwise + cubed.core.blockwise """ raise NotImplementedError() @@ -596,6 +602,7 @@ def unify_chunks( See Also -------- dask.array.unify_chunks + cubed.core.unify_chunks """ raise NotImplementedError() @@ -628,5 +635,6 @@ def store( See Also -------- dask.array.store + cubed.store """ raise NotImplementedError() From 71238ce3890a829176dacba0a23eb45cbd3860bf Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 5 Jul 2023 14:34:54 -0400 Subject: [PATCH 62/63] add references for numpy links in apply_gufunc docstring --- xarray/core/parallelcompat.py | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/xarray/core/parallelcompat.py b/xarray/core/parallelcompat.py index 1c50e89abdb..0ac73d8f1c7 100644 --- a/xarray/core/parallelcompat.py +++ b/xarray/core/parallelcompat.py @@ -482,6 +482,11 @@ def apply_gufunc( -------- dask.array.apply_gufunc cubed.apply_gufunc + + References + ---------- + .. [1] https://docs.scipy.org/doc/numpy/reference/ufuncs.html + .. [2] https://docs.scipy.org/doc/numpy/reference/c-api/generalized-ufuncs.html """ raise NotImplementedError() From a66c25af33e55693bbc64cffcf14ddff172cd700 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 5 Jul 2023 15:45:46 -0400 Subject: [PATCH 63/63] fix some broken links to docstrings --- doc/internals/chunked-arrays.rst | 11 ++++++----- xarray/core/parallelcompat.py | 8 ++++---- 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst index d1c256ac6d1..7192c3f0bc5 100644 --- a/doc/internals/chunked-arrays.rst +++ b/doc/internals/chunked-arrays.rst @@ -82,15 +82,16 @@ User interface Once the chunkmanager subclass has been registered, xarray objects wrapping the desired array type can be created in 3 ways: -#. 
By manually passing the array type to the :py:class:`DataArray` constructor, see the examples for :ref:`numpy-like arrays `, +#. By manually passing the array type to the :py:class:`~xarray.DataArray` constructor, see the examples for :ref:`numpy-like arrays `, -#. Calling :py:meth:`DataArray.chunk`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``, +#. Calling :py:meth:`~xarray.DataArray.chunk`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``, -#. Calling :py:func:`open_dataset`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``. +#. Calling :py:func:`~xarray.open_dataset`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``. The latter two methods ultimately call the chunkmanager's implementation of ``.from_array``, to which they pass the ``from_array_kwargs`` dict. -The ``chunked_array_type`` kwarg selects which registered chunkmanager subclass to dispatch to. It defaults to ``'dask'`` if found, -otherwise to whichever chunkmanager is registered if only one is registered, else it will raise an error. +The ``chunked_array_type`` kwarg selects which registered chunkmanager subclass to dispatch to. It defaults to ``'dask'`` +if Dask is installed, otherwise it defaults to whichever chunkmanager is registered if only one is registered. +If multiple chunkmanagers are registered it will raise an error by default. Parallel processing without chunks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/xarray/core/parallelcompat.py b/xarray/core/parallelcompat.py index 0ac73d8f1c7..26efc5fc412 100644 --- a/xarray/core/parallelcompat.py +++ b/xarray/core/parallelcompat.py @@ -35,7 +35,7 @@ def list_chunkmanagers() -> dict[str, ChunkManagerEntrypoint]: Returns ------- - chunnkmanagers : dict + chunkmanagers : dict Dictionary whose values are registered ChunkManagerEntrypoint subclass instances, and whose values are the strings under which they are registered. 
@@ -251,7 +251,7 @@ def normalize_chunks( See Also -------- - dask.array.normalize_chunks + dask.array.core.normalize_chunks """ raise NotImplementedError() @@ -480,7 +480,7 @@ def apply_gufunc( See Also -------- - dask.array.apply_gufunc + dask.array.gufunc.apply_gufunc cubed.apply_gufunc References @@ -606,7 +606,7 @@ def unify_chunks( See Also -------- - dask.array.unify_chunks + dask.array.core.unify_chunks cubed.core.unify_chunks """ raise NotImplementedError()
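The dispatch and default-selection behaviour documented in this series can be illustrated with a small, self-contained sketch. It does not import xarray and is not xarray's implementation: ``ToyChunkManager``, ``ListChunks``, ``ListChunkManager``, ``pick_default_manager`` and ``manager_for`` are hypothetical names invented for illustration. It assumes a simplified rule in which a manager registered under the name ``"dask"`` stands in for "Dask is installed", and a toy 1D array type made of Python lists.

```python
from abc import ABC, abstractmethod
from typing import Any


class ToyChunkManager(ABC):
    """Hypothetical stand-in for a chunk manager (not xarray's real class)."""

    array_cls: type

    def is_chunked_array(self, data: Any) -> bool:
        # Same default as described in the docstrings: compare against array_cls.
        return isinstance(data, self.array_cls)

    @abstractmethod
    def from_array(self, data: list, chunks: int) -> Any:
        """Create a chunked array from in-memory data."""


class ListChunks:
    """Toy 1D chunked array stored as a list of blocks (hypothetical)."""

    def __init__(self, blocks: list) -> None:
        self.blocks = blocks

    @property
    def chunks(self) -> tuple:
        # Explicit tuple-of-tuples form: one inner tuple per dimension.
        return (tuple(len(block) for block in self.blocks),)


class ListChunkManager(ToyChunkManager):
    array_cls = ListChunks

    def from_array(self, data: list, chunks: int) -> ListChunks:
        # Split the input into consecutive blocks of at most `chunks` items.
        blocks = [data[i:i + chunks] for i in range(0, len(data), chunks)]
        return ListChunks(blocks)


def pick_default_manager(registered: dict) -> ToyChunkManager:
    """Prefer 'dask' if registered, else the sole manager, else raise."""
    if "dask" in registered:
        return registered["dask"]
    if len(registered) == 1:
        return next(iter(registered.values()))
    raise ValueError(
        f"expected exactly one registered manager, found {len(registered)}"
    )


def manager_for(data: Any, registered: dict) -> ToyChunkManager:
    """Return the first registered manager that recognises `data`."""
    for manager in registered.values():
        if manager.is_chunked_array(data):
            return manager
    raise TypeError(f"no registered manager recognises {type(data).__name__}")


registry = {"toy": ListChunkManager()}
manager = pick_default_manager(registry)
arr = manager.from_array(list(range(5)), chunks=2)
print(arr.chunks)  # ((2, 2, 1),)
```

Keeping the type check on the manager rather than on the array means the wrapping code never needs to import a specific array library; it only iterates over whatever managers happen to be registered.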