Commit ebf0252

Merge branch 'main' into namedarray-parallelcompat
2 parents 309cd4d + bb489fa commit ebf0252

19 files changed, +70 -19 lines changed

doc/conf.py

Lines changed: 1 addition & 0 deletions
@@ -231,6 +231,7 @@
 # canonical_url="",
 repository_url="https://github.com/pydata/xarray",
 repository_branch="main",
+navigation_with_keys=False, # pydata/pydata-sphinx-theme#1492
 path_to_docs="doc",
 use_edit_page_button=True,
 use_repository_button=True,

doc/examples/apply_ufunc_vectorize_1d.ipynb

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"This example will illustrate how to conveniently apply an unvectorized function `func` to xarray objects using `apply_ufunc`. `func` expects 1D numpy arrays and returns a 1D numpy array. Our goal is to coveniently apply this function along a dimension of xarray objects that may or may not wrap dask arrays with a signature.\n",
+"This example will illustrate how to conveniently apply an unvectorized function `func` to xarray objects using `apply_ufunc`. `func` expects 1D numpy arrays and returns a 1D numpy array. Our goal is to conveniently apply this function along a dimension of xarray objects that may or may not wrap dask arrays with a signature.\n",
 "\n",
 "We will illustrate this using `np.interp`: \n",
 "\n",

doc/internals/how-to-create-custom-index.rst

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 .. currentmodule:: xarray

+.. _internals.custom indexes:
+
 How to create a custom index
 ============================

doc/internals/index.rst

Lines changed: 2 additions & 1 deletion
@@ -19,9 +19,10 @@ The pages in this section are intended for:
 :hidden:

 internal-design
+interoperability
 duck-arrays-integration
 chunked-arrays
 extending-xarray
-zarr-encoding-spec
 how-to-add-new-backend
 how-to-create-custom-index
+zarr-encoding-spec

doc/internals/internal-design.rst

Lines changed: 4 additions & 4 deletions
@@ -59,9 +59,9 @@ which is used as the basic building block behind xarray's
 - ``data``: The N-dimensional array (typically a NumPy or Dask array) storing
 the Variable's data. It must have the same number of dimensions as the length
 of ``dims``.
-- ``attrs``: An ordered dictionary of metadata associated with this array. By
+- ``attrs``: A dictionary of metadata associated with this array. By
 convention, xarray's built-in operations never use this metadata.
-- ``encoding``: Another ordered dictionary used to store information about how
+- ``encoding``: Another dictionary used to store information about how
 these variable's data is represented on disk. See :ref:`io.encoding` for more
 details.

@@ -95,7 +95,7 @@ all of which are implemented by forwarding on to the underlying ``Variable`` obj

 In addition, a :py:class:`~xarray.DataArray` stores additional ``Variable`` objects stored in a dict under the private ``_coords`` attribute,
 each of which is referred to as a "Coordinate Variable". These coordinate variable objects are only allowed to have ``dims`` that are a subset of the data variable's ``dims``,
-and each dim has a specific length. This means that the full :py:attr:`~xarray.DataArray.sizes` of the dataarray can be represented by a dictionary mapping dimension names to integer sizes.
+and each dim has a specific length. This means that the full :py:attr:`~xarray.DataArray.size` of the dataarray can be represented by a dictionary mapping dimension names to integer sizes.
 The underlying data variable has this exact same size, and the attached coordinate variables have sizes which are some subset of the size of the data variable.
 Another way of saying this is that all coordinate variables must be "alignable" with the data variable.

@@ -124,7 +124,7 @@ The :py:class:`~xarray.Dataset` class is a generalization of the :py:class:`~xar
 Internally all data variables and coordinate variables are stored under a single ``variables`` dict, and coordinates are
 specified by storing their names in a private ``_coord_names`` dict.

-The dataset's dimensions are the set of all dims present across any variable, but (similar to in dataarrays) coordinate
+The dataset's ``dims`` are the set of all dims present across any variable, but (similar to in dataarrays) coordinate
 variables cannot have a dimension that is not present on any data variable.

 When a data variable or coordinate variable is accessed, a new ``DataArray`` is again constructed from all compatible
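
Note on the second hunk of this file: the changed paragraph describes a rule that may be easier to follow with a small example. Coordinate variables may only use dims that are a subset of the data variable's dims, and each dim has a single length. The sketch below is a rough pure-Python illustration of that rule only. The function name `check_alignable` and the plain-dict representation of sizes are assumptions made for illustration; they are not xarray's internal implementation.

```python
def check_alignable(data_sizes, coord_sizes):
    """Return True if a coordinate's sizes are consistent with the data variable's.

    Both arguments are plain dicts mapping dimension names to integer lengths
    (a hypothetical stand-in for the mapping described in the paragraph above).
    """
    for dim, length in coord_sizes.items():
        # The coordinate may only use dims that exist on the data variable.
        if dim not in data_sizes:
            return False
        # Each dim has one specific length, shared by data and coordinates.
        if data_sizes[dim] != length:
            return False
    return True


data_sizes = {"time": 3, "x": 4}

ok = check_alignable(data_sizes, {"time": 3})    # subset of dims, same length
bad_dim = check_alignable(data_sizes, {"y": 2})  # "y" is not a dim of the data
bad_len = check_alignable(data_sizes, {"x": 5})  # length conflicts with the data
```

Here only the first coordinate would count as "alignable" in the sense the paragraph uses; the other two violate the subset rule and the single-length rule respectively.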

doc/internals/interoperability.rst

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+.. _interoperability:
+
+Interoperability of Xarray
+==========================
+
+Xarray is designed to be extremely interoperable, in many orthogonal ways.
+Making xarray as flexible as possible is the common theme of most of the goals on our :ref:`roadmap`.
+
+This interoperability comes via a set of flexible abstractions into which the user can plug in. The current full list is:
+
+- :ref:`Custom file backends <add_a_backend>` via the :py:class:`~xarray.backends.BackendEntrypoint` system,
+- Numpy-like :ref:`"duck" array wrapping <internals.duckarrays>`, which supports the `Python Array API Standard <https://data-apis.org/array-api/latest/>`_,
+- :ref:`Chunked distributed array computation <internals.chunkedarrays>` via the :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` system,
+- Custom :py:class:`~xarray.Index` objects for :ref:`flexible label-based lookups <internals.custom indexes>`,
+- Extending xarray objects with domain-specific methods via :ref:`custom accessors <internals.accessors>`.
+
+.. warning::
+
+One obvious way in which xarray could be more flexible is that whilst subclassing xarray objects is possible, we
+currently don't support it in most transformations, instead recommending composition over inheritance. See the
+:ref:`internal design page <internal design.subclassing>` for the rationale and look at the corresponding `GH issue <https://github.com/pydata/xarray/issues/3980>`_
+if you're interested in improving support for subclassing!
+
+.. note::
+
+If you think there is another way in which xarray could become more generically flexible then please
+tell us your ideas by `raising an issue to request the feature <https://github.com/pydata/xarray/issues/new/choose>`_!
+
+
+Whilst xarray was originally designed specifically to open ``netCDF4`` files as :py:class:`numpy.ndarray` objects labelled by :py:class:`pandas.Index` objects,
+it is entirely possible today to:
+
+- lazily open an xarray object directly from a custom binary file format (e.g. using ``xarray.open_dataset(path, engine='my_custom_format')``,
+- handle the data as any API-compliant numpy-like array type (e.g. sparse or GPU-backed),
+- distribute out-of-core computation across that array type in parallel (e.g. via :ref:`dask`),
+- track the physical units of the data through computations (e.g via `pint-xarray <https://pint-xarray.readthedocs.io/en/stable/>`_),
+- query the data via custom index logic optimized for specific applications (e.g. an :py:class:`~xarray.Index` object backed by a KDTree structure),
+- attach domain-specific logic via accessor methods (e.g. to understand geographic Coordinate Reference System metadata),
+- organize hierarchical groups of xarray data in a :py:class:`~datatree.DataTree` (e.g. to treat heterogenous simulation and observational data together during analysis).
+
+All of these features can be provided simultaneously, using libaries compatible with the rest of the scientific python ecosystem.
+In this situation xarray would be essentially a thin wrapper acting as pure-python framework, providing a common interface and
+separation of concerns via various domain-agnostic abstractions.
+
+Most of the remaining pages in the documentation of xarray's internals describe these various types of interoperability in more detail.

doc/user-guide/io.rst

Lines changed: 1 addition & 1 deletion
@@ -819,7 +819,7 @@ with ``mode='a'`` on a Dataset containing the new variables, passing in an
 existing Zarr store or path to a Zarr store.

 To resize and then append values along an existing dimension in a store, set
-``append_dim``. This is a good option if data always arives in a particular
+``append_dim``. This is a good option if data always arrives in a particular
 order, e.g., for time-stepping a simulation:

 .. ipython:: python

doc/whats-new.rst

Lines changed: 2 additions & 0 deletions
@@ -137,6 +137,8 @@ Bug fixes
 Documentation
 ~~~~~~~~~~~~~

+- Added page on the interoperability of xarray objects.
+(:pull:`7992`) By `Tom Nicholas <https://github.com/TomNicholas>`_.
 - Added xarray-regrid to the list of xarray related projects (:pull:`8272`).
 By `Bart Schilperoort <https://github.com/BSchilperoort>`_.

xarray/backends/common.py

Lines changed: 1 addition & 1 deletion
@@ -247,7 +247,7 @@ def sync(self, compute=True, chunkmanager_store_kwargs=None):
 chunkmanager = get_chunked_array_type(*self.sources)

 # TODO: consider wrapping targets with dask.delayed, if this makes
-# for any discernible difference in perforance, e.g.,
+# for any discernible difference in performance, e.g.,
 # targets = [dask.delayed(t) for t in self.targets]

 if chunkmanager_store_kwargs is None:

xarray/core/accessor_dt.py

Lines changed: 1 addition & 1 deletion
@@ -601,7 +601,7 @@ class CombinedDatetimelikeAccessor(
 DatetimeAccessor[T_DataArray], TimedeltaAccessor[T_DataArray]
 ):
 def __new__(cls, obj: T_DataArray) -> CombinedDatetimelikeAccessor:
-# CombinedDatetimelikeAccessor isn't really instatiated. Instead
+# CombinedDatetimelikeAccessor isn't really instantiated. Instead
 # we need to choose which parent (datetime or timedelta) is
 # appropriate. Since we're checking the dtypes anyway, we'll just
 # do all the validation here.