Commit 9344e2e

explain how to add async support for any backend in the docs
1 parent 67ba26a commit 9344e2e

2 files changed, +33 -17 lines changed

doc/internals/how-to-add-new-backend.rst

Lines changed: 33 additions & 16 deletions
@@ -325,39 +325,42 @@ information on plugins.
 How to support lazy loading
 +++++++++++++++++++++++++++
 
-If you want to make your backend effective with big datasets, then you should
-support lazy loading.
-Basically, you shall replace the :py:class:`numpy.ndarray` inside the
-variables with a custom class that supports lazy loading indexing.
+If you want to make your backend effective with big datasets, then you should take advantage of xarray's
+support for lazy loading and indexing.
+
+Basically, when your backend constructs the ``Variable`` objects,
+you need to replace the :py:class:`numpy.ndarray` inside the
+variables with a custom :py:class:`~xarray.backends.BackendArray` subclass that supports lazy loading and indexing.
 See the example below:
 
 .. code-block:: python
-
     backend_array = MyBackendArray()
     data = indexing.LazilyIndexedArray(backend_array)
     var = xr.Variable(dims, data, attrs=attrs, encoding=encoding)
 
 Where:
 
-- :py:class:`~xarray.core.indexing.LazilyIndexedArray` is a class
-  provided by Xarray that manages the lazy loading.
-- ``MyBackendArray`` shall be implemented by the backend and shall inherit
+- :py:class:`~xarray.core.indexing.LazilyIndexedArray` is a wrapper class
+  provided by Xarray that manages the lazy loading and indexing.
+- ``MyBackendArray`` should be implemented by the backend and must inherit
   from :py:class:`~xarray.backends.BackendArray`.
 
 BackendArray subclassing
 ^^^^^^^^^^^^^^^^^^^^^^^^
 
-The BackendArray subclass shall implement the following method and attributes:
+The BackendArray subclass must implement the following method and attributes:
 
-- the ``__getitem__`` method that takes in input an index and returns a
-  `NumPy <https://numpy.org/>`__ array
-- the ``shape`` attribute
+- the ``__getitem__`` method that takes an index as an input and returns a
+  `NumPy <https://numpy.org/>`__ array,
+- the ``shape`` attribute,
 - the ``dtype`` attribute.
 
-Xarray supports different type of :doc:`/user-guide/indexing`, that can be
-grouped in three types of indexes
+It may also optionally implement an additional ``async_getitem`` method.
+
+Xarray supports different types of :doc:`/user-guide/indexing`, that can be
+grouped in three types of indexes:
 :py:class:`~xarray.core.indexing.BasicIndexer`,
-:py:class:`~xarray.core.indexing.OuterIndexer` and
+:py:class:`~xarray.core.indexing.OuterIndexer`, and
 :py:class:`~xarray.core.indexing.VectorizedIndexer`.
 This implies that the implementation of the method ``__getitem__`` can be tricky.
 In order to simplify this task, Xarray provides a helper function,
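The interface required by the first hunk (a ``shape``, a ``dtype``, and a ``__getitem__`` that reads only what is requested) can be illustrated with a small stdlib-only sketch. `LazyRowArray` and `read_row` are hypothetical names, not xarray APIs: the class does not subclass `BackendArray`, handles only slice-based keys, and returns nested lists instead of a NumPy array so that it runs without third-party packages. It counts storage reads to show that nothing is loaded until the array is indexed.

```python
from typing import Callable, List, Tuple


class LazyRowArray:
    """Hypothetical stand-in for a BackendArray subclass (does not import xarray).

    ``shape`` and ``dtype`` are available without reading any data; rows are
    fetched from storage only when ``__getitem__`` is called.
    """

    def __init__(self, read_row: Callable[[int], List[float]], shape: Tuple[int, int]):
        self._read_row = read_row  # hypothetical callback reading one row from storage
        self.shape = shape
        self.dtype = "float64"  # a real backend would expose a numpy.dtype here
        self.rows_read = 0  # counts storage reads, to show that loading is lazy

    def __getitem__(self, key: Tuple[slice, slice]) -> List[List[float]]:
        # Only basic (slice, slice) keys are handled in this sketch; a real
        # backend also has to handle the outer and vectorized indexer types.
        row_slice, col_slice = key
        rows = []
        for i in range(*row_slice.indices(self.shape[0])):
            self.rows_read += 1
            rows.append(self._read_row(i)[col_slice])
        return rows


def read_row(i: int) -> List[float]:
    # Hypothetical storage: row i holds 10*i, 10*i + 1, 10*i + 2, 10*i + 3
    return [float(10 * i + j) for j in range(4)]


arr = LazyRowArray(read_row, shape=(1000, 4))
print(arr.shape, arr.rows_read)  # (1000, 4) 0: nothing read yet
subset = arr[slice(2, 4), slice(0, 2)]
print(subset, arr.rows_read)  # [[20.0, 21.0], [30.0, 31.0]] 2
```

Only the two requested rows are read, even though the array advertises 1000 rows.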
@@ -413,8 +416,22 @@ input the ``key``, the array ``shape`` and the following parameters:
 For more details see
 :py:class:`~xarray.core.indexing.IndexingSupport` and :ref:`RST indexing`.
 
+Async support
+^^^^^^^^^^^^^
+
+Backends can also optionally support loading data asynchronously via xarray's asynchronous loading methods
+(e.g. ``~xarray.Dataset.load_async``).
+To support async loading the `BackendArray` subclass must additionally implement the ``BackendArray.async_getitem`` method.
+
+Note that implementing this method is only necessary if you want to be able to load data from different xarray objects concurrently.
+Even without this method your ``BackendArray`` implementation is still free to concurrently load chunks of data for a single ``Variable`` itself,
+so long as it does so behind the synchronous ``__getitem__`` interface.
+
+Dask support
+^^^^^^^^^^^^
+
 In order to support `Dask Distributed <https://distributed.dask.org/>`__ and
-:py:mod:`multiprocessing`, ``BackendArray`` subclass should be serializable
+:py:mod:`multiprocessing`, the ``BackendArray`` subclass should be serializable
 either with :ref:`io.pickle` or
 `cloudpickle <https://github.com/cloudpipe/cloudpickle>`__.
 That implies that all the reference to open files should be dropped. For
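The Dask support requirement at the end of this hunk (the `BackendArray` subclass must be picklable, so references to open files have to be dropped) can be sketched with the standard library alone. `PicklableFileArray` is a hypothetical class, not an xarray API: it keeps only the file path in its pickled state and reopens the file on first access. It returns `bytes` rather than a NumPy array so the example has no third-party dependencies.

```python
import os
import pickle
import tempfile
from typing import Any, BinaryIO, Dict, Optional


class PicklableFileArray:
    """Hypothetical backend array that pickles its file path, never an open file."""

    def __init__(self, path: str, nbytes: int):
        self.path = path
        self.shape = (nbytes,)
        self.dtype = "uint8"  # a real backend would expose a numpy.dtype
        self._file: Optional[BinaryIO] = None  # opened lazily on first access

    def _get_file(self) -> BinaryIO:
        if self._file is None:
            self._file = open(self.path, "rb")
        return self._file

    def __getitem__(self, key: slice) -> bytes:
        start, stop, step = key.indices(self.shape[0])
        if step != 1:
            raise NotImplementedError("only contiguous slices in this sketch")
        f = self._get_file()
        f.seek(start)
        return f.read(max(stop - start, 0))

    def close(self) -> None:
        if self._file is not None:
            self._file.close()
            self._file = None

    def __getstate__(self) -> Dict[str, Any]:
        # Drop the open file handle: file objects cannot be pickled.
        # Default unpickling restores __dict__, so the copy starts with
        # _file = None and reopens the file by path when first indexed.
        state = self.__dict__.copy()
        state["_file"] = None
        return state


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(10)))

arr = PicklableFileArray(tmp.name, nbytes=10)
first = arr[slice(0, 3)]  # opens the file
clone = pickle.loads(pickle.dumps(arr))  # succeeds because _file was dropped
second = clone[slice(7, 10)]  # the clone reopens the file by path
arr.close()
clone.close()
os.remove(tmp.name)
```

Because the handle is opened lazily, unpickling is cheap, and a worker that never indexes the array never opens the file. Without `__getstate__`, `pickle.dumps(arr)` would fail once the file had been opened.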

xarray/backends/zarr.py

Lines changed: 0 additions & 1 deletion
@@ -234,7 +234,6 @@ def __getitem__(self, key):
         # could possibly have a work-around for 0d data here
 
     async def async_getitem(self, key):
-        # this doesn't need to be async
         array = self._array
         if isinstance(key, indexing.BasicIndexer):
             method = self._async_getitem
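The async support described in the documentation diff, and the zarr backend's `async_getitem` touched here, follow a pattern that can be sketched without xarray. `AsyncCapableArray` and `load_all` are hypothetical names; the sketch takes plain slices as keys (the zarr code shows xarray passing indexer objects such as `indexing.BasicIndexer`) and returns lists instead of NumPy arrays. It shows that two `async_getitem` calls awaited together with `asyncio.gather` take roughly one simulated read latency rather than two.

```python
import asyncio
import time
from typing import List, Sequence


class AsyncCapableArray:
    """Hypothetical stand-in for a BackendArray subclass with async support."""

    def __init__(self, values: List[int], delay: float = 0.2):
        self._values = values
        self._delay = delay  # simulated I/O latency per read
        self.shape = (len(values),)
        self.dtype = "int64"  # a real backend would expose a numpy.dtype

    async def _fetch(self, key: slice) -> List[int]:
        await asyncio.sleep(self._delay)  # stands in for a non-blocking remote read
        return self._values[key]

    def __getitem__(self, key: slice) -> List[int]:
        # Synchronous interface: drive the coroutine to completion here.
        # asyncio.run() raises if an event loop is already running in this thread.
        return asyncio.run(self._fetch(key))

    async def async_getitem(self, key: slice) -> List[int]:
        # Asynchronous interface: callers can await many of these concurrently.
        return await self._fetch(key)


async def load_all(arrays: Sequence[AsyncCapableArray], key: slice) -> List[List[int]]:
    # Reads from different arrays overlap instead of running one after another.
    return list(await asyncio.gather(*(a.async_getitem(key) for a in arrays)))


arrays = [AsyncCapableArray([1, 2, 3]), AsyncCapableArray([4, 5, 6])]
sync_result = arrays[0][slice(0, 2)]  # [1, 2], via the blocking path

start = time.perf_counter()
results = asyncio.run(load_all(arrays, slice(1, 3)))  # [[2, 3], [5, 6]]
elapsed = time.perf_counter() - start  # about one delay, not two
```

The synchronous `__getitem__` in this sketch is itself built on a coroutine, which is one way a backend can use async I/O internally behind the synchronous interface, as the documentation permits even without `async_getitem`.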