@@ -325,39 +325,42 @@ information on plugins.
  How to support lazy loading
  +++++++++++++++++++++++++++

- If you want to make your backend effective with big datasets, then you should
- support lazy loading.
- Basically, you shall replace the :py:class:`numpy.ndarray` inside the
- variables with a custom class that supports lazy loading indexing.
+ If you want to make your backend effective with big datasets, then you should take advantage of xarray's
+ support for lazy loading and indexing.
+
+ Basically, when your backend constructs the ``Variable`` objects,
+ you need to replace the :py:class:`numpy.ndarray` inside the
+ variables with a custom :py:class:`~xarray.backends.BackendArray` subclass that supports lazy loading and indexing.
  See the example below:

  .. code-block:: python
-
      backend_array = MyBackendArray()
      data = indexing.LazilyIndexedArray(backend_array)
      var = xr.Variable(dims, data, attrs=attrs, encoding=encoding)

  Where:

- - :py:class:`~xarray.core.indexing.LazilyIndexedArray` is a class
-   provided by Xarray that manages the lazy loading.
- - ``MyBackendArray`` shall be implemented by the backend and shall inherit
+ - :py:class:`~xarray.core.indexing.LazilyIndexedArray` is a wrapper class
+   provided by Xarray that manages the lazy loading and indexing.
+ - ``MyBackendArray`` should be implemented by the backend and must inherit
    from :py:class:`~xarray.backends.BackendArray`.

  BackendArray subclassing
  ^^^^^^^^^^^^^^^^^^^^^^^^

- The BackendArray subclass shall implement the following method and attributes:
+ The BackendArray subclass must implement the following method and attributes:

- - the ``__getitem__`` method that takes in input an index and returns a
-   `NumPy <https://numpy.org/>`__ array
- - the ``shape`` attribute
+ - the ``__getitem__`` method that takes an index as an input and returns a
+   `NumPy <https://numpy.org/>`__ array,
+ - the ``shape`` attribute,
  - the ``dtype`` attribute.

- Xarray supports different type of :doc:`/user-guide/indexing`, that can be
- grouped in three types of indexes
+ It may also optionally implement an additional ``async_getitem`` method.
+
+ Xarray supports different types of :doc:`/user-guide/indexing`, which can be
+ grouped into three types of indexers:
  :py:class:`~xarray.core.indexing.BasicIndexer`,
- :py:class:`~xarray.core.indexing.OuterIndexer` and
+ :py:class:`~xarray.core.indexing.OuterIndexer`, and
  :py:class:`~xarray.core.indexing.VectorizedIndexer`.
  This implies that the implementation of the method ``__getitem__`` can be tricky.
  In order to simplify this task, Xarray provides a helper function,
@@ -413,8 +416,22 @@ input the ``key``, the array ``shape`` and the following parameters:
  For more details see
  :py:class:`~xarray.core.indexing.IndexingSupport` and :ref:`RST indexing`.

+ Async support
+ ^^^^^^^^^^^^^
+
+ Backends can also optionally support loading data asynchronously via xarray's asynchronous loading methods
+ (e.g. :py:meth:`~xarray.Dataset.load_async`).
+ To support async loading, the ``BackendArray`` subclass must additionally implement the ``BackendArray.async_getitem`` method.
+
+ Note that implementing this method is only necessary if you want to be able to load data from different xarray objects concurrently.
+ Even without this method your ``BackendArray`` implementation is still free to concurrently load chunks of data for a single ``Variable`` itself,
+ so long as it does so behind the synchronous ``__getitem__`` interface.
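
The async contract described above can be sketched without xarray itself. The
following is a minimal, standard-library-only illustration under stated
assumptions, not xarray's implementation: ``FakeRemoteStore`` and ``load_both``
are hypothetical names, ``MyBackendArray`` deliberately does not inherit from
``BackendArray`` here, and plain lists and slices stand in for NumPy arrays and
xarray's indexer objects. It only shows why an awaitable ``async_getitem`` lets
reads from two different arrays overlap.

.. code-block:: python

    import asyncio


    class FakeRemoteStore:
        """Hypothetical storage client offering blocking and awaitable reads."""

        def __init__(self, values):
            self._values = list(values)

        def __len__(self):
            return len(self._values)

        def read(self, key):
            # Blocking read: returns the selected values immediately.
            return self._values[key]

        async def read_async(self, key):
            # Simulated I/O latency; control returns to the event loop here.
            await asyncio.sleep(0.01)
            return self._values[key]


    class MyBackendArray:
        # Assumption: a real backend would inherit from
        # xarray.backends.BackendArray, receive xarray indexer objects as
        # ``key``, and return NumPy arrays instead of lists.

        def __init__(self, store):
            self._store = store
            self.shape = (len(store),)
            self.dtype = "float64"

        def __getitem__(self, key):
            # Synchronous interface, always required.
            return self._store.read(key)

        async def async_getitem(self, key):
            # Optional asynchronous counterpart of __getitem__.
            return await self._store.read_async(key)


    async def load_both(first, second, key):
        # Both reads are scheduled together, so their waiting time overlaps.
        return await asyncio.gather(
            first.async_getitem(key), second.async_getitem(key)
        )


    a = MyBackendArray(FakeRemoteStore([0.0, 1.0, 2.0, 3.0]))
    b = MyBackendArray(FakeRemoteStore([10.0, 11.0, 12.0, 13.0]))
    results = asyncio.run(load_both(a, b, slice(1, 3)))

Each ``async_getitem`` call yields control while it waits, so
``asyncio.gather`` can overlap the two reads. The synchronous ``__getitem__``
stays available for callers that do not run an event loop.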
+
+ Dask support
+ ^^^^^^^^^^^^
+
  In order to support `Dask Distributed <https://distributed.dask.org/>`__ and
- :py:mod:`multiprocessing`, ``BackendArray`` subclass should be serializable
+ :py:mod:`multiprocessing`, the ``BackendArray`` subclass should be serializable
  either with :ref:`io.pickle` or
  `cloudpickle <https://github.com/cloudpipe/cloudpickle>`__.
  That implies that all the reference to open files should be dropped. For