|
1 |
| -# Xarray Custom Indexes Gallery |
| 1 | +# Xarray Indexes Gallery |
2 | 2 |
|
3 |
| -## What are Indexes? |
| 3 | +## Background |
| 4 | + |
| 5 | +Xarray's data model was initially heavily inspired by the |
| 6 | +[NetCDF](https://www.unidata.ucar.edu/software/netcdf/) file format, making it |
| 7 | +well suited for working with n-dimensional, rectilinear gridded datasets |
| 8 | +commonly found in scientific data analysis, especially in the geosciences. In |
| 9 | +fact, Xarray has used many versions of the {ref}`schematic below <xarray-diagram-traditional>` to convey a "canonical" data structure that is |
| 10 | +ubiquitous in geosciences (3D datasets with coordinates that are either 2D or |
| 11 | +1D). |
| 12 | + |
| 13 | +```{figure} _static/figs/xarray-dataset-diagram-legacy.png |
| 14 | +--- |
| 15 | +name: xarray-diagram-traditional |
| 16 | +alt: Xarray data model |
| 17 | +align: center |
| 18 | +width: 500px |
| 19 | +class: dark-light |
| 20 | +--- |
| 21 | +An illustration of the traditional Xarray data model. |
| 22 | +``` |
| 23 | + |
| 24 | +Over the years, Xarray has evolved and has been adopted in an increasing number |
| 25 | +of domains as a convenient, general-purpose Python library for handling |
| 26 | +n-dimensional labeled arrays. Xarray's data structures are now being used for |
| 27 | +representing a wide range of datasets including sparse data, curvilinear or |
| 28 | +irregular grids, staggered grids, discrete global grids, image stacks and vector |
| 29 | +data cubes. Consequently, here we'll expand our minds to consider data |
| 30 | +structures that are {ref}`much more versatile <xarray-diagram-wild>` 🤯. |
| 31 | + |
| 32 | +```{figure} _static/figs/xarray-dataset-diagram-new.png |
| 33 | +--- |
| 34 | +name: xarray-diagram-wild |
| 35 | +alt: Xarray wild dataset |
| 36 | +align: center |
| 37 | +width: 500px |
| 38 | +class: dark-light |
| 39 | +--- |
| 40 | +A better illustration of the variety of Xarray datasets in the wild. |
| 41 | +``` |
| 42 | + |
| 43 | +## What is an Xarray index? |
| 44 | + |
| 45 | +In order to analyze these increasingly complex data structures in Xarray, we |
| 46 | +require a flexible indexing system. |
| 47 | + |
| 48 | +- _What is an index?_ |
| 49 | + |
| 50 | +This is a common concept in database systems and data-frame libraries. Generally |
| 51 | +speaking: |
| 52 | + |
| 53 | +> An index is a data structure that permits fast data lookup and retrieval. |
| 54 | +
|
| 55 | +For example, a {py:class}`pandas.Index` object can be used to efficiently select |
| 56 | +rows of a {py:class}`pandas.DataFrame` by one or more labels. Two different |
| 57 | +DataFrame objects may also be combined by aligning them on their indexes. |
| 58 | + |
| 59 | +- _What about Xarray?_ |
| 60 | + |
| 61 | +Until recently, Xarray exclusively relied on {py:class}`pandas.Index` to allow |
| 62 | +fast label-based selection and alignment of n-dimensional data via the concept |
| 63 | +of {term}`"dimension" coordinates <xarray:Dimension coordinate>`. This approach |
| 64 | +worked very well for rectilinear gridded datasets but quickly reached its limits |
| 65 | +when considering other kinds of datasets. |
| 66 | + |
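The default behavior described above can be sketched as follows. This is only an illustration with made-up values: a 1D coordinate named after its dimension (`x`) is indexed automatically, which enables both label-based selection and alignment.

```python
import xarray as xr

# "x" is a dimension coordinate: a 1D coordinate named after its dimension.
# Xarray creates a pandas-backed index for it by default.
da = xr.DataArray([10.0, 12.5, 15.0], dims="x", coords={"x": [10, 20, 30]})

# The underlying pandas.Index can be inspected through the `indexes` property.
pandas_index = da.indexes["x"]

# Label-based selection relies on that index.
value = da.sel(x=20)

# Alignment also relies on it: with join="inner", only shared labels remain.
other = xr.DataArray([1.0, 2.0], dims="x", coords={"x": [20, 30]})
a, b = xr.align(da, other, join="inner")
```

This works well because the grid is rectilinear: a single 1D index per dimension is enough to locate any point.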
| 67 | +While Xarray still follows the same approach as its default behavior, it has |
| 68 | +also become much more flexible: an {py:class}`xarray.Dataset` or |
| 69 | +{py:class}`xarray.DataArray` may now have one or more custom |
| 70 | +{py:class}`xarray.Index` objects each associated with its own coordinates of |
| 71 | +arbitrary dimension(s). Goodbye {term}`"dimension" coordinate <xarray:Dimension coordinate>` vs. {term}`"non-dimension" coordinate <xarray:Non-dimension coordinate>` and welcome {term}`"index" coordinate <xarray:Indexed coordinate>` |
| 72 | +vs. {term}`"non-index" coordinate <xarray:Non-indexed coordinate>`! |
| 73 | + |
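A small sketch of the "index" vs. "non-index" coordinate distinction, assuming a recent Xarray version that provides `Dataset.set_xindex`. The variable and coordinate names (`temperature`, `station`) and the values are hypothetical. Here the non-dimension coordinate `station` has no index at first, so it cannot be used for label-based selection until one is attached to it.

```python
import xarray as xr

ds = xr.Dataset(
    {"temperature": ("x", [10.0, 12.5, 15.0])},
    coords={
        "x": [0, 1, 2],                       # indexed by default
        "station": ("x", ["a", "b", "c"]),    # non-index coordinate at first
    },
)

# Only "x" has an index so far.
has_index_before = "station" in ds.xindexes

# Attach an index to "station". With no index class given, Xarray falls back
# to its default pandas-backed index; a custom xarray.Index subclass could be
# passed as the second argument instead.
indexed = ds.set_xindex("station")
has_index_after = "station" in indexed.xindexes

# "station" is now an index coordinate and supports label-based selection.
selected = indexed.sel(station="b")
```

The same pattern applies to third-party indexes: the coordinate data stays the same, and only the index attached to it changes how selection and alignment behave.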
| 74 | +- _What is an Xarray index?_ |
| 75 | + |
| 76 | +{py:class}`xarray.Index` serves a broader purpose than a database index. It |
| 77 | +provides an API that allows dealing with coordinate data and metadata in a |
| 78 | +highly customizable way for the most common Xarray operations such as `isel`, |
| 79 | +`sel`, `align`, `concat`, `stack`... Xarray indexes usually hold, track and |
| 80 | +propagate additional information wrapped in arbitrary Python objects, along with |
| 81 | +coordinate labels and attributes. In many cases the propagation of information |
| 82 | +via custom indexes is much more efficient and/or reliable than via coordinate |
| 83 | +labels and attributes. {py:class}`xarray.Index` thus represents a powerful |
| 84 | +extension mechanism that nicely complements |
| 85 | +[accessors](https://docs.xarray.dev/en/stable/internals/extending-xarray.html) |
| 86 | +and [IO |
| 87 | +backends](https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html). |
4 | 88 |
|
5 | 89 | ## What is this website?
|
6 | 90 |
|
| 91 | +Xarray's flexible indexes unlock a lot of possibilities. We hope that this gallery |
| 92 | +of built-in and third-party Xarray indexes illustrates the potential of this |
| 93 | +feature and serves as a useful reference for implementing custom indexes (or |
| 94 | +simply for finding existing ones that fulfill your needs). |
| 95 | + |
| 96 | +## Contribution |
| 97 | + |
| 98 | +Your additions to this gallery are very welcome, particularly for fields outside the Earth Sciences! Please open a pull request on [our GitHub repository](https://github.com/xarray-contrib/xarray-indexes). |
| 99 | + |
7 | 100 | ```{toctree}
|
8 | 101 | ---
|
9 | 102 | caption: Built-in
|