  Development roadmap
  ===================

- Authors: Stephan Hoyer, Joe Hamman and xarray developers
+ Authors: Xarray developers

- Date: July 24, 2018
+ Date: September 7, 2021

  Xarray is an open source Python library for labeled multidimensional
  arrays and datasets.
@@ -20,15 +20,15 @@ Why has xarray been successful? In our opinion:
  - The dominant use-case for xarray is for analysis of gridded
    dataset in the geosciences, e.g., as part of the
-   `Pangeo <http://pangeo-data.org>`__ project.
+   `Pangeo <http://pangeo.io>`__ project.
  - Xarray is also used more broadly in the physical sciences, where
    we've found the needs for analyzing multidimensional datasets are
    remarkably consistent (e.g., see
    `SunPy <https://github.com/sunpy/ndcube>`__ and
    `PlasmaPy <https://github.com/PlasmaPy/PlasmaPy/issues/59>`__).
  - Finally, xarray is used in a variety of other domains, including
    finance, `probabilistic
-   programming <https://github.com/arviz-devs/arviz/issues/97>`__ and
+   programming <https://arviz-devs.github.io/arviz/>`__ and
    genomics.

  - Xarray is also a **domain agnostic** solution:
@@ -87,12 +87,17 @@ We can generalize the community's needs into three main categories:
  - More flexible grids/indexing.
  - More flexible arrays/computing.
  - More flexible storage backends.
+ - More flexible data structures.

  Each of these are detailed further in the subsections below.

  Flexible indexes
  ~~~~~~~~~~~~~~~~

+ .. note::
+    Work on flexible grids and indexes is currently underway. See
+    `GH Project #1 <https://github.com/pydata/xarray/projects/1>`__ for more detail.
+
  Xarray currently keeps track of indexes associated with coordinates by
  storing them in the form of a ``pandas.Index`` in special
  ``xarray.IndexVariable`` objects.
@@ -130,6 +135,10 @@ build upon indexing, such as groupby operations with multiple variables.
  Flexible arrays
  ~~~~~~~~~~~~~~~

+ .. note::
+    Work on flexible arrays is currently underway. See
+    `GH Project #2 <https://github.com/pydata/xarray/projects/2>`__ for more detail.
+
  Xarray currently supports wrapping multidimensional arrays defined by
  NumPy, dask and to a limited-extent pandas. It would be nice to have
  interfaces that allow xarray to wrap alternative N-D array
@@ -160,6 +169,10 @@ third-party libraries.
  Flexible storage
  ~~~~~~~~~~~~~~~~

+ .. note::
+    Work on flexible storage backends is currently underway. See
+    `GH Project #3 <https://github.com/pydata/xarray/projects/3>`__ for more detail.
+
  The xarray backends module has grown in size and complexity. Much of
  this growth has been "organic" and mostly to support incremental
  additions to the supported backends. This has left us with a fragile
@@ -181,9 +194,66 @@ development would include:
  - Possibly moving some infrequently used backends to third-party
    packages.

+ Flexible data structures
+ ~~~~~~~~~~~~~~~~~~~~~~~~
+
+ Xarray provides two primary data structures, the ``xarray.DataArray`` and
+ the ``xarray.Dataset``. This section describes two possible data model
+ extensions.
+
+ Tree-like data structure
+ ++++++++++++++++++++++++
+
+ .. note::
+    Work on developing a hierarchical data structure in Xarray is just
+    beginning. See `Datatree <https://github.com/TomNicholas/datatree>`__
+    for an early prototype.
+
+ Xarray’s highest-level object is currently an ``xarray.Dataset``, whose data
+ model echoes that of a single netCDF group. However, real-world datasets are
+ often better represented by a collection of related Datasets. Particularly common
+ examples include:
+
+ - Multi-resolution datasets,
+ - Collections of time series datasets with differing lengths,
+ - Heterogeneous datasets comprising multiple different types of related
+   observational or simulation data,
+ - Bayesian workflows involving various statistical distributions over multiple
+   variables,
+ - Whole netCDF files containing multiple groups,
+ - Comparison of output from many similar models (such as in the IPCC's Coupled Model Intercomparison Projects).
+
+ A new tree-like data structure which is essentially a structured hierarchical
+ collection of Datasets could represent these cases, and would instead map to
+ multiple netCDF groups (see `GH4118 <https://github.com/pydata/xarray/issues/4118>`__).
+
+ Currently there are several libraries which have wrapped xarray in order to build
+ domain-specific data structures (e.g. `xarray-multiscale <https://github.com/JaneliaSciComp/xarray-multiscale>`__),
+ but a general ``xarray.DataTree`` object would obviate the need for these and
+ consolidate effort in a single domain-agnostic tool, much as xarray has already achieved.
+
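The idea of a hierarchical collection of Datasets can be sketched in plain Python. This is only an illustration under stated assumptions, not the Datatree prototype's API: the ``TreeNode`` class, its ``set``/``get``/``walk`` methods, and the group paths are hypothetical names, and a plain dict stands in for each Dataset. Paths are written like netCDF group names.

```python
class TreeNode:
    """Hypothetical node: one dataset-like dict plus named child nodes."""

    def __init__(self, dataset=None):
        # A plain dict stands in for an xarray.Dataset in this sketch.
        self.dataset = dict(dataset or {})
        self.children = {}

    @staticmethod
    def _parts(path):
        # "/fine/diagnostics" -> ["fine", "diagnostics"]; "/" -> []
        return [part for part in path.split("/") if part]

    def set(self, path, dataset):
        """Store a dataset at a group-like path, creating parents as needed."""
        node = self
        for name in self._parts(path):
            node = node.children.setdefault(name, TreeNode())
        node.dataset = dict(dataset)

    def get(self, path):
        """Return the node at a path; raises KeyError if a group is missing."""
        node = self
        for name in self._parts(path):
            node = node.children[name]
        return node

    def walk(self, prefix=""):
        """Yield (path, node) pairs depth-first, parents before children."""
        yield (prefix or "/"), self
        for name, child in self.children.items():
            yield from child.walk(f"{prefix}/{name}")


# A multi-resolution example: the same variable stored at two grid sizes.
root = TreeNode()
root.set("/coarse", {"temperature": [1.0, 2.0]})
root.set("/fine", {"temperature": [1.0, 1.5, 2.0, 2.5]})
root.set("/fine/diagnostics", {"count": [4]})

paths = [path for path, _ in root.walk()]
fine_size = len(root.get("/fine").dataset["temperature"])
```

Each node can hold variables of a different length, which a single Dataset with shared dimensions cannot express directly. A real implementation would also need to handle coordinate inheritance and serialization, which this sketch leaves out.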
+ Labeled array without coordinates
+ +++++++++++++++++++++++++++++++++
+
+ There is a need for a lightweight array structure with named dimensions for
+ convenient indexing and broadcasting. Xarray includes such a structure internally
+ (``xarray.Variable``). We want to factor out Xarray's “Variable” object into a
+ standalone package with minimal dependencies for integration with libraries that
+ don't want to inherit Xarray's dependency on Pandas (e.g. scikit-learn).
+ The new “Variable” class will follow established array protocols and the new
+ data-apis standard. It will be capable of wrapping multiple array-like objects
+ (e.g. NumPy, Dask, Sparse, Pint, CuPy, PyTorch). While “DataArray” fits some of
+ these requirements, it offers a more complex data model than is desired for
+ many applications and depends on Pandas.
+
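What "named dimensions for convenient indexing and broadcasting" means can be shown with a small pure-Python sketch. It is not how ``xarray.Variable`` is implemented, and the ``LabeledArray`` class and its ``item`` method are hypothetical names used only for illustration. The sketch has no coordinates and no Pandas dependency: it stores flat values plus dimension names and aligns operands by name rather than by axis position.

```python
from itertools import product
from math import prod


class LabeledArray:
    """Hypothetical stand-in for a coordinate-free array with named dimensions."""

    def __init__(self, dims, shape, values):
        self.dims = tuple(dims)
        self.shape = tuple(shape)
        self.values = list(values)  # flat, row-major order
        if len(self.dims) != len(self.shape):
            raise ValueError("dims and shape must have the same length")
        if len(self.values) != prod(self.shape):
            raise ValueError("values do not match shape")

    def item(self, **index):
        """Look up one element by dimension name; unknown names are ignored."""
        flat = 0
        for dim, size in zip(self.dims, self.shape):
            flat = flat * size + index[dim]
        return self.values[flat]

    def __add__(self, other):
        # Union of dimensions, in order of first appearance.
        sizes = dict(zip(self.dims, self.shape))
        for dim, size in zip(other.dims, other.shape):
            if sizes.setdefault(dim, size) != size:
                raise ValueError(f"size mismatch along {dim!r}")
        dims = tuple(sizes)
        shape = tuple(sizes[dim] for dim in dims)
        values = []
        for position in product(*(range(size) for size in shape)):
            index = dict(zip(dims, position))
            values.append(self.item(**index) + other.item(**index))
        return LabeledArray(dims, shape, values)


a = LabeledArray(("x",), (2,), [1, 2])
b = LabeledArray(("y",), (3,), [10, 20, 30])
c = a + b  # broadcasts to dims ("x", "y") without manual reshaping
```

Because alignment is by name, ``a + b`` produces a 2 by 3 result even though neither operand was reshaped, and ``c.item(x=1, y=2)`` selects an element without knowing the axis order. A standalone package would wrap real array libraries instead of Python lists.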
  Engaging more users
  -------------------

+ .. note::
+    Work on improving Xarray’s documentation and user engagement is
+    currently underway. See `GH Project #4 <https://github.com/pydata/xarray/projects/4>`__
+    for more detail.
+
  Like many open-source projects, the documentation of xarray has grown
  together with the library's features. While we think that the xarray
  documentation is comprehensive already, we acknowledge that the adoption