Commit 1be8063

dcherian, scottyhq, and pre-commit-ci[bot]
authored
Add pandas interval index (#11)
Co-authored-by: Scott Henderson <3924836+scottyhq@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent ce16236 commit 1be8063

3 files changed: 103 additions and 6 deletions
docs/builtin/pandas.md

Lines changed: 0 additions & 5 deletions
This file was deleted.

docs/builtin/pdinterval.md

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
---
jupytext:
  text_representation:
    format_name: myst
kernelspec:
  display_name: Python 3
  name: python
---

# pandas: IntervalIndex

````{grid}
```{grid-item}
:columns: 3
```{image} https://pandas.pydata.org/docs/_static/pandas.svg
---
alt: Alt text
width: 200px
align: center
---
```
```{grid-item}
:columns: 9
```{seealso}
Learn more at the [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#intervalindex) documentation.
```
````

# Highlights

1. Xarray's built-in support for pandas Index classes extends to more sophisticated classes like {py:class}`pandas.IntervalIndex`.
1. Xarray now generates such indexes automatically when using {py:meth}`xarray.DataArray.groupby_bins` or {py:meth}`xarray.Dataset.groupby_bins`.
1. Sadly, {py:class}`pandas.IntervalIndex` supports NumPy datetimes but not [cftime](https://unidata.github.io/cftime/).

```{important}
A pandas IntervalIndex models intervals using a single variable. The [Climate and Forecast Conventions](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#cell-boundaries), by contrast, model intervals using two arrays: the interval edges (the "bounds" variable) and the "central values".
```

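The difference between the two representations can be sketched in plain pandas and NumPy. This is a minimal illustration, not a CF encoder: the three 6-hourly intervals and the names `centers` and `bounds` are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical 6-hourly intervals, used only for illustration.
left = pd.date_range("2013-01-01", periods=3, freq=pd.Timedelta(hours=6))
right = left + pd.Timedelta(hours=6)

# pandas: one object holds both edges of every interval.
intervals = pd.IntervalIndex.from_arrays(left, right, closed="left")

# CF-style: two separate arrays, central values plus an (n, 2) bounds array.
centers = intervals.mid
bounds = np.stack(
    [intervals.left.to_numpy(), intervals.right.to_numpy()], axis=-1
)

# The bounds array carries enough information to rebuild the IntervalIndex.
rebuilt = pd.IntervalIndex.from_arrays(bounds[:, 0], bounds[:, 1], closed="left")
print(bounds.shape)
print(rebuilt.equals(intervals))
```

Note that the `closed` side is not stored in the bounds array, so it has to be supplied again when rebuilding the index.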
## Example

### Assigning

```{code-cell}
%xmode minimal

import pandas as pd
import xarray as xr

xr.set_options(display_expand_indexes=True, display_expand_attrs=False)
pd.set_option('display.max_seq_items', 10)

orig = xr.tutorial.open_dataset("air_temperature")
orig
```

Let's replace the `time` vector with an IntervalIndex, assuming that the data represent averages over 6-hour periods centered at 00h, 06h, 12h, and 18h.

```{code-cell}
left = orig.time.data - pd.Timedelta("3h")
right = orig.time.data + pd.Timedelta("3h")
time_bounds = pd.IntervalIndex.from_arrays(left, right, closed="left")
time_bounds
```

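To make the effect of `closed="left"` concrete, here is a small sketch using a single `pd.Interval` built the same way as the intervals above. The center timestamp is an arbitrary choice for illustration.

```python
import pandas as pd

# One interval centered at midnight with a 3-hour half-width,
# mirroring the construction of `time_bounds` above.
center = pd.Timestamp("2013-05-01 00:00")
half_width = pd.Timedelta("3h")
interval = pd.Interval(center - half_width, center + half_width, closed="left")

print((center - half_width) in interval)             # True: left edge is included
print(pd.Timestamp("2013-05-01 02:00") in interval)  # True: strictly inside
print((center + half_width) in interval)             # False: right edge is excluded
```

Because the right edge is excluded, a timestamp such as 03:00 belongs to the next interval only, so adjacent intervals never both claim the same instant.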
```{code-cell}
indexed = orig.copy(deep=True)
indexed["time"] = time_bounds
indexed
```

Note that the above object still shows that the `time` coordinate has an associated `PandasIndex`, but the values are now represented in an "IntervalArray" (as indicated by `interval[datetime64[ns], left]`).

### Indexing

Let's index out a representative value for 2013-05-01 02:00.

```{code-cell}
---
tags: [raises-exception]
---
orig.sel(time="2013-05-01 02:00")
```

Indexing the original dataset requires specifying `method="nearest"`:

```{code-cell}
orig.sel(time="2013-05-01 02:00", method="nearest").time
```

With an IntervalIndex, however, that is unnecessary:

```{code-cell}
indexed.sel(time="2013-05-01 02:00").time
```

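The containment lookup can also be tried in pandas alone. The sketch below uses a hypothetical four-element index rather than the tutorial dataset, and `IntervalIndex.get_loc` as the lookup; whether xarray performs the lookup through exactly this call is an assumption.

```python
import pandas as pd

# Hypothetical 6-hourly centers standing in for the dataset's time axis.
centers = pd.date_range("2013-05-01", periods=4, freq=pd.Timedelta(hours=6))
half = pd.Timedelta(hours=3)
index = pd.IntervalIndex.from_arrays(centers - half, centers + half, closed="left")

# A timestamp that is not an interval edge still resolves to the
# interval that contains it, with no "nearest" matching needed.
position = index.get_loc(pd.Timestamp("2013-05-01 02:00"))
print(position)         # 0
print(index[position])  # the interval centered at 2013-05-01 00:00
```

A timestamp that falls outside every interval raises a `KeyError` instead of silently snapping to a neighbor.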
### Binned grouping

Xarray now creates an IntervalIndex by default for binned grouping operations:

```{code-cell}
orig.groupby_bins("lat", bins=[25, 35, 45, 55]).mean()
```
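
For intuition about the resulting intervals, the same bin edges can be passed to `pandas.cut`, which xarray's `groupby_bins` documentation names as its binning routine. The latitude values below are made up for illustration.

```python
import pandas as pd

# Hypothetical latitude values; only the bin edges match the example above.
lat = [27.5, 40.0, 52.5, 35.0]
binned = pd.cut(lat, bins=[25, 35, 45, 55])

# The bin labels are themselves an IntervalIndex, right-closed by default.
print(type(binned.categories).__name__)
print(binned.categories.closed)

# 35.0 sits on a shared edge and lands in the first bin, (25, 35].
print(list(binned.codes))
```

Note the contrast with the `closed="left"` intervals built by hand earlier: the default for binning is right-closed, so a value equal to a shared edge goes to the lower bin.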

docs/index.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ caption: Built-in
 hidden:
 ---
 builtin/range
-builtin/pandas
+builtin/pdinterval
 ```

 ```{toctree}