Commit 97fc853

Add pandas interval index
1 parent 54877e7 commit 97fc853

3 files changed

+101
-6
lines changed

docs/builtin/pandas.md

Lines changed: 0 additions & 5 deletions
This file was deleted.

docs/builtin/pdinterval.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
1+
---
2+
jupytext:
3+
text_representation:
4+
format_name: myst
5+
kernelspec:
6+
display_name: Python 3
7+
name: python
8+
---
9+
10+
# pandas: IntervalIndex
11+
12+
````{grid}
13+
```{grid-item}
14+
:columns: 3
15+
```{image} https://pandas.pydata.org/docs/_static/pandas.svg
16+
---
17+
alt: Alt text
18+
width: 200px
19+
align: center
20+
---
21+
```
22+
```{grid-item}
23+
:columns: 9
24+
```{seealso}
25+
Learn more in the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#intervalindex).
26+
```
27+
````
28+
29+
# Highlights
30+
31+
1. Xarray's built-in support for pandas Index classes extends to more sophisticated classes like {py:class}`pandas.IntervalIndex`.
32+
1. Xarray now generates such indexes automatically when using {py:meth}`xarray.DataArray.groupby_bins` or {py:meth}`xarray.Dataset.groupby_bins`.
33+
1. Unfortunately, {py:class}`pandas.IntervalIndex` supports NumPy datetimes but not cftime objects.
34+
35+
```{important}
36+
A pandas IntervalIndex models intervals using a single variable. The [Climate and Forecast Conventions](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#cell-boundaries), by contrast, model intervals using two arrays: a "bounds" variable that holds the interval edges and a coordinate variable that holds the "central values".
37+
```
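
The contrast between the two representations can be sketched in plain pandas. The example below assumes a hypothetical CF-style bounds array `lat_bnds` of shape `(n, 2)`; the name, the values, and the choice of `closed="left"` are all assumptions made for illustration, since the closed side is not visible in the bounds array itself.

```python
import numpy as np
import pandas as pd

# Hypothetical CF-style "bounds" variable: one (lower, upper) pair per cell.
lat_bnds = np.array([[25.0, 35.0], [35.0, 45.0], [45.0, 55.0]])

# One IntervalIndex carries both edges of every cell in a single variable.
lat_intervals = pd.IntervalIndex.from_arrays(
    lat_bnds[:, 0], lat_bnds[:, 1], closed="left"  # closedness is an assumption
)

# The "central values" can be derived from the intervals ...
lat_centers = lat_intervals.mid

# ... and the two-column bounds array can be rebuilt from the edges.
rebuilt_bnds = np.column_stack(
    [lat_intervals.left.to_numpy(), lat_intervals.right.to_numpy()]
)
```

Going from intervals back to bounds loses nothing here, but the midpoint is only one possible "central value"; a CF dataset may store a different representative point for each cell.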
38+
39+
## Example
40+
41+
### Assigning
42+
43+
```{code-cell}
44+
%xmode minimal
45+
46+
import pandas as pd
47+
import xarray as xr
48+
49+
xr.set_options(display_expand_indexes=True, display_expand_attrs=False)
50+
pd.set_option('display.max_seq_items', 10)
51+
52+
orig = xr.tutorial.open_dataset("air_temperature")
53+
orig
54+
```
55+
56+
Let's replace the `time` vector with an IntervalIndex, assuming that the data represent averages over 6-hour periods centered at 00h, 06h, 12h, and 18h.
57+
58+
```{code-cell}
59+
left = orig.time.data - pd.Timedelta("3h")
60+
right = orig.time.data + pd.Timedelta("3h")
61+
time_bounds = pd.IntervalIndex.from_arrays(left, right, closed="left")
62+
time_bounds
63+
```
64+
65+
```{code-cell}
66+
indexed = orig.copy(deep=True)
67+
indexed["time"] = time_bounds
68+
indexed
69+
```
70+
71+
### Indexing
72+
73+
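Let's select a representative value for 2013-05-01 02:00. The original `time` coordinate has no exact match for that timestamp, so a plain `sel` raises an error.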
74+
75+
```{code-cell}
76+
---
77+
tags: [raises-exception]
78+
---
79+
orig.sel(time="2013-05-01 02:00")
80+
```
81+
82+
Indexing the original dataset requires specifying `method="nearest"`:
83+
84+
```{code-cell}
85+
orig.sel(time="2013-05-01 02:00", method="nearest").time
86+
```
87+
88+
With an IntervalIndex, however, that is unnecessary:
89+
90+
```{code-cell}
91+
indexed.sel(time="2013-05-01 02:00").time
92+
```
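
The containment lookup that makes this work can be illustrated with pandas alone. This is a rough sketch on a made-up four-step time axis, not a description of how xarray implements `sel` internally; the variable names are hypothetical.

```python
import pandas as pd

# Four 6-hourly centers, built without relying on frequency-string aliases.
centers = pd.date_range("2013-05-01", periods=4, freq=pd.Timedelta(hours=6))
half = pd.Timedelta("3h")
bounds = pd.IntervalIndex.from_arrays(centers - half, centers + half, closed="left")

# 02:00 falls inside [2013-04-30 21:00, 2013-05-01 03:00), the first interval.
pos_inside = bounds.get_loc(pd.Timestamp("2013-05-01 02:00"))

# 03:00 is a shared edge; with closed="left" it belongs to the second interval.
pos_edge = bounds.get_loc(pd.Timestamp("2013-05-01 03:00"))

# A timestamp outside every interval has no match and raises KeyError.
try:
    bounds.get_loc(pd.Timestamp("2013-05-02 12:00"))
    outside_found = True
except KeyError:
    outside_found = False
```

The `closed` argument decides which interval owns a shared edge, so it is worth choosing deliberately when the intervals describe averaging periods.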
93+
94+
### Binned grouping
95+
96+
Xarray now creates an IntervalIndex by default for binned grouping operations:
97+
98+
```{code-cell}
99+
orig.groupby_bins("lat", bins=[25, 35, 45, 55]).mean()
100+
```
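
Xarray's documentation describes `groupby_bins` as discretizing the group values with `pandas.cut`, so the resulting intervals can be previewed with pandas alone. The sketch below uses the same bin edges as above on a few made-up latitude values (the `lats` array is hypothetical, not taken from the tutorial dataset).

```python
import numpy as np
import pandas as pd

# Made-up sample latitudes, chosen to hit bin edges and out-of-range values.
lats = np.array([25.0, 30.0, 35.0, 60.0])

# With the default right=True, bins are closed on the right: (25, 35], (35, 45], ...
binned = pd.cut(lats, bins=[25, 35, 45, 55])

# The bin labels are stored as an IntervalIndex.
bin_index = binned.categories

# Code -1 marks values that fall in no bin: 25.0 sits on the open left edge
# of the first bin, and 60.0 lies above the last edge.
codes = binned.codes.tolist()
```

Values that land in no bin are left out of the grouped result, which is worth keeping in mind when the lowest bin edge coincides with a coordinate value.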

docs/index.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ caption: Built-in
10 10
hidden:
11 11
---
12 12
builtin/range
13-
builtin/pandas
13+
builtin/pdinterval
14 14
```
15 15

16 16
```{toctree}
