From 97fc853275b333848fbf903f5aefe56f21440a68 Mon Sep 17 00:00:00 2001 From: Deepak Cherian Date: Sun, 6 Jul 2025 22:40:05 -0600 Subject: [PATCH 1/3] Add pandas interval index --- docs/builtin/pandas.md | 5 -- docs/builtin/pdinterval.md | 100 +++++++++++++++++++++++++++++++++++++ docs/index.md | 2 +- 3 files changed, 101 insertions(+), 6 deletions(-) delete mode 100644 docs/builtin/pandas.md create mode 100644 docs/builtin/pdinterval.md diff --git a/docs/builtin/pandas.md b/docs/builtin/pandas.md deleted file mode 100644 index abf8713..0000000 --- a/docs/builtin/pandas.md +++ /dev/null @@ -1,5 +0,0 @@ -# More Pandas Indexes - -## IntervalIndex - -## CategoricalIndex diff --git a/docs/builtin/pdinterval.md b/docs/builtin/pdinterval.md new file mode 100644 index 0000000..b79c6fe --- /dev/null +++ b/docs/builtin/pdinterval.md @@ -0,0 +1,100 @@ +--- +jupytext: + text_representation: + format_name: myst +kernelspec: + display_name: Python 3 + name: python +--- + +# pandas: IntervalIndex + +````{grid} +```{grid-item} +:columns: 3 +```{image} https://pandas.pydata.org/docs/_static/pandas.svg +--- +alt: Alt text +width: 200px +align: center +--- +``` +```{grid-item} +:columns: 9 +```{seealso} +Learn more at the [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#intervalindex) documentation. +``` +```` + +# Highlights + +1. Xarray's built-in support for pandas Index classes extends to more sophisticated classes like {py:class}`pandas.IntervalIndex`. +1. Xarray now generates such indexes automatically when using {py:meth}`xarray.DataArray.groupby_bins` or {py:meth}`xarray.Dataset.groupby_bins`. +1. Sadly {py:class}`pandas.IntervalIndex` supports numpy datetimes but not cftime. + +```{important} +A pandas IntervalIndex models intervals using a single variable. 
The [Climate and Forecast Conventions](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#cell-boundaries), by contrast, model the intervals using two arrays: the intervals ("bounds" variable) and "central values". +``` + +## Example + +### Assigning + +```{code-cell} +%xmode minimal + +import pandas as pd +import xarray as xr + +xr.set_options(display_expand_indexes=True, display_expand_attrs=False) +pd.set_option('display.max_seq_items', 10) + +orig = xr.tutorial.open_dataset("air_temperature") +orig +``` + +Let's replace the `time` vector with an IntervalIndex, assuming that the data represent averages over 6 hour periods centered at 00h, 06h, 12h, 18h + +```{code-cell} +left = orig.time.data - pd.Timedelta("3h") +right = orig.time.data + pd.Timedelta("3h") +time_bounds = pd.IntervalIndex.from_arrays(left, right, closed="left") +time_bounds +``` + +```{code-cell} +indexed = orig.copy(deep=True) +indexed["time"] = time_bounds +indexed +``` + +### Indexing + +Let's index out a representative value for 2013-05-01 02:00. 
+ +```{code-cell} +--- +tags: [raises-exception] +--- +orig.sel(time="2013-05-01 02:00") +``` + +Indexing the original dataset required specifying `method="nearest"` + +```{code-cell} +orig.sel(time="2013-05-01 02:00", method="nearest").time +``` + +With an IntervalIndex, however, that is unnecessary + +```{code-cell} +indexed.sel(time="2013-05-01 02:00").time +``` + +### Binned grouping + +Xarray now creates IntervalIndex by default for binned grouping operations + +```{code-cell} +orig.groupby_bins("lat", bins=[25, 35, 45, 55]).mean() +``` diff --git a/docs/index.md b/docs/index.md index 47859ea..e91a47a 100644 --- a/docs/index.md +++ b/docs/index.md @@ -10,7 +10,7 @@ caption: Built-in hidden: --- builtin/range -builtin/pandas +builtin/pdinterval ``` ```{toctree} From 2fcfa6664a5b438aa63816c971739c7cdff8a753 Mon Sep 17 00:00:00 2001 From: Deepak Cherian Date: Mon, 7 Jul 2025 06:56:36 -0600 Subject: [PATCH 2/3] Apply suggestions from code review Co-authored-by: Scott Henderson <3924836+scottyhq@users.noreply.github.com> --- docs/builtin/pdinterval.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/builtin/pdinterval.md b/docs/builtin/pdinterval.md index b79c6fe..64a36f2 100644 --- a/docs/builtin/pdinterval.md +++ b/docs/builtin/pdinterval.md @@ -30,7 +30,7 @@ Learn more at the [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_gui 1. Xarray's built-in support for pandas Index classes extends to more sophisticated classes like {py:class}`pandas.IntervalIndex`. 1. Xarray now generates such indexes automatically when using {py:meth}`xarray.DataArray.groupby_bins` or {py:meth}`xarray.Dataset.groupby_bins`. -1. Sadly {py:class}`pandas.IntervalIndex` supports numpy datetimes but not cftime. +1. Sadly {py:class}`pandas.IntervalIndex` supports numpy datetimes but not [cftime](https://unidata.github.io/cftime/). ```{important} A pandas IntervalIndex models intervals using a single variable. 
The [Climate and Forecast Conventions](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#cell-boundaries), by contrast, model the intervals using two arrays: the intervals ("bounds" variable) and "central values". @@ -68,6 +68,7 @@ indexed["time"] = time_bounds indexed ``` +Note the above object still shows the `time` coordinate has an associated `PandasIndex`, but the values are now represented in an "IntervalArray" (as indicated by `interval[datetime64[ns], left]`). ### Indexing Let's index out a representative value for 2013-05-01 02:00. From 134425612142a41d23786da57627c161d113609c Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon, 7 Jul 2025 12:56:46 +0000 Subject: [PATCH 3/3] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- docs/builtin/pdinterval.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/builtin/pdinterval.md b/docs/builtin/pdinterval.md index 64a36f2..61f48ab 100644 --- a/docs/builtin/pdinterval.md +++ b/docs/builtin/pdinterval.md @@ -69,6 +69,7 @@ indexed ``` Note the above object still shows the `time` coordinate has an associated `PandasIndex`, but the values are now represented in an "IntervalArray" (as indicated by `interval[datetime64[ns], left]`). + ### Indexing Let's index out a representative value for 2013-05-01 02:00.
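The lookup behaviour that the new page demonstrates through `orig.sel(...)` and `indexed.sel(...)` can also be sketched with pandas alone, which avoids downloading the tutorial dataset. This is a minimal illustration, not part of the patch: the four synthetic timestamps and the names `centers`, `intervals`, `query`, and `exact_match_found` are assumptions made for the example. It builds the same left-closed 6-hour windows as the `time_bounds` cell and compares an exact-match lookup with an interval lookup.

```python
import pandas as pd

# Synthetic 6-hourly timestamps standing in for the dataset's `time` values
# (assumed for illustration; the patch uses the air_temperature tutorial data).
centers = pd.date_range("2013-05-01", periods=4, freq=pd.Timedelta(hours=6))

# Same construction as in the patch: windows of +/- 3 hours around each
# timestamp, closed on the left so adjacent windows do not overlap.
left = centers - pd.Timedelta("3h")
right = centers + pd.Timedelta("3h")
intervals = pd.IntervalIndex.from_arrays(left, right, closed="left")

query = pd.Timestamp("2013-05-01 02:00")

# An exact-match lookup on the plain DatetimeIndex fails, because 02:00 is
# not one of the stored timestamps.
try:
    centers.get_loc(query)
    exact_match_found = True
except KeyError:
    exact_match_found = False

# The IntervalIndex instead returns the position of the window containing
# the query, here the first window centered at 00:00.
pos = intervals.get_loc(query)
print(exact_match_found, pos, intervals[pos])

# A query on a shared edge falls in the later window because closed="left".
edge_pos = intervals.get_loc(pd.Timestamp("2013-05-01 03:00"))
print(edge_pos)
```

With `closed="left"`, a timestamp that sits exactly on a shared edge belongs to the later window, so every instant maps to at most one interval. That xarray's label-based selection relies on a pandas lookup of this kind is an assumption here; the patch itself only shows the resulting behaviour.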