Commit 91604d6

Merge branch 'main' into zarr-dtype-tests
2 parents: b1c6809 + c43a374


47 files changed (+1299 −378 lines)

.github/workflows/benchmarks.yml

Lines changed: 2 additions & 2 deletions

@@ -15,7 +15,7 @@ jobs:
     runs-on: ubuntu-latest
     env:
       ASV_DIR: "./asv_bench"
-      CONDA_ENV_FILE: ci/requirements/environment.yml
+      CONDA_ENV_FILE: ci/requirements/environment-benchmark.yml

     steps:
       # We need the full repo to avoid this issue
@@ -29,7 +29,7 @@ jobs:
       with:
         micromamba-version: "1.5.10-0"
         environment-file: ${{env.CONDA_ENV_FILE}}
-        environment-name: xarray-tests
+        environment-name: xarray-benchmark
         cache-environment: true
         cache-environment-key: "${{runner.os}}-${{runner.arch}}-py${{env.PYTHON_VERSION}}-${{env.TODAY}}-${{hashFiles(env.CONDA_ENV_FILE)}}-benchmark"
         # add "build" because of https://github.com/airspeed-velocity/asv/issues/1385

.pre-commit-config.yaml

Lines changed: 6 additions & 6 deletions

@@ -24,24 +24,24 @@ repos:
       - id: rst-inline-touching-normal
       - id: text-unicode-replacement-char
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.12.1
+    rev: v0.12.2
     hooks:
-      - id: ruff-format
-      - id: ruff
+      - id: ruff-check
         args: ["--fix", "--show-fixes"]
+      - id: ruff-format
   - repo: https://github.com/keewis/blackdoc
-    rev: v0.3.9
+    rev: v0.4.1
     hooks:
       - id: blackdoc
         exclude: "generate_aggregations.py"
         additional_dependencies: ["black==24.8.0"]
   - repo: https://github.com/rbubley/mirrors-prettier
-    rev: v3.5.3
+    rev: v3.6.2
     hooks:
       - id: prettier
         args: [--cache-location=.prettier_cache/cache]
   - repo: https://github.com/pre-commit/mirrors-mypy
-    rev: v1.16.0
+    rev: v1.16.1
     hooks:
       - id: mypy
         # Copied from setup.cfg

HOW_TO_RELEASE.md

Lines changed: 6 additions & 6 deletions

@@ -52,6 +52,7 @@ upstream https://github.com/pydata/xarray (push)

 6. After merging, again ensure your main branch is synced to upstream:
    ```sh
+   git switch main
    git pull upstream main
    ```
 7. If you have any doubts, run the full test suite one final time!
@@ -98,17 +99,17 @@ upstream https://github.com/pydata/xarray (push)

    ```

-12. Commit your changes and push to main again:
+12. Make a PR with these changes and merge it:

    ```sh
-   git commit -am 'New whatsnew section'
-   git push upstream main
+   git checkout -b empty-whatsnew-YYYY.MM.X+1
+   git commit -am "empty whatsnew"
+   git push
    ```

-   You're done pushing to main!
+   (Note that repo branch restrictions prevent pushing to `main`, so you have to just-self-merge this.)

 13. Update the version available on pyodide:
-
    - Open the PyPI page for [Xarray downloads](https://pypi.org/project/xarray/#files)
    - Edit [`pyodide/packages/xarray/meta.yaml`](https://github.com/pyodide/pyodide/blob/main/packages/xarray/meta.yaml) to update the
      - version number
@@ -119,7 +120,6 @@ upstream https://github.com/pydata/xarray (push)
 14. Issue the release announcement to mailing lists & Twitter (X). For bug fix releases, I
     usually only email xarray@googlegroups.com. For major/feature releases, I will email a broader
     list (no more than once every 3-6 months):
-
     - pydata@googlegroups.com
     - xarray@googlegroups.com
     - numpy-discussion@scipy.org

asv_bench/asv.conf.json

Lines changed: 1 addition & 1 deletion

@@ -60,7 +60,7 @@
     // },
     "matrix": {
         "setuptools_scm": [""], // GH6609
-        "numpy": [""],
+        "numpy": ["2.2"],
         "pandas": [""],
         "netcdf4": [""],
         "scipy": [""],

asv_bench/benchmarks/README_CI.md

Lines changed: 2 additions & 0 deletions

@@ -115,8 +115,10 @@ To minimize the time required to run the full suite, we trimmed the parameter ma
 ```python
 from . import _skip_slow  # this function is defined in benchmarks.__init__

+
 def time_something_slow():
     pass

+
 time_something.setup = _skip_slow
 ```
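For context, asv skips a benchmark whenever its `setup` callable raises `NotImplementedError`, which is what the `_skip_slow` hook above relies on. A minimal self-contained sketch of that pattern follows; the `ASV_SKIP_SLOW` environment flag is an assumed name for illustration, not xarray's actual mechanism:

```python
import os


def _skip_slow():
    # asv convention: raising NotImplementedError in `setup` skips the benchmark.
    # `ASV_SKIP_SLOW` is a hypothetical opt-out flag for this sketch.
    if os.environ.get("ASV_SKIP_SLOW", "1") == "1":
        raise NotImplementedError("skipping this slow benchmark")


def time_something_slow():
    pass


# Attaching the hook as `setup` makes asv call it before timing.
time_something_slow.setup = _skip_slow
```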

asv_bench/benchmarks/repr.py

Lines changed: 28 additions & 0 deletions

@@ -57,3 +57,31 @@ def time_repr(self):

     def time_repr_html(self):
         self.da._repr_html_()
+
+
+class ReprPandasRangeIndex:
+    # display a memory-saving pandas.RangeIndex shouldn't trigger memory
+    # expensive conversion into a numpy array
+    def setup(self):
+        index = xr.indexes.PandasIndex(pd.RangeIndex(1_000_000), "x")
+        self.ds = xr.Dataset(coords=xr.Coordinates.from_xindex(index))
+
+    def time_repr(self):
+        repr(self.ds.x)
+
+    def time_repr_html(self):
+        self.ds.x._repr_html_()
+
+
+class ReprXarrayRangeIndex:
+    # display an Xarray RangeIndex shouldn't trigger memory expensive conversion
+    # of its lazy coordinate into a numpy array
+    def setup(self):
+        index = xr.indexes.RangeIndex.arange(1_000_000, dim="x")
+        self.ds = xr.Dataset(coords=xr.Coordinates.from_xindex(index))
+
+    def time_repr(self):
+        repr(self.ds.x)
+
+    def time_repr_html(self):
+        self.ds.x._repr_html_()
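The motivation behind these benchmarks is that a `pandas.RangeIndex` stores only start/stop/step, so rendering a repr should never materialize it into a full array. A quick illustration with plain numpy/pandas (outside xarray) of the cost gap the benchmarks guard against:

```python
import numpy as np
import pandas as pd

# A RangeIndex stores only start/stop/step, so its footprint is tiny
# (typically a few hundred bytes)...
lazy = pd.RangeIndex(1_000_000)
print(lazy.memory_usage())

# ...while converting it to a concrete array allocates ~8 MB of integers
# on 64-bit platforms, which is exactly what a repr should avoid triggering.
materialized = np.asarray(lazy)
print(materialized.nbytes)
```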
ci/requirements/environment-benchmark.yml

Lines changed: 23 additions & 0 deletions

@@ -0,0 +1,23 @@
+name: xarray-benchmark
+channels:
+  - conda-forge
+  - nodefaults
+dependencies:
+  - bottleneck
+  - cftime
+  - dask-core
+  - distributed
+  - flox
+  - netcdf4
+  - numba
+  - numbagg
+  - numexpr
+  - numpy>=2.2,<2.3 # https://github.com/numba/numba/issues/10105
+  - opt_einsum
+  - packaging
+  - pandas
+  - pyarrow # pandas raises a deprecation warning without this, breaking doctests
+  - sparse
+  - scipy
+  - toolz
+  - zarr
design_notes/flexible_indexes_notes.md

Lines changed: 11 additions & 9 deletions

@@ -97,12 +97,12 @@ The new `indexes` argument of Dataset/DataArray constructors may be used to spec
 ```python
 >>> da = xr.DataArray(
 ...     data=[[275.2, 273.5], [270.8, 278.6]],
-...     dims=('x', 'y'),
+...     dims=("x", "y"),
 ...     coords={
-...         'lat': (('x', 'y'), [[45.6, 46.5], [50.2, 51.6]]),
-...         'lon': (('x', 'y'), [[5.7, 10.5], [6.2, 12.8]]),
+...         "lat": (("x", "y"), [[45.6, 46.5], [50.2, 51.6]]),
+...         "lon": (("x", "y"), [[5.7, 10.5], [6.2, 12.8]]),
 ...     },
-...     indexes={('lat', 'lon'): SpatialIndex},
+...     indexes={("lat", "lon"): SpatialIndex},
 ... )
 <xarray.DataArray (x: 2, y: 2)>
 array([[275.2, 273.5],
@@ -120,7 +120,7 @@ More formally, `indexes` would accept `Mapping[CoordinateNames, IndexSpec]` wher
 Currently index objects like `pandas.MultiIndex` can be passed directly to `coords`, which in this specific case results in the implicit creation of virtual coordinates. With the new `indexes` argument this behavior may become even more confusing than it currently is. For the sake of clarity, it would be appropriate to eventually drop support for this specific behavior and treat any given mapping value given in `coords` as an array that can be wrapped into an Xarray variable, i.e., in the case of a multi-index:

 ```python
->>> xr.DataArray([1.0, 2.0], dims='x', coords={'x': midx})
+>>> xr.DataArray([1.0, 2.0], dims="x", coords={"x": midx})
 <xarray.DataArray (x: 2)>
 array([1., 2.])
 Coordinates:
@@ -169,8 +169,8 @@ Like for the indexes, explicit coordinate creation should be preferred over impl
 For example, it is currently possible to pass a `pandas.MultiIndex` object as a coordinate to the Dataset/DataArray constructor:

 ```python
->>> midx = pd.MultiIndex.from_arrays([['a', 'b'], [0, 1]], names=['lvl1', 'lvl2'])
->>> da = xr.DataArray([1.0, 2.0], dims='x', coords={'x': midx})
+>>> midx = pd.MultiIndex.from_arrays([["a", "b"], [0, 1]], names=["lvl1", "lvl2"])
+>>> da = xr.DataArray([1.0, 2.0], dims="x", coords={"x": midx})
 >>> da
 <xarray.DataArray (x: 2)>
 array([1., 2.])
@@ -201,7 +201,9 @@ Besides `pandas.MultiIndex`, there may be other situations where we would like t
 The example given here is quite confusing, though: this is not an easily predictable behavior. We could entirely avoid the implicit creation of coordinates, e.g., using a helper function that generates coordinate + index dictionaries that we could then pass directly to the DataArray/Dataset constructor:

 ```python
->>> coords_dict, index_dict = create_coords_from_index(midx, dims='x', include_dim_coord=True)
+>>> coords_dict, index_dict = create_coords_from_index(
+...     midx, dims="x", include_dim_coord=True
+... )
 >>> coords_dict
 {'x': <xarray.Variable (x: 2)>
 array([('a', 0), ('b', 1)], dtype=object),
@@ -211,7 +213,7 @@ The example given here is quite confusing, though: this is not an easily predict
 array([0, 1])}
 >>> index_dict
 {('lvl1', 'lvl2'): midx}
->>> xr.DataArray([1.0, 2.0], dims='x', coords=coords_dict, indexes=index_dict)
+>>> xr.DataArray([1.0, 2.0], dims="x", coords=coords_dict, indexes=index_dict)
 <xarray.DataArray (x: 2)>
 array([1., 2.])
 Coordinates:

design_notes/grouper_objects.md

Lines changed: 5 additions & 3 deletions

@@ -8,7 +8,7 @@
 I propose the addition of Grouper objects to Xarray's public API so that

 ```python
-Dataset.groupby(x=BinGrouper(bins=np.arange(10, 2))))
+Dataset.groupby(x=BinGrouper(bins=np.arange(10, 2)))
 ```

 is identical to today's syntax:
@@ -27,7 +27,7 @@ results = []
 for element in unique_labels:
     subset = ds.sel(x=(ds.x == element))  # split
     # subset = ds.where(ds.x == element, drop=True)  # alternative
-    result = subset.mean() # apply
+    result = subset.mean()  # apply
     results.append(result)

 xr.concat(results)  # combine
@@ -36,7 +36,7 @@ xr.concat(results)  # combine
 to

 ```python
-ds.groupby('x').mean() # splits, applies, and combines
+ds.groupby("x").mean()  # splits, applies, and combines
 ```

 Efficient vectorized implementations of this pattern are implemented in numpy's [`ufunc.at`](https://numpy.org/doc/stable/reference/generated/numpy.ufunc.at.html), [`ufunc.reduceat`](https://numpy.org/doc/stable/reference/generated/numpy.ufunc.reduceat.html), [`numbagg.grouped`](https://github.com/numbagg/numbagg/blob/main/numbagg/grouped.py), [`numpy_groupies`](https://github.com/ml31415/numpy-groupies), and probably more.
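As a concrete illustration of one of the vectorized implementations that hunk mentions, `np.add.reduceat` computes grouped sums over contiguous segments without a Python loop. This toy example is an editorial addition, not part of the design note:

```python
import numpy as np

# Grouped sum over sorted, contiguous group labels using reduceat:
# `offsets` marks the index where each group's segment starts.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
labels = np.array([0, 0, 1, 1, 1])  # labels must be sorted for reduceat
offsets = np.flatnonzero(np.r_[True, labels[1:] != labels[:-1]])
group_sums = np.add.reduceat(values, offsets)
print(group_sums)  # [ 3. 12.]
```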
@@ -110,11 +110,13 @@ All Grouper objects will subclass from a Grouper object
 ```python
 import abc

+
 class Grouper(abc.ABC):
     @abc.abstractmethod
     def factorize(self, by: DataArray):
         raise NotImplementedError

+
 class CustomGrouper(Grouper):
     def factorize(self, by: DataArray):
         ...
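To make the abstract pattern in that hunk concrete, here is a sketch of a hypothetical `Grouper` subclass whose `factorize` maps labels to integer codes with `np.unique`; the `UniqueGrouper` name and signature are assumptions for illustration, not xarray's actual implementation:

```python
import abc

import numpy as np


class Grouper(abc.ABC):
    @abc.abstractmethod
    def factorize(self, by):
        """Return (codes, unique_labels) for the array `by`."""
        raise NotImplementedError


class UniqueGrouper(Grouper):
    # Hypothetical grouper: one group per unique label, like groupby("x").
    def factorize(self, by):
        unique_labels, codes = np.unique(by, return_inverse=True)
        return codes, unique_labels


codes, labels = UniqueGrouper().factorize(np.array(["b", "a", "b"]))
print(codes, labels)  # [1 0 1] ['a' 'b']
```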
