-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Clean-up indexing adapter classes #10355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
dcherian
merged 27 commits into
pydata:main
from
benbovy:cleanup-pandas-indexing-adapter
Jul 7, 2025
Merged
Changes from 6 commits
Commits
Show all changes
27 commits
Select commit
Hold shift + click to select a range
bc94c6d
clean-up indexing.PandasIndexingAdapter typing
benbovy 17ff7e9
streamline PandasIndexingAdapter indexing logic
benbovy 2b25155
clean-up PandasIndexingAdapter dtype handling
benbovy 9981078
more clean-up
benbovy 29098ac
repr: prevent loading lazy variables into memory
benbovy 5f09354
fix array (index) subsetting
benbovy c4a853e
Merge branch 'main' into cleanup-pandas-indexing-adapter
benbovy 0e5154c
treat multi-index and coord-transform variables as lazy
benbovy 4efb135
update whats new
benbovy ef73a7e
add benchmarks for pandas and xarray RangeIndex
benbovy 28b661a
Merge branch 'main' into cleanup-pandas-indexing-adapter
benbovy a2ccb7d
fix benchmark numba import error (numpy 2.3)
benbovy 07f6cdb
benchmark: pin numpy in conf + consistent conda env
benbovy a953b41
pyproject: bump setuptools(-scm)
benbovy 2be275d
ci benchmarks: try fixing package install
benbovy 825cdb1
next try
benbovy c890a69
next try
benbovy 16fe98b
next try
benbovy 8ae12f7
benchmarks: try disabling no build isolation
benbovy f40f38c
Revert "benchmarks: try disabling no build isolation"
benbovy 0ecc214
Revert "next try"
benbovy 74e993c
Revert "next try"
benbovy 3420fc9
Revert "next try"
benbovy 97579f5
Revert "ci benchmarks: try fixing package install"
benbovy 86df720
Revert "pyproject: bump setuptools(-scm)"
benbovy 0887a8e
I'm tired of Python packaging
benbovy 8a76b46
Let's fix all this later
benbovy File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we get away with adding
_repr_inline
to CoordinateTransfromIndexingAdapter? I think it's preferable to avoid this kind of special casing.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function does not format the inline repr but the data of a DataArray or variable.
We could add a
short_data_repr
method to those adapter classes and check ifinternal_data
has such a method here, although it is not much different from this special casing.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess a better request is to see whether we can just reuse
format_array_flat
which already does indexing and should just work for these classes.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm again
format_array_flat
is for formatting the inline repr, whereas the special case introduced here is for formatting the data repr.Without this special case,
short_data_repr()
will convert the indexing adapters into numpy arrays viaPandasIndexingAdpater.get_duck_array()
andCoordinateTransformIndexingAdapter.get_duck_array()
over their full shape / size. For both the inline and the data reprs, we want to select only the first and last relevant items before doing this conversion.first_n_items()
andlast_n_items()
inxarray.core.formatting
do similar things thanPandasIndexingAdapter._get_array_subset()
andCoordinateTransformIndexingAdapter._get_array_subset()
. We could perhaps reuse the two former instead, although for the data repr (at least for CoordinateTransform, and possibly later for pd.IntervalIndex) we don't want a flattened result. So this would involve more refactoring. Also this wouldn't remove the special case here.Alternatively we could tweak
Variable._in_memory
such that it returns False when._data
is a PandasIndexingAdapter (only when it wraps a pd.RangeIndex) or a CoordinateTransformIndexingAdapter, which will turn their data repr from, e.g.,into
On one hand I like seeing the actual first and last values of a (lazy) range index or coordinate transform. On the other hand I find a bit confusing that is it shown like a plain numpy array.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I ended up refactoring in 0e5154c, for PandasMultiIndex coordinates too.
I think it makes sense to show those coordinate variables as lazy variables rather than showing them as numpy arrays. Those aren't numpy arrays, coercing them like so may trigger a lot of computation and memory allocation.
A possible new source of confusion, however, is that they now reuse the same repr than for the other lazy variables loaded from disk. One difference is that the index coordinates will still be lazy even after "loading" a dataset, i.e.,
I'm not sure what would be best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could create another lazy array class that inherits from the current one but overrides
__repr__
to show that they're different (and if that later turns out to be wrong we don't have to go through a deprecation cycle, as all that machinery is internal anyways)