Skip to content

Conversation

@neutrinoceros
Copy link
Member

PR Summary

I noticed this function could be heavily simplified, but also, and perhaps more importantly, that its return type would sometimes be a copy of the input data, but not always. This looks like a footgun so I'd rather make it always return a copy, so as to avoid subtle bugs.

PR Checklist

  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

@neutrinoceros neutrinoceros added this to the 4.5.0 milestone Oct 19, 2025
@neutrinoceros neutrinoceros added enhancement Making something better api-consistency naming conventions, code deduplication, informative error messages, code smells... labels Oct 19, 2025
@matthewturk
Copy link
Member

Wait, hold up. I'm not sure that we want this behavior. Making copies of every array that goes into cython is potentially going to lead to an enormous amount of memory churn and reallocation.

@neutrinoceros neutrinoceros changed the title ENH: ensure yt.utils.funcs.ensure_numpy_array consistently returns a copy ENH: ensure yt.utils.funcs.ensure_numpy_array consistently returns a copy Oct 19, 2025
@neutrinoceros
Copy link
Member Author

to be clear, the only inputs where the existing code doesn't copy already are

  • a list/tuple of non-numeric Python objects (say, strs), in which case it'll return an array of pointers (dtype=object)
  • scalars items (either non-arrays or single-value arrays)

@matthewturk
Copy link
Member

dang, that's brutal. One of the biggest costs of ghost zones, last time I looked, was around copying and recopying arrays and checking that they were in the right units. But I suppose this probably isn't in that category.

@neutrinoceros
Copy link
Member Author

so if we don't want this to copy, that's a change in behavior in most cases, and I don't think it's one that's safe to make (people's code is going to break in subtle ways).
I'd suggest introducing some new API (either a brand new function, or adding a copy kwarg to this one) if there are applications where we definitely don't want copies.

@matthewturk
Copy link
Member

nah it's fine, you pointed out that I was interpreting the change incorrectly.

@neutrinoceros
Copy link
Member Author

neutrinoceros commented Oct 19, 2025

CI revealed that np.asarray didn't have a copy argument before numpy 2.0. I could work around this at a local level, but incidentally, I'm adding a general mechanism for handling this problem gracefully across the board in #5307, so I'd prefer to re-work this PR once #5307 goes in.

@neutrinoceros neutrinoceros force-pushed the enh/ensure_numpy_array_always_copy branch from b307d6b to 688469b Compare October 19, 2025 19:50
@neutrinoceros
Copy link
Member Author

Actually it was easy enough to bring only the backward-compatibility layer from the other PR, so I plan to undraft this after the on-going round of CI.

@neutrinoceros neutrinoceros force-pushed the enh/ensure_numpy_array_always_copy branch from 688469b to ade7357 Compare October 19, 2025 19:53
@neutrinoceros neutrinoceros force-pushed the enh/ensure_numpy_array_always_copy branch from ade7357 to f0edd1a Compare October 19, 2025 19:53
@neutrinoceros
Copy link
Member Author

even better; I can actually write this in a way that always copy and doesn't need a backward compatibility layer at all.

@neutrinoceros neutrinoceros marked this pull request as ready for review October 19, 2025 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-consistency naming conventions, code deduplication, informative error messages, code smells... enhancement Making something better

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants