Skip to content

Conversation

@mabruzzo
Copy link
Contributor

@mabruzzo mabruzzo commented Jun 13, 2025

PR Summary

The current implementation of create_profile computes extrema when override_bins is provided. This can significantly increase the cost of an operation when working with large datasets. This PR addresses that scenario.

The PR also refactors argument handling of create_profile to be more explicit for extrema and override_bins. The details of these arguments have a number of undocumented idiosyncracies. While the code is more verbose, I think it makes the requirements a little more explicit (and the error messages are more detailed). In the future, I think we can improve this a lot more, but this seems like a reasonable first-step.

It may be easiest to review this PR commit-by-commit.

PR Checklist

The PR Checklist isn't applicable since I haven't changed any user-facing behavior

@mabruzzo mabruzzo marked this pull request as draft June 27, 2025 02:27
@mabruzzo mabruzzo changed the title Enhancement: create_profile no longer computes extrema when we provide override_bins Enhancement: WIP: create_profile no longer computes extrema when we provide override_bins Jun 27, 2025
@mabruzzo mabruzzo force-pushed the improve-create_profile branch from 588a53a to e72c514 Compare June 27, 2025 16:31
Comment on lines +1354 to +1364
if extrema is None or not any(collapse(extrema.values())):
# when extrema are all Nones, we treat them as though extrema was set as None.
#
# In the future, it may be better to replace `not any(...)` with either:
# `all(v is None for v in collapse(extrema.values()))` OR
# `all(v is None or v == (None, None) for v in collapse(extrema.values))`
#
# Be aware that doing this changes behavior, when extrema has the value:
# `{<bin_field>: (0., None)}` or `{<bin_field>: (None, 0.)}`
# I happen to think think that the existing behavior is a bug (i.e. the
# behavior changes if we replaced 0. with **ANY** other number)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: This is mostly for posterity. I don't think we should deal with this issue in this PR (since this PR is focused on not changing observable behavior). I plan to make an issue for this after the PR is reviewed


I think we should probably change the if-statement to:

    if extrema is None or all(v is None for v in collapse(extrema.values())):

OR

    if extrema is None or all(list(v) == [None, None] for v in extrema.values()):

As the comments mention, both choices technically change the behavior when one of the extrema is a 0 and the other is a None, but I think the existing behavior is probably a bug. The former option is more backwards compatible while the latter option is more consistent with the rest of our extrema handling.

For more context, the following table whether various choices for the extrema argument are considered valid, when bin_fields is [("gas", "velocity_x"), ("gas", "density")]

Current Logic Snippet 1 Snippet 2
{("gas","velocity_x"): (0.0, None)} ️✅*
{("gas","velocity_x"): None, ("gas", "density"): None} ️✅ ️✅
{("gas","velocity_x"): (None, None), ("gas", "density"): (None, None)} ️✅ ️✅ ️✅
{("gas","velocity_x"): (0.0, None), ("gas", "density"): (None, None)} ️✅*
{("gas","velocity_x"): None, ("gas", "density"): (1e-27, 1e-23)}
{("gas","velocity_x"): (None,None), ("gas", "density"): (1e-27, 1e-23)} ️✅ ️✅ ️✅

Importantly: The starred cases treat are scenarios that the snippets would fix. In those cases, the existing logic totally ignores the fact that the caller specifies that the lower extrema for ("gas","velocity") is 0.0 and acts like it was set to None. Furthermore, the existing logic causes the function to behave completely differently for the following cases:

  • {("gas","velocity_x"): (-1.0, None)} (this is considered to be invalid)
  • {("gas","velocity_x"): (1.0, None), ("gas", "density"): (None, None)}
  • {("gas","velocity_x"): (0.0, None), ("gas", "density"): (None, 1e-20)}.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant