Skip to content

Benchmarks for timeseries aggregate_downsample() #135

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

orionlee
Copy link
Contributor

@orionlee orionlee commented Jun 2, 2025

Would have caught the performance regression in v7.1.0 ( astropy/astropy#18176 ), being fixed by PR astropy/astropy#18188 .

The benchmark results on my machine

aggregate_func v7.1.0 with astropy PR #18188
None 9.49±0.2ms 9.79±0.3ms
np.nanmean 186±20ms 119±3ms

@mvhk @pllim

@orionlee orionlee force-pushed the timeseries_aggregate_downsample_benchmarks branch from d3d6a68 to 78d6186 Compare June 2, 2025 02:03

# FIXME: it hits the known issue in astropy/utils/masked/core.py
# NotImplementedError: masked instances cannot yet deal with 'reduceat' or 'at'.
# relevant PR: https://github.com/astropy/astropy/pull/17875
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an open issue we can link this to? Linking to a closed astropy/astropy#17875 won't help with visibility. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if there is an issue explicitly tracking the problem, though I suspect astropy/astropy#17875 was prematurely closed: @mvhk mentioned he'd like to have the issue addressed by the PR resolved post v7.1.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it to the list at astropy/astropy#11539 so that we don't forget.

My sense here is to actually remove the MaskedQuantity from the table, i.e., make two tables in setup and then allow passing on a table to _do_time_aggretate....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify on what you mean? Do you mean splitting up the current the test table, with a mix of plain Column, MaskedColumn, Quantity, MaskedQuantity columns, to 4 test tables (one for each column type), thus making the test more fine-grained?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I had in mind was simpler, to make a self.ts2 that omits the failing column and using that for this test. Possibly, yours is the better idea, though, so do what you think is best! Really, all I hoped was to actually do some benchmark also with np.add.reduceat.

@pllim pllim requested a review from mhvk June 2, 2025 14:03
Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good! Some very small comments.

ts["a"] = np.random.random(num_samples) # plain Column

ts["a_mc"] = MaskedColumn(ts["a"].value, mask=False)
ts["a_mc"][1] = True # just to ensure some value is masked
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be set to np.ma.masked to ensure a mask? RIght now, I think it just sets the value to 1...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake. I do mean to have the value masked instead of just setting the value.

ts["a_q"] = ts["a"] * u.dimensionless_unscaled # Quantity

ts["a_mq"] = Masked(ts["a_q"]) # MaskedQuantity
ts["a_mq"][1] = True # just to ensure some value is masked
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above.


# FIXME: it hits the known issue in astropy/utils/masked/core.py
# NotImplementedError: masked instances cannot yet deal with 'reduceat' or 'at'.
# relevant PR: https://github.com/astropy/astropy/pull/17875
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it to the list at astropy/astropy#11539 so that we don't forget.

My sense here is to actually remove the MaskedQuantity from the table, i.e., make two tables in setup and then allow passing on a table to _do_time_aggretate....

@orionlee orionlee force-pushed the timeseries_aggregate_downsample_benchmarks branch from 0a96b67 to 326d4b7 Compare June 3, 2025 01:24
@orionlee
Copy link
Contributor Author

orionlee commented Jun 3, 2025

The issues raised by mvhk have been resolved.

The test is now more granular, one for each column type - aggregate_func combination

With PR 18188:
              --                           aggregate_func
              ---------- --------------------------------------------------
               col_type       None      <function nanmean>   <ufunc 'add'>
              ========== ============= ==================== ===============
                 col       7.21±0.3ms       24.5±0.4ms          7.03±4ms
                 mcol     7.28±0.08ms        69.4±3ms          7.32±0.6ms
                 qty       6.97±0.6ms       11.8±0.3ms        6.89±0.04ms
                 mqty     7.35±0.07ms       28.2±0.2ms          n/a
              ========== ============= ==================== ===============

Before the PR (v7.1.0-like)

              --                           aggregate_func
              ---------- -------------------------------------------------
               col_type      None      <function nanmean>   <ufunc 'add'>
              ========== ============ ==================== ===============
                 col      7.00±0.2ms       24.6±0.4ms         6.95±0.1ms
                 mcol      7.23±2ms        69.8±10ms          7.31±0.1ms
                 qty      6.99±0.1ms        31.5±8ms          6.89±0.1ms
                 mqty     7.25±0.1ms        59.3±9ms           n/a
              ========== ============ ==================== ===============
  • It shows astropy PR 18188 addressed the performance regression in v7.1.0 for np.nanmean against Quantity / MaskedQuantity.
    • the case np.nanmean against MaskedQuantity is still comparatively slow after the PR 18188, but I don't think it is an regression; and has to do with the performance of astropy.utils.masked.core.MaskedNDArray in repeatedly creating the small chunks of values to be aggregated.
  • The test also makes it explicit that for cases np.nanmean against Column / MaskedColumn, the performance is not good either. But I think it's an existing issue that has never been raised.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super, this looks great!

@pllim pllim merged commit d8480a1 into astropy:main Jun 3, 2025
3 checks passed
@pllim
Copy link
Member

pllim commented Jun 3, 2025

Thanks all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants