Benchmarks for timeseries `aggregate_downsample()` #135

orionlee · 2025-06-02T01:45:21Z

This benchmark would have caught the performance regression in v7.1.0 (astropy/astropy#18176), which is being fixed by PR astropy/astropy#18188.

The benchmark results on my machine:

| `aggregate_func` | v7.1.0 | with astropy PR #18188 |
| --- | --- | --- |
| `None` | 9.49±0.2ms | 9.79±0.3ms |
| `np.nanmean` | 186±20ms | 119±3ms |

@mhvk @pllim

pllim · 2025-06-02T14:02:59Z

benchmarks/timeseries.py

+
+    # FIXME: it hits the known issue in astropy/utils/masked/core.py
+    #   NotImplementedError: masked instances cannot yet deal with 'reduceat' or 'at'.
+    #   relevant PR: https://github.com/astropy/astropy/pull/17875


Is there an open issue we can link this to? Linking to the closed astropy/astropy#17875 won't help with visibility. Thanks!

I don't know if there is an issue explicitly tracking the problem, though I suspect astropy/astropy#17875 was closed prematurely: @mhvk mentioned that he'd like the issue addressed by that PR to be resolved after v7.1.0.

I added it to the list at astropy/astropy#11539 so that we don't forget.

My sense here is to actually remove the MaskedQuantity from the table, i.e., make two tables in setup and then allow passing on a table to _do_time_aggretate....

Could you clarify what you mean? Do you mean splitting up the current test table, which mixes plain Column, MaskedColumn, Quantity, and MaskedQuantity columns, into 4 test tables (one for each column type), thus making the test more fine-grained?

What I had in mind was simpler: make a self.ts2 that omits the failing column and use that for this test. Yours is possibly the better idea, though, so do what you think is best! Really, all I hoped for was to also run some benchmark with np.add.reduceat.
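
For readers unfamiliar with the `reduceat` path being discussed: a ufunc such as `np.add` can aggregate all bins in one vectorized call, whereas a plain function such as `np.nanmean` has no `reduceat` and has to be applied bin by bin. Below is a minimal NumPy-only sketch of that distinction. `aggregate_bins` is a hypothetical helper, and whether `aggregate_downsample()` dispatches exactly this way is an assumption, not something stated in this thread.

```python
import numpy as np


def aggregate_bins(values, starts, func):
    """Hypothetical helper: aggregate `values` over bins beginning at `starts`."""
    if hasattr(func, "reduceat"):
        # ufuncs such as np.add handle every bin in a single vectorized call
        return func.reduceat(values, starts)
    # Plain functions such as np.nanmean are called once per bin
    stops = np.append(starts[1:], len(values))
    return np.array([func(values[lo:hi]) for lo, hi in zip(starts, stops)])


values = np.arange(10, dtype=float)
starts = np.array([0, 3, 6])  # bins: [0:3], [3:6], [6:10]

bin_sums = aggregate_bins(values, starts, np.add)
bin_means = aggregate_bins(values, starts, np.nanmean)
```

The per-bin loop in the fallback branch is the part that becomes expensive when slicing or calling the function has a high fixed cost per chunk.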

mhvk

Thanks, looks good! Some very small comments.

mhvk · 2025-06-02T15:48:54Z

benchmarks/timeseries.py

+        ts["a"] = np.random.random(num_samples)  # plain Column
+
+        ts["a_mc"] = MaskedColumn(ts["a"].value, mask=False)
+        ts["a_mc"][1] = True  # just to ensure some value is masked


Shouldn't this be set to np.ma.masked to ensure a mask? Right now, I think it just sets the value to 1...

My mistake. I did mean to mask the value rather than just set it.
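
The difference can be reproduced with a plain `numpy.ma.MaskedArray`, which `MaskedColumn` builds on. This is a minimal sketch with made-up values; it does not attempt to cover the `Masked(...)` column from the same setup code.

```python
import numpy as np

data = np.ma.MaskedArray([0.25, 0.5, 0.75], mask=False)

# Assigning True writes the value 1.0; the element stays unmasked
data[1] = True
value_after_true = float(data.data[1])
masked_after_true = bool(np.ma.getmaskarray(data)[1])

# Assigning np.ma.masked flags the element as masked instead
data[2] = np.ma.masked
masked_after_sentinel = bool(np.ma.getmaskarray(data)[2])
```

With the first form the benchmark data would contain no masked entries at all, so the masked code path would be timed on effectively unmasked input.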

mhvk · 2025-06-02T15:49:03Z

benchmarks/timeseries.py

+        ts["a_q"] = ts["a"] * u.dimensionless_unscaled  # Quantity
+
+        ts["a_mq"] = Masked(ts["a_q"])  # MaskedQuantity
+        ts["a_mq"][1] = True  # just to ensure some value is masked


mhvk · 2025-06-02T15:56:02Z

benchmarks/timeseries.py

+
+    # FIXME: it hits the known issue in astropy/utils/masked/core.py
+    #   NotImplementedError: masked instances cannot yet deal with 'reduceat' or 'at'.
+    #   relevant PR: https://github.com/astropy/astropy/pull/17875


I added it to the list at astropy/astropy#11539 so that we don't forget.

My sense here is to actually remove the MaskedQuantity from the table, i.e., make two tables in setup and then allow passing on a table to _do_time_aggretate....

orionlee · 2025-06-03T01:30:49Z

The issues raised by mhvk have been resolved.

The test is now more granular, with one benchmark for each column type - aggregate_func combination.

With PR 18188:
              --                           aggregate_func
              ---------- --------------------------------------------------
               col_type       None      <function nanmean>   <ufunc 'add'>
              ========== ============= ==================== ===============
                 col       7.21±0.3ms       24.5±0.4ms          7.03±4ms
                 mcol     7.28±0.08ms        69.4±3ms          7.32±0.6ms
                 qty       6.97±0.6ms       11.8±0.3ms        6.89±0.04ms
                 mqty     7.35±0.07ms       28.2±0.2ms          n/a
              ========== ============= ==================== ===============

Before the PR (v7.1.0-like):

              --                           aggregate_func
              ---------- -------------------------------------------------
               col_type      None      <function nanmean>   <ufunc 'add'>
              ========== ============ ==================== ===============
                 col      7.00±0.2ms       24.6±0.4ms         6.95±0.1ms
                 mcol      7.23±2ms        69.8±10ms          7.31±0.1ms
                 qty      6.99±0.1ms        31.5±8ms          6.89±0.1ms
                 mqty     7.25±0.1ms        59.3±9ms           n/a
              ========== ============ ==================== ===============

This shows that astropy PR 18188 addresses the performance regression in v7.1.0 for np.nanmean against Quantity / MaskedQuantity.
- np.nanmean against MaskedQuantity is still comparatively slow after PR 18188, but I don't think that is a regression; it has to do with the performance of astropy.utils.masked.core.MaskedNDArray in repeatedly creating the small chunks of values to be aggregated.

The test also makes it explicit that np.nanmean against Column / MaskedColumn does not perform well either, but I think that is an existing issue that has never been raised.
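
For reference, a column type by aggregate_func grid like the one in the tables above can be expressed as an asv parameterized benchmark. The sketch below is hypothetical and is not the class added in this PR: the class name, the stand-in data, and the stand-in workload are all assumptions, and raising `NotImplementedError` in `setup` is only one way asv allows a combination to be reported as skipped (`n/a`).

```python
import numpy as np


class AggregateGridSketch:
    """Hypothetical asv-style benchmark; not the class added in this PR."""

    # asv runs the benchmark once per combination of these parameters
    params = (["col", "mcol", "qty", "mqty"], [None, np.nanmean, np.add])
    param_names = ["col_type", "aggregate_func"]

    def setup(self, col_type, aggregate_func):
        if col_type == "mqty" and aggregate_func is np.add:
            # asv reports a benchmark as skipped when setup raises this
            raise NotImplementedError("masked 'reduceat' is not supported yet")
        rng = np.random.default_rng(0)
        self.values = rng.random(1000)  # stand-in data, not a real TimeSeries

    def time_aggregate(self, col_type, aggregate_func):
        # Arbitrary stand-in workload; a real benchmark would call
        # aggregate_downsample() on a table column of the given type
        func = np.nanmean if aggregate_func is None else aggregate_func
        if hasattr(func, "reduce"):
            func.reduce(self.values)
        else:
            func(self.values)
```

Keeping each combination as its own parameter set is what lets a single known failure show up as one `n/a` cell instead of hiding the timings for the other eleven.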

mhvk

Super, this looks great!

pllim · 2025-06-03T15:42:43Z

Thanks all!

Benchmarks for timeseries aggregate_downsample

78d6186

orionlee force-pushed the timeseries_aggregate_downsample_benchmarks branch from d3d6a68 to 78d6186 Compare June 2, 2025 02:03

pllim reviewed Jun 2, 2025

pllim requested a review from mhvk June 2, 2025 14:03

mhvk reviewed Jun 2, 2025

pllim mentioned this pull request Jun 2, 2025

Meta issue for Masked class follow-up astropy/astropy#11539

Open

8 tasks

orionlee added 2 commits June 2, 2025 17:25

bug fix on test data: ensure some values are masked, and some are nan

d686ac7

add np.add as aggregate_func to test; make tests more granular, 1 for each column type - aggregate_func combination

326d4b7

orionlee force-pushed the timeseries_aggregate_downsample_benchmarks branch from 0a96b67 to 326d4b7 Compare June 3, 2025 01:24

explicitly skip the case MaskedQuantity with np.add (due to known issue)

3220cec

mhvk approved these changes Jun 3, 2025

pllim merged commit d8480a1 into astropy:main Jun 3, 2025
3 checks passed
