feat(sqlite): Improve throughput of SqliteEventCacheStore::load_all_chunks_metadata by 1140% #5411

Open · wants to merge 2 commits into main
Conversation

Hywan
Member

@Hywan Hywan commented Jul 15, 2025

This patch changes the query used by SqliteEventCacheStore::load_all_chunks_metadata, which was the cause of severe slowness (see #5407). The new query improves the throughput by +1140% and reduces the time by 91.916%.

The query visits all chunks of the linked chunk whose ID is hashed_linked_chunk_id. For each chunk, it collects its ID (ChunkIdentifier), its previous chunk, its next chunk, and its number of events (num_events). If the chunk is a gap, num_events is 0; otherwise, it counts the number of events in event_chunks where event_chunks.chunk_id = linked_chunks.id.

Why not use a (LEFT) JOIN + COUNT? Because for gaps, the entire event_chunks table would be traversed every time, which is extremely inefficient. We could speed that up with an INDEX, but it would consume more storage space. Moreover, traversing an INDEX boils down to traversing a B-tree, which is O(log n), whilst this CASE approach costs O(1) per gap. This solution is a nice trade-off and offers great performance.
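The shape of this query can be sketched with a minimal, self-contained example. Note that the schema below (column names, the `'E'`/`'G'` chunk-type encoding, the `linked_chunk_id` filter) is reconstructed from the description above and is hypothetical; the real matrix-rust-sdk schema may differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema, reconstructed from the description above.
conn.executescript("""
CREATE TABLE linked_chunks (
    id INTEGER,
    linked_chunk_id TEXT,
    previous_chunk INTEGER,
    next_chunk INTEGER,
    chunk_type TEXT  -- 'E' = events chunk, 'G' = gap (hypothetical encoding)
);
CREATE TABLE event_chunks (
    chunk_id INTEGER,
    event_id TEXT
);
""")

# One linked chunk ("lc0") made of two event chunks with a gap in between.
conn.executemany(
    "INSERT INTO linked_chunks VALUES (?, ?, ?, ?, ?)",
    [
        (0, "lc0", None, 1, "E"),
        (1, "lc0", 0, 2, "G"),
        (2, "lc0", 1, None, "E"),
    ],
)
conn.executemany(
    "INSERT INTO event_chunks VALUES (?, ?)",
    [(0, "$ev0"), (0, "$ev1"), (2, "$ev2")],
)

# The CASE short-circuits for gaps: the correlated COUNT subquery runs only
# for event chunks, so a gap costs O(1) instead of a scan of `event_chunks`.
rows = conn.execute(
    """
    SELECT
        lc.id,
        lc.previous_chunk,
        lc.next_chunk,
        CASE lc.chunk_type
            WHEN 'G' THEN 0
            ELSE (
                SELECT COUNT(ec.event_id)
                FROM event_chunks AS ec
                WHERE ec.chunk_id = lc.id
            )
        END AS num_events
    FROM linked_chunks AS lc
    WHERE lc.linked_chunk_id = ?
    ORDER BY lc.id
    """,
    ("lc0",),
).fetchall()

print(rows)  # [(0, None, 1, 2), (1, 0, 2, 0), (2, 1, None, 1)]
```

The gap row reports `num_events = 0` without ever touching `event_chunks`, which is the crux of the speed-up.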

Note

Please note that the more gaps a linked chunk contained, the slower the query was. Now, the number of gaps has no impact.

This change doesn't involve a new schema; no migration is necessary.

Metrics for 10'000 events (with 1 gap every 80 events)

Before

|            | Lower bound    | Estimate       | Upper bound    |
|------------|----------------|----------------|----------------|
| Throughput | 20.650 Kelem/s | 20.686 Kelem/s | 20.722 Kelem/s |
| R²         | 0.1032688      | 0.1374475      | 0.1035309      |
| Mean       | 482.58 ms      | 483.43 ms      | 484.27 ms      |
| Std. Dev.  | 929.07 µs      | 1.4376 ms      | 1.7083 ms      |
| Median     | 481.90 ms      | 483.82 ms      | 484.60 ms      |
| MAD        | 173.82 µs      | 2.1061 ms      | 2.4203 ms      |

After

|            | Lower bound    | Estimate       | Upper bound    |
|------------|----------------|----------------|----------------|
| Slope      | 39.322 ms      | 39.444 ms      | 39.607 ms      |
| Throughput | 252.48 Kelem/s | 253.52 Kelem/s | 254.31 Kelem/s |
| R²         | 0.9993976      | 0.9995784      | 0.9992583      |
| Mean       | 39.381 ms      | 39.478 ms      | 39.596 ms      |
| Std. Dev.  | 75.457 µs      | 184.96 µs      | 260.35 µs      |
| Median     | 39.354 ms      | 39.459 ms      | 39.546 ms      |
| MAD        | 23.761 µs      | 124.48 µs      | 267.83 µs      |
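As a rough cross-check of the headline numbers, the ratio of the two mean times from the tables above lands in the same ballpark as the reported +1140% / -91.916% (criterion derives its change estimates from the full sample distributions, so the figures won't match exactly):

```python
# Mean times taken from the two benchmark tables above.
before_ms = 483.43  # mean time before
after_ms = 39.478   # mean time after

speedup = before_ms / after_ms
throughput_gain_pct = (speedup - 1) * 100
time_reduction_pct = (1 - after_ms / before_ms) * 100

print(f"{speedup:.2f}x, +{throughput_gain_pct:.0f}% throughput, "
      f"-{time_reduction_pct:.2f}% time")
# ≈ 12.25x, +1125% throughput, -91.83% time
```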

Hywan added 2 commits July 15, 2025 15:35
This patch changes the query used by
`SqliteEventCacheStore::load_all_chunks_metadata`. It was the cause of
severe slowness. The new query improves the throughput by +1140% and the
time by -91.916%. The benchmark will follow in the next patch.

Metrics for 10'000 events (with 1 gap every 80 events).

- Before:
  - throughput: 20.686 Kelem/s,
  - time: 483.43 ms.
- After:
  - throughput: 253.52 Kelem/s,
  - time: 39.478 ms.

This query will visit all chunks of a linked chunk with ID
`hashed_linked_chunk_id`. For each chunk, it collects its ID
(`ChunkIdentifier`), previous chunk, next chunk, and number of
events (`num_events`). If it's a gap, `num_events` is equal to 0;
otherwise it counts the number of events in `event_chunks` where
`event_chunks.chunk_id = linked_chunks.id`.

Why not use a `(LEFT) JOIN` + `COUNT`? Because for gaps, the entire
`event_chunks` table would be traversed every time, which is extremely
inefficient. To speed that up, we could use an `INDEX`, but it would
consume more storage space. Finally, traversing an `INDEX` boils down to
traversing a B-tree, which is O(log n), whilst this `CASE` approach is
O(1). This solution is a nice trade-off and offers great performance.
@Hywan Hywan changed the title feat(sqlite): Improve throughput of load_all_chunks_metadata by 1140% feat(sqlite): Improve throughput of SqliteEventCacheStore::load_all_chunks_metadata by 1140% Jul 15, 2025
@Hywan Hywan marked this pull request as ready for review July 15, 2025 14:01
@Hywan Hywan requested a review from a team as a code owner July 15, 2025 14:01
@Hywan Hywan requested review from andybalaam and poljar and removed request for a team and andybalaam July 15, 2025 14:01

codecov bot commented Jul 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.81%. Comparing base (7d9d5bf) to head (9e2da60).
Report is 5 commits behind head on main.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5411   +/-   ##
=======================================
  Coverage   88.80%   88.81%           
=======================================
  Files         334      334           
  Lines       91256    91266   +10     
  Branches    91256    91266   +10     
=======================================
+ Hits        81044    81059   +15     
+ Misses       6399     6393    -6     
- Partials     3813     3814    +1     

