feat(sqlite): Improve throughput of SqliteEventCacheStore::load_all_chunks_metadata by 1140% #5411

Open · wants to merge 2 commits into main
Conversation

Hywan
Member

@Hywan Hywan commented Jul 15, 2025

This patch changes the query used by SqliteEventCacheStore::load_all_chunks_metadata, which was the cause of severe slowness (see #5407). The new query improves the throughput by +1140% and reduces the time by 91.916%.

The query visits all chunks of the linked chunk whose ID is hashed_linked_chunk_id. For each chunk, it collects its ID (ChunkIdentifier), its previous chunk, its next chunk, and its number of events (num_events). If the chunk is a gap, num_events is 0; otherwise, it counts the number of events in event_chunks where event_chunks.chunk_id = linked_chunks.id.

Why not use a (LEFT) JOIN + COUNT? Because for gaps, the entire event_chunks table would be traversed every time, which is extremely inefficient. We could speed that up with an INDEX, but it would consume more storage space. Moreover, traversing an INDEX boils down to traversing a B-tree, which is O(log n), whilst this CASE approach costs O(1) per gap. This solution is a nice trade-off and offers great performance.
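The shape of this query can be sketched with a minimal, self-contained example. Note that the schema below (column names, the `'E'`/`'G'` chunk-type encoding, the `linked_chunk_id` filter) is reconstructed from the description above and is hypothetical; the real matrix-rust-sdk schema may differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema, reconstructed from the description above.
conn.executescript("""
CREATE TABLE linked_chunks (
    id INTEGER,
    linked_chunk_id TEXT,
    previous_chunk INTEGER,
    next_chunk INTEGER,
    chunk_type TEXT  -- 'E' = events chunk, 'G' = gap (hypothetical encoding)
);
CREATE TABLE event_chunks (
    chunk_id INTEGER,
    event_id TEXT
);
""")

# One linked chunk ("lc0") made of two event chunks with a gap in between.
conn.executemany(
    "INSERT INTO linked_chunks VALUES (?, ?, ?, ?, ?)",
    [
        (0, "lc0", None, 1, "E"),
        (1, "lc0", 0, 2, "G"),
        (2, "lc0", 1, None, "E"),
    ],
)
conn.executemany(
    "INSERT INTO event_chunks VALUES (?, ?)",
    [(0, "$ev0"), (0, "$ev1"), (2, "$ev2")],
)

# The CASE short-circuits for gaps: the correlated COUNT subquery runs only
# for event chunks, so a gap costs O(1) instead of a scan of `event_chunks`.
rows = conn.execute(
    """
    SELECT
        lc.id,
        lc.previous_chunk,
        lc.next_chunk,
        CASE lc.chunk_type
            WHEN 'G' THEN 0
            ELSE (
                SELECT COUNT(ec.event_id)
                FROM event_chunks AS ec
                WHERE ec.chunk_id = lc.id
            )
        END AS num_events
    FROM linked_chunks AS lc
    WHERE lc.linked_chunk_id = ?
    ORDER BY lc.id
    """,
    ("lc0",),
).fetchall()

print(rows)  # [(0, None, 1, 2), (1, 0, 2, 0), (2, 1, None, 1)]
```

The gap row reports `num_events = 0` without ever touching `event_chunks`, which is the crux of the speed-up.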

Note

Please note that the more gaps a linked chunk contained, the slower the query was. Now, the number of gaps has no impact.

This change doesn't involve a new schema; no migration is necessary.

Metrics for 10'000 events (with 1 gap every 80 events)

Before

|            | Lower bound    | Estimate       | Upper bound    |
|------------|----------------|----------------|----------------|
| Throughput | 20.650 Kelem/s | 20.686 Kelem/s | 20.722 Kelem/s |
| R²         | 0.1032688      | 0.1374475      | 0.1035309      |
| Mean       | 482.58 ms      | 483.43 ms      | 484.27 ms      |
| Std. Dev.  | 929.07 µs      | 1.4376 ms      | 1.7083 ms      |
| Median     | 481.90 ms      | 483.82 ms      | 484.60 ms      |
| MAD        | 173.82 µs      | 2.1061 ms      | 2.4203 ms      |

After

|            | Lower bound    | Estimate       | Upper bound    |
|------------|----------------|----------------|----------------|
| Slope      | 39.322 ms      | 39.444 ms      | 39.607 ms      |
| Throughput | 252.48 Kelem/s | 253.52 Kelem/s | 254.31 Kelem/s |
| R²         | 0.9993976      | 0.9995784      | 0.9992583      |
| Mean       | 39.381 ms      | 39.478 ms      | 39.596 ms      |
| Std. Dev.  | 75.457 µs      | 184.96 µs      | 260.35 µs      |
| Median     | 39.354 ms      | 39.459 ms      | 39.546 ms      |
| MAD        | 23.761 µs      | 124.48 µs      | 267.83 µs      |
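As a rough cross-check of the headline numbers, the ratio of the two mean times from the tables above lands in the same ballpark as the reported +1140% / -91.916% (criterion derives its change estimates from the full sample distributions, so the figures won't match exactly):

```python
# Mean times taken from the two benchmark tables above.
before_ms = 483.43  # mean time before
after_ms = 39.478   # mean time after

speedup = before_ms / after_ms
throughput_gain_pct = (speedup - 1) * 100
time_reduction_pct = (1 - after_ms / before_ms) * 100

print(f"{speedup:.2f}x, +{throughput_gain_pct:.0f}% throughput, "
      f"-{time_reduction_pct:.2f}% time")
# ≈ 12.25x, +1125% throughput, -91.83% time
```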

Hywan added 2 commits July 15, 2025 15:35
This patch changes the query used by
`SqliteEventCacheStore::load_all_chunks_metadata`. It was the cause of
severe slowness. The new query improves the throughput by +1140% and the
time by -91.916%. The benchmark will follow in the next patch.

Metrics for 10'000 events (with 1 gap every 80 events).

- Before:
  - throughput: 20.686 Kelem/s,
  - time: 483.43 ms.
- After:
  - throughput: 253.52 Kelem/s,
  - time: 39.478 ms.

This query will visit all chunks of a linked chunk with ID
`hashed_linked_chunk_id`. For each chunk, it collects its ID
(`ChunkIdentifier`), previous chunk, next chunk, and number of
events (`num_events`). If it's a gap, `num_events` is equal to 0;
otherwise it counts the number of events in `event_chunks` where
`event_chunks.chunk_id = linked_chunks.id`.

Why not use a `(LEFT) JOIN` + `COUNT`? Because for gaps, the entire
`event_chunks` table would be traversed every time, which is extremely
inefficient. To speed that up, we could use an `INDEX`, but it would
consume more storage space. Finally, traversing an `INDEX` boils down to
traversing a B-tree, which is O(log n), whilst this `CASE` approach is
O(1). This solution is a nice trade-off and offers great performance.
@Hywan Hywan changed the title feat(sqlite): Improve throughput of load_all_chunks_metadata by 1140% feat(sqlite): Improve throughput of SqliteEventCacheStore::load_all_chunks_metadata by 1140% Jul 15, 2025
@Hywan Hywan marked this pull request as ready for review July 15, 2025 14:01
@Hywan Hywan requested a review from a team as a code owner July 15, 2025 14:01
@Hywan Hywan requested review from andybalaam and poljar and removed request for a team and andybalaam July 15, 2025 14:01

codecov bot commented Jul 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.81%. Comparing base (7d9d5bf) to head (9e2da60).
Report is 5 commits behind head on main.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5411   +/-   ##
=======================================
  Coverage   88.80%   88.81%           
=======================================
  Files         334      334           
  Lines       91256    91266   +10     
  Branches    91256    91266   +10     
=======================================
+ Hits        81044    81059   +15     
+ Misses       6399     6393    -6     
- Partials     3813     3814    +1     

