feat(sqlite): Improve throughput of SqliteEventCacheStore::load_all_chunks_metadata by 1140% #5411

+68 −42
This patch changes the query used by `SqliteEventCacheStore::load_all_chunks_metadata`. It was the cause of severe slowness (see #5407). The new query improves the throughput by +1140% and reduces the time by 91.916%.

The query visits all chunks of the linked chunk with ID `hashed_linked_chunk_id`. For each chunk, it collects its ID (a `ChunkIdentifier`), its previous chunk, its next chunk, and its number of events (`num_events`). If the chunk is a gap, `num_events` is 0; otherwise it counts the number of events in `event_chunks` where `event_chunks.chunk_id = linked_chunks.id`, as sketched below.
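For illustration, here is a sketch of the shape such a query can take. The `linked_chunks` and `event_chunks` tables and the `chunk_id` column come from the description above; the remaining column names (`previous`, `next`, `type`, `linked_chunk_id`) and the `'G'` gap discriminator are assumptions, not the exact schema:

```sql
SELECT
    lc.id,        -- the ChunkIdentifier
    lc.previous,  -- previous chunk, if any (assumed column name)
    lc.next,      -- next chunk, if any (assumed column name)
    CASE lc.type  -- assumed discriminator between gaps and event chunks
        -- Gaps hold no events: a constant, no table access at all.
        WHEN 'G' THEN 0
        -- Event chunks: a correlated subquery counts only the rows
        -- belonging to this chunk.
        ELSE (
            SELECT COUNT(*)
            FROM event_chunks
            WHERE event_chunks.chunk_id = lc.id
        )
    END AS num_events
FROM linked_chunks AS lc
WHERE lc.linked_chunk_id = ?  -- bound to hashed_linked_chunk_id
```

The key point is that the subquery in the `ELSE` branch is never evaluated for gaps, so a gap costs nothing beyond reading its own chunk row.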
Why not use a `(LEFT) JOIN` + `COUNT`? Because for gaps, the entire `event_chunks` table would be traversed every time, which is extremely inefficient. We could speed that up with an `INDEX`, but it would consume more storage space. Besides, traversing an `INDEX` boils down to traversing a B-tree, which is O(log n), whilst this `CASE` approach is O(1) for gaps. This solution is a nice trade-off and offers great performance.
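For contrast, the rejected `LEFT JOIN` + `COUNT` approach would look roughly like this (same hedged schema as in the sketch above); every chunk, gaps included, participates in the join:

```sql
-- Rejected shape: event_chunks is matched against every chunk,
-- even gaps, where the answer is already known to be 0.
SELECT
    lc.id,
    lc.previous,
    lc.next,
    COUNT(ec.event_id) AS num_events  -- counts non-NULL matches, so gaps yield 0
FROM linked_chunks AS lc
LEFT JOIN event_chunks AS ec ON ec.chunk_id = lc.id
WHERE lc.linked_chunk_id = ?
GROUP BY lc.id
```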
> [!NOTE]
> The more gaps there were, the slower the old query ran. Now, the number of gaps has no impact.

This change doesn't involve a new schema, so no migration is necessary.
**Metrics for 10'000 events (with 1 gap every 80 events)**: before/after benchmark figures.