Skip to content

feat(sqlite): SqliteEventCacheStore has 1 write connection #5382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

Hywan
Copy link
Member

@Hywan Hywan commented Jul 9, 2025

Until now, SqliteEventCacheStore manages a pool of connections. A connection is fetched from this pool and operations are executed on it, regardless whether these are read operations or write operations.

We are seeing more and more database is busy errors. We believe this is because too many write operations are executed concurrently.

The solution to solve this is to use multiple connections for read operations, and a single connection for write operations. That way, concurrent writings are no longer a thing, and we hope it will reduce the number of database is busy errors to zero. That's our guess.

This patch does that. When the pool of connections is created, a connection is elected as the write_connection. To get a connection for read operations, one has to use the new SqliteEventCacheStore::read method (it replaces the acquire method). To get a connection for write operations, one has to use the new SQliteEventCacheStore::write method. It returns a OwnedMutexGuard from an async Mutex. All callers that want to do write operations on this store have to wait their turn, this Mutex is fair, and the first to wait on the lock is the first that will take the lock (FIFO). It guarantees the execution ordering the code expects.

The rest of the patch updates all spots where acquire was used and replaces them by read() or write(). A particular care was made to see if other places are using SqliteEventCacheStore::pool directly. No place remains except in read() and write().

Finally, other commits are adding more logs and timer logs.


Hywan added 2 commits July 9, 2025 15:31
This patch updates `SqliteStoreConfig::pool_size` to be at least 2. We
need 2 connections: one for write operations, one for read operations.
This behaviour is coming in the next patches.
Until now, `SqliteEventCacheStore` manages a pool of connections. A
connection is fetched from this pool and operations are executed on it,
regardless whether these are read operations or write operations.

We are seeing more and more _database is busy_ errors. We believe this
is because too many write operations are executed concurrently.

The solution to solve this is to use multiple connections for read
operations, and a single connection for write operations. That way,
concurrent writings are no longer a thing, and we hope it will reduce
the number of _database is busy_ errors to zero. That's our guess.

This patch does that. When the pool of connections is created, a
connection is elected as the `write_connection`. To get a connection for
read operations, one has to use the new `SqliteEventCacheStore::read`
method (it replaces the `acquire` method). To get a connection for
write operations, one has to use the new `SQliteEventCacheStore::write`
method. It returns a `OwnedMutexGuard` from an async `Mutex`. All
callers that want to do write operations on this store have to wait
their turn, this `Mutex` is fair, and the first to wait on the lock is
the first that will take the lock (FIFO). It guarantees the execution
ordering the code expects.

The rest of the patch updates all spots where `acquire` was used and
replaces them by `read()` or `write()`. A particular care was made to
see if other places are using `SqliteEventCacheStore::pool` directly. No
place remains except in `read()` and `write()`.
@Hywan Hywan requested a review from a team as a code owner July 9, 2025 15:38
@Hywan Hywan requested review from poljar and removed request for a team July 9, 2025 15:38
@Hywan
Copy link
Member Author

Hywan commented Jul 9, 2025

Bonus: I'm thinking about it right now, but we may want to add logs around write() to know how much it took to acquire the write connection. It could be a very useful information to debug slowness if any. I'll do that later. Please don't merge this PR immediately.

Comment on lines +1621 to +1622
this.write()
.await?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change ensures with_immediate_transaction takes a write connection.

Copy link

codecov bot commented Jul 9, 2025

Codecov Report

Attention: Patch coverage is 78.20513% with 17 lines in your changes missing coverage. Please review.

Project coverage is 88.80%. Comparing base (72d1332) to head (fd73cb3).
Report is 37 commits behind head on main.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
crates/matrix-sdk-sqlite/src/event_cache_store.rs 75.71% 3 Missing and 14 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5382      +/-   ##
==========================================
- Coverage   88.81%   88.80%   -0.01%     
==========================================
  Files         334      334              
  Lines       91143    91153      +10     
  Branches    91143    91153      +10     
==========================================
  Hits        80948    80948              
- Misses       6369     6385      +16     
+ Partials     3826     3820       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Hywan Hywan force-pushed the feat-sdk-sqlite-event-cache-store-read-write-access branch from d3a2083 to 6c7ae3d Compare July 13, 2025 06:57
Hywan added 2 commits July 13, 2025 09:12
This changes the `TracingTimer` message to use the `Debug` impl of
`Duration` instead of displaying it as milliseconds. It can help spotting
seconds without counting all the digits.
This patch renames `TracingTiming::new_debug` to `new`. The
documentation claims it sets the log level to `debug` while the `level`
is actually an argument of the constructor. It's then wrong, and the
constructor must be renamed.
@Hywan Hywan force-pushed the feat-sdk-sqlite-event-cache-store-read-write-access branch from a62ce0d to 0ce163a Compare July 13, 2025 07:12
@Hywan Hywan force-pushed the feat-sdk-sqlite-event-cache-store-read-write-access branch from 0ce163a to f847d81 Compare July 13, 2025 07:17
Hywan added 3 commits July 13, 2025 09:18
This patch adds `timer!` logs in each method from `EventCacheStore` for
`SqliteEventCacheStore`. It will help to know the execution duration of
each of these methods.
@Hywan Hywan force-pushed the feat-sdk-sqlite-event-cache-store-read-write-access branch from f847d81 to fd73cb3 Compare July 13, 2025 07:18
@Hywan
Copy link
Member Author

Hywan commented Jul 13, 2025

Quoting myself (because why not):

Bonus: I'm thinking about it right now, but we may want to add logs around write() to know how much it took to acquire the write connection. It could be a very useful information to debug slowness if any. I'll do that later. Please don't merge this PR immediately.

Done. I've added #[instrument] + timer! logs, almost everywhere. It will help spot slowness more easily.

Copy link
Contributor

@poljar poljar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, left a question.

@Hywan Hywan merged commit d73a02c into matrix-org:main Jul 14, 2025
44 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants