Just wanted to chime in and say this would be super helpful for my personal use case with PowerSync in two ways:
Background
Currently, when changes to sync rules or upcoming sync streams are deployed, PowerSync re-replicates all data from the source database from scratch, processing it with the new sync rules. Once that is ready, clients are switched over to sync from the new copy.
While there is no direct "downtime", it can take a long time on large databases, and clients have to re-sync all data even if only a small portion changed.
Status
2025-09-01: Basic approach is documented. We need to investigate the feasibility of the two different approaches in more detail.
Proposal
The basic idea is to only reprocess bucket or sync stream definitions that have actually changed. This operates at the definition level: any change to any single query in a bucket definition causes the entire bucket definition to be re-processed, and all related buckets to be re-synced.
At the core of this, we need to implement a "diff" between two versions of sync rules / sync stream definitions, telling us which definitions were added, which were removed, and which are unchanged.
Each modified definition is treated as a new definition + a removed definition - we do not perform any granular updating inside definitions. Generally, this means that if a query changed, that entire definition will be re-processed and re-synced as new buckets - existing buckets are never updated.
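As an illustration, such a definition-level diff could look roughly like the sketch below. The `SyncDefinition` shape, `definitionKey` serialization and `diffSyncRules` function are hypothetical placeholders rather than actual PowerSync service code; the point is only that comparison happens per definition, never per query.

```typescript
// Hypothetical shape of a bucket/stream definition: a name plus its queries.
interface SyncDefinition {
  name: string;
  parameterQueries: string[];
  dataQueries: string[];
}

interface SyncRulesDiff {
  added: SyncDefinition[];     // new or modified definitions: replicated as new buckets
  removed: SyncDefinition[];   // removed or modified definitions: old buckets dropped
  unchanged: SyncDefinition[]; // keep existing buckets and the old version id
}

// Canonical serialization used for comparison; any change to any query
// marks the whole definition as changed.
function definitionKey(def: SyncDefinition): string {
  return JSON.stringify([def.parameterQueries, def.dataQueries]);
}

function diffSyncRules(previous: SyncDefinition[], next: SyncDefinition[]): SyncRulesDiff {
  const previousByName = new Map(previous.map((d) => [d.name, d] as const));
  const nextNames = new Set(next.map((d) => d.name));

  const diff: SyncRulesDiff = { added: [], removed: [], unchanged: [] };

  for (const def of next) {
    const old = previousByName.get(def.name);
    if (old == null) {
      diff.added.push(def);
    } else if (definitionKey(old) === definitionKey(def)) {
      diff.unchanged.push(def);
    } else {
      // Modified: treated as a removed old definition plus a new definition.
      diff.removed.push(old);
      diff.added.push(def);
    }
  }

  // Definitions that no longer exist in the new version.
  for (const def of previous) {
    if (!nextNames.has(def.name)) {
      diff.removed.push(def);
    }
  }

  return diff;
}
```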
This depends on versioned_bucket_ids as described here. Unchanged definitions will keep the old version id, while new definitions will get the new id.
Implementation Option 1
Here, we keep the current sync rules version active while processing new bucket/stream definitions concurrently. Essentially we have two replication streams running side by side: the existing one continues maintaining the unchanged definitions, while a second one replicates the new and changed definitions from scratch.
Once the replication stream for the new definitions has caught up, clients are switched over to the new buckets.
Here, the trickiest part is getting the two replication streams "in sync" before replacing them with a single one again.
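A rough sketch of what the switch-over could look like is shown below. The `ReplicationStream` interface and the `startReplicationStream`, `sourceHeadLsn` and `activateBuckets` helpers are hypothetical, not the actual replication implementation:

```typescript
// Hypothetical interfaces/helpers for illustration only - not the actual
// PowerSync replication implementation.
interface ReplicationStream {
  definitions: string[];                        // definition names this stream maintains
  hasCaughtUpTo(lsn: string): Promise<boolean>; // has this stream processed up to the given position?
  stop(): Promise<void>;
}

declare function startReplicationStream(definitions: string[]): Promise<ReplicationStream>;
declare function sourceHeadLsn(): Promise<string>; // current head position of the source database
declare function activateBuckets(definitions: string[], versionId: number): Promise<void>;

async function deployIncrementally(
  activeStream: ReplicationStream, // stream maintaining the unchanged definitions
  changedDefinitions: string[],    // new + modified definitions from the diff
  newVersionId: number
): Promise<ReplicationStream> {
  // 1. Keep the current stream running for unchanged definitions, and start a
  //    second stream that re-replicates only the changed definitions from scratch.
  const newStream = await startReplicationStream(changedDefinitions);

  // 2. Wait until the new stream has caught up with the source head.
  const targetLsn = await sourceHeadLsn();
  while (!(await newStream.hasCaughtUpTo(targetLsn))) {
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }

  // 3. Activate the new bucket versions so clients switch over, then replace
  //    the two streams with a single one covering all definitions again.
  await activateBuckets(changedDefinitions, newVersionId);
  await Promise.all([activeStream.stop(), newStream.stop()]);
  return startReplicationStream([...activeStream.definitions, ...changedDefinitions]);
}
```

This glosses over the hard part mentioned above: both streams have to end at exactly the same source position, and the merged stream has to resume from that position without losing or duplicating operations.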
Implementation Option 2
This approach uses a single replication stream and re-replicates existing data within it.
What makes this implementation particularly tricky is avoiding updates to existing bucket data when it is unchanged: if we do trigger updates for that data, clients can end up re-syncing it twice - once under the old definitions, and again under the new definitions.
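One way to avoid those duplicate updates is to compare incoming row data against what is already stored before emitting a new operation. The sketch below assumes a hypothetical storage API (`storedRowHash`, `appendPutOperation`) that keeps a content hash per stored row; the real storage layer may track this differently, e.g. via its existing checksums.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical storage helpers: look up the content hash currently stored for a
// row in a bucket, and append a new PUT operation to that bucket.
declare function storedRowHash(bucket: string, rowId: string): Promise<string | null>;
declare function appendPutOperation(bucket: string, rowId: string, data: unknown): Promise<void>;

// Simplified content hash; a real implementation would need a canonical
// serialization rather than plain JSON.stringify.
function contentHash(data: unknown): string {
  return createHash('sha256').update(JSON.stringify(data)).digest('hex');
}

// Called while re-replicating a row during incremental reprocessing. For
// buckets belonging to unchanged definitions, skip writing an operation when
// the row content is identical, so clients do not re-download unchanged data.
async function writeRowIfChanged(bucket: string, rowId: string, data: unknown): Promise<void> {
  const existingHash = await storedRowHash(bucket, rowId);
  if (existingHash === contentHash(data)) {
    // Unchanged: creating a new operation here would make clients re-sync this
    // row twice - once under the old definitions and again under the new ones.
    return;
  }
  await appendPutOperation(bucket, rowId, data);
}
```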
Other considerations
Defragmenting
Currently, the full reprocessing of data doubles as a method of "defragmenting", as described here. If we implement incremental reprocessing, we need alternative methods for defragmenting.
Config changes
Changes to replication config affect all bucket & stream definitions, so they still require re-replicating all data. For the most part, it is very difficult to predict the effects of config changes at a more granular level.
However, if we avoid creating new operations for unchanged bucket data, clients can avoid re-syncing data that is unaffected by the config change.