Just wanted to chime in and say this would be super helpful for my personal use case with PowerSync in two ways:
Background
Currently, when changes to sync rules or upcoming sync streams are deployed, PowerSync re-replicates all data from the source database from scratch, processing it with the new sync rules. Once that is ready, clients are switched over to sync from the new copy.
While there is no direct "downtime", it can take a long time on large databases, and clients have to re-sync all data even if only a small portion changed.
Status
2025-09-01: Basic approach is documented. We need to investigate the feasibility of the two different approaches in more detail.
Proposal
The basic idea is to only reprocess bucket or sync stream definitions that have actually changed. This operates at the definition level: any change to any single query in a bucket definition causes the entire bucket definition to be re-processed, and all related buckets to be re-synced.
At the core of this, we need to implement a "diff" between two versions of sync rules / sync stream definitions, telling us which definitions were added, which were removed, and which are unchanged.
Each modified definition is treated as a new definition + a removed definition - we do not perform any granular updating inside definitions. Generally, this means that if a query changed, that entire definition will be re-processed and re-synced as new buckets - existing buckets are never updated.
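As an illustration, such a definition-level diff could look roughly like the sketch below. The `SyncDefinition` shape, `definitionKey` serialization and `diffSyncRules` function are hypothetical placeholders rather than actual PowerSync service code; the point is only that comparison happens per definition, never per query.

```typescript
// Hypothetical shape of a bucket/stream definition: a name plus its queries.
interface SyncDefinition {
  name: string;
  parameterQueries: string[];
  dataQueries: string[];
}

interface SyncRulesDiff {
  added: SyncDefinition[];     // new or modified definitions: replicated as new buckets
  removed: SyncDefinition[];   // removed or modified definitions: old buckets dropped
  unchanged: SyncDefinition[]; // keep existing buckets and the old version id
}

// Canonical serialization used for comparison; any change to any query
// marks the whole definition as changed.
function definitionKey(def: SyncDefinition): string {
  return JSON.stringify([def.parameterQueries, def.dataQueries]);
}

function diffSyncRules(previous: SyncDefinition[], next: SyncDefinition[]): SyncRulesDiff {
  const previousByName = new Map(previous.map((d) => [d.name, d] as const));
  const nextNames = new Set(next.map((d) => d.name));

  const diff: SyncRulesDiff = { added: [], removed: [], unchanged: [] };

  for (const def of next) {
    const old = previousByName.get(def.name);
    if (old == null) {
      diff.added.push(def);
    } else if (definitionKey(old) === definitionKey(def)) {
      diff.unchanged.push(def);
    } else {
      // Modified: treated as a removed old definition plus a new definition.
      diff.removed.push(old);
      diff.added.push(def);
    }
  }

  // Definitions that no longer exist in the new version.
  for (const def of previous) {
    if (!nextNames.has(def.name)) {
      diff.removed.push(def);
    }
  }

  return diff;
}
```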
This depends on versioned_bucket_ids as described here. Unchanged definitions will keep the old version id, while new definitions will get the new id.
Implementation Option 1
Here, we keep the current sync rules version active while processing new bucket/stream definitions concurrently. Essentially we have two replication streams running side by side: the existing one continues maintaining the unchanged definitions, while a second one replicates the new and changed definitions from scratch.
Once the replication stream for the new definitions has caught up, clients are switched over to the new buckets.
Here, the trickiest part is getting the two replication streams "in sync" before replacing them with a single one again.
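A rough sketch of what the switch-over could look like is shown below. The `ReplicationStream` interface and the `startReplicationStream`, `sourceHeadLsn` and `activateBuckets` helpers are hypothetical, not the actual replication implementation:

```typescript
// Hypothetical interfaces/helpers for illustration only - not the actual
// PowerSync replication implementation.
interface ReplicationStream {
  definitions: string[];                        // definition names this stream maintains
  hasCaughtUpTo(lsn: string): Promise<boolean>; // has this stream processed up to the given position?
  stop(): Promise<void>;
}

declare function startReplicationStream(definitions: string[]): Promise<ReplicationStream>;
declare function sourceHeadLsn(): Promise<string>; // current head position of the source database
declare function activateBuckets(definitions: string[], versionId: number): Promise<void>;

async function deployIncrementally(
  activeStream: ReplicationStream, // stream maintaining the unchanged definitions
  changedDefinitions: string[],    // new + modified definitions from the diff
  newVersionId: number
): Promise<ReplicationStream> {
  // 1. Keep the current stream running for unchanged definitions, and start a
  //    second stream that re-replicates only the changed definitions from scratch.
  const newStream = await startReplicationStream(changedDefinitions);

  // 2. Wait until the new stream has caught up with the source head.
  const targetLsn = await sourceHeadLsn();
  while (!(await newStream.hasCaughtUpTo(targetLsn))) {
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }

  // 3. Activate the new bucket versions so clients switch over, then replace
  //    the two streams with a single one covering all definitions again.
  await activateBuckets(changedDefinitions, newVersionId);
  await Promise.all([activeStream.stop(), newStream.stop()]);
  return startReplicationStream([...activeStream.definitions, ...changedDefinitions]);
}
```

This glosses over the hard part mentioned above: both streams have to end at exactly the same source position, and the merged stream has to resume from that position without losing or duplicating operations.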
Implementation Option 2
This approach uses a single replication stream and re-replicates existing data within it.
What makes this implementation particularly tricky is avoiding updates to existing bucket data when it is unchanged: if we do trigger updates for that data, clients can end up re-syncing it twice - once under the old definitions, and again under the new definitions.
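One way to avoid those duplicate updates is to compare incoming row data against what is already stored before emitting a new operation. The sketch below assumes a hypothetical storage API (`storedRowHash`, `appendPutOperation`) that keeps a content hash per stored row; the real storage layer may track this differently, e.g. via its existing checksums.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical storage helpers: look up the content hash currently stored for a
// row in a bucket, and append a new PUT operation to that bucket.
declare function storedRowHash(bucket: string, rowId: string): Promise<string | null>;
declare function appendPutOperation(bucket: string, rowId: string, data: unknown): Promise<void>;

// Simplified content hash; a real implementation would need a canonical
// serialization rather than plain JSON.stringify.
function contentHash(data: unknown): string {
  return createHash('sha256').update(JSON.stringify(data)).digest('hex');
}

// Called while re-replicating a row during incremental reprocessing. For
// buckets belonging to unchanged definitions, skip writing an operation when
// the row content is identical, so clients do not re-download unchanged data.
async function writeRowIfChanged(bucket: string, rowId: string, data: unknown): Promise<void> {
  const existingHash = await storedRowHash(bucket, rowId);
  if (existingHash === contentHash(data)) {
    // Unchanged: creating a new operation here would make clients re-sync this
    // row twice - once under the old definitions and again under the new ones.
    return;
  }
  await appendPutOperation(bucket, rowId, data);
}
```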
Other considerations
Defragmenting
Currently, the full reprocessing of data doubles as a method of "defragmenting", as described here. If we implement incremental reprocessing, we need alternative methods for defragmenting.
Config changes
Changes to replication config affect all bucket & stream definitions, so they still require re-replicating all data. For the most part, it is very difficult to predict the effects of config changes at a more granular level.
However, if we avoid creating new operations for unchanged bucket data, clients can avoid re-syncing data that is unaffected by the config change.