SQL server: consistent snapshot and lsn selection #32979

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open · wants to merge 8 commits into main
Conversation

@martykulma (Contributor) commented on Jul 10, 2025

The current mechanism for snapshot and LSN selection is not correct due to the asynchronous nature of MS SQL CDC and the implementation details of SNAPSHOT isolation.

This implementation is similar to the MySQL implementation in that it locks the tables for a brief period to establish a point in time at which a consistent view of the DB can be observed. Once the LSN and transaction sequence number (XSN) are captured, the table locks are dropped, leaving us with a consistent version of the table(s) to read from as well as a correct LSN from which to process CDC events.
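The ordering described above is the crux of the fix: the LSN and the snapshot transaction's XSN must both be established while writes are quiesced. A minimal sketch of that sequencing, using a mock client so it can run standalone (every type and method name here is a hypothetical stand-in for illustration, not the connector's actual API):

```rust
// Mock stand-in for the upstream SQL Server client. All names are
// hypothetical; the point is only the order of operations.
struct MockDb {
    log: Vec<String>,
}

impl MockDb {
    fn lock_table_exclusive(&mut self, table: &str) {
        self.log.push(format!("lock {table}"));
    }
    fn capture_max_lsn(&mut self) -> u64 {
        self.log.push("capture lsn".to_string());
        42 // pretend LSN
    }
    fn begin_snapshot_txn(&mut self) {
        self.log.push("begin snapshot".to_string());
    }
    fn release_locks(&mut self) {
        self.log.push("release locks".to_string());
    }
}

/// Fence: lock every table, then capture the LSN and start the snapshot
/// transaction while writes are quiesced, and only then drop the locks.
/// The snapshot keeps seeing the locked-in version of the tables, and
/// CDC replay starts from the captured LSN.
fn fence(db: &mut MockDb, tables: &[&str]) -> u64 {
    for table in tables {
        db.lock_table_exclusive(table);
    }
    let lsn = db.capture_max_lsn();
    db.begin_snapshot_txn();
    db.release_locks();
    lsn
}
```

If the locks were released before the LSN/XSN capture, a concurrent write could land between the snapshot point and the CDC resume point, which is exactly the race this PR closes.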

Motivation

Fixes https://github.com/MaterializeInc/database-issues/issues/9476

Tips for reviewer

You can look at the commits to see the history, but the actual fix is not until the end.

I didn't squash my commits because the bug this fixes was filed in response to this PR (at the time, a draft containing only a test change to demonstrate the race condition). It will get squashed on merge!

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ptravers ptravers self-requested a review July 10, 2025 21:59
@martykulma martykulma force-pushed the sql-server-resume-lsn branch 3 times, most recently from 8ca2ca9 to ba8b692 Compare July 18, 2025 23:46
@martykulma martykulma changed the title delete column, drop source SQL server: consistent snapshot and lsn selection Jul 22, 2025
@martykulma martykulma force-pushed the sql-server-resume-lsn branch 3 times, most recently from a4eb8f0 to e502e99 Compare July 23, 2025 02:32
@martykulma martykulma force-pushed the sql-server-resume-lsn branch from e502e99 to 831bb01 Compare July 23, 2025 11:47
@martykulma martykulma marked this pull request as ready for review July 23, 2025 15:04
@martykulma martykulma requested a review from a team as a code owner July 23, 2025 15:04
@@ -140,13 +181,57 @@ impl<'a> CdcStream<'a> {
// the upstream DB is ready for CDC.
self.wait_for_ready().await?;

tracing::info!("Upstream is ready");
Contributor: nit: might want to add some additional context here, and also, are we sure we want this at info level?

Contributor Author: added a comment

Contributor: ah I see! would we ever see these concurrently on the same node? would it be clear which database is ready?

Contributor Author: 🤦 - just realized i didn't include ?tables

tracing::trace!(%schema, %table, "locking table");
fence_txn.lock_table_exclusive(&*schema, &*table).await?;
}
tracing::info!(?tables, "Locked tables");
Contributor: nit: same comment as above.

Contributor Author: ditto!

@martykulma martykulma requested a review from ptravers July 23, 2025 16:10
let mut fencing_client = self.client.new_connection().await?;
let mut fence_txn = fencing_client.transaction().await?;

// TODO (maz): we should consider a timeout or a lock + snapshot per-table instead of collectively
Contributor @ptravers (Jul 23, 2025): what telemetry do we need to be able to make a decision about the necessity of locking per table? I would guess we need to know whether the table would have been locked for less time that way?

Contributor @ptravers (Jul 23, 2025): follow-up: maybe we can batch with a configurable number of tables per batch? we might want to track, with a histogram/metric, roughly how long we are locking the tables.

It seems like this only becomes a problem if there's a very large number of tables or a very slow lock acquisition.
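The batching idea floated above could be sketched as follows. This is purely illustrative (the function name, the batch-size knob, and the per-batch duration bookkeeping are all hypothetical, not anything in this PR); it shows how chunking the table list bounds how many tables each exclusive-lock window covers, and how a per-batch lock-hold duration could feed the suggested histogram:

```rust
use std::time::{Duration, Instant};

// Illustrative only: fence tables in batches of `batch_size`, recording
// how long each batch's "lock window" lasted. In a real implementation
// the body of the loop would lock the batch, capture the LSN/XSN, and
// release the locks; here we just record the batch contents and timing.
fn fence_in_batches(tables: &[&str], batch_size: usize) -> Vec<(Vec<String>, Duration)> {
    let mut out = Vec::new();
    for batch in tables.chunks(batch_size.max(1)) {
        let start = Instant::now();
        // ... lock each table in `batch`, capture LSN/XSN, release ...
        let names: Vec<String> = batch.iter().map(|t| t.to_string()).collect();
        // `start.elapsed()` is the value one would feed into a
        // lock-hold-duration histogram.
        out.push((names, start.elapsed()));
    }
    out
}
```

The trade-off, as noted in the thread: smaller batches shorten each lock window but mean the per-batch snapshots are taken at different points in time, so each batch would need its own LSN/XSN pair.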
