The Airbyte MySQL CDC connector makes assumptions about binlog file ordering that do not hold true in MySQL, especially in clustered or replicated environments. #61016
verbotenj
started this conversation in
Connector Ideas and Features
In MySQL Connector
Binlog files are rotated under sequentially numbered names: mysql-bin.000001, mysql-bin.000002, etc.
The file names are not globally unique and not strictly comparable across nodes.
In replication or clustered setups, the same binlog file names can appear on different nodes with different contents.
MySQL does not embed a global timeline in binlogs unless you use GTID.
Airbyte (via Debezium) tracks CDC state like this:
{
"file": "mysql-bin.000166",
"position": 138596491
}
The connector compares binlog filenames and positions directly, assuming that:
000166 > 000089 always means “further ahead in time”
Binlogs are sequential and increasing
This logic breaks in distributed MySQL deployments, or after failover, rotation, or purging.
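To make the failure mode concrete, here is a minimal Python sketch of a naive (file, position) comparison. It is not the connector's or Debezium's actual code: `BinlogOffset`, `binlog_sequence`, `naive_is_ahead`, and the post-failover offset are hypothetical names and values chosen for illustration.

```python
import re
from typing import NamedTuple


class BinlogOffset(NamedTuple):
    """Hypothetical container for the saved CDC state shown above."""
    file: str
    position: int


def binlog_sequence(file_name: str) -> int:
    """Extract the numeric suffix, e.g. 'mysql-bin.000166' -> 166."""
    match = re.search(r"\.(\d+)$", file_name)
    if match is None:
        raise ValueError(f"unexpected binlog file name: {file_name!r}")
    return int(match.group(1))


def naive_is_ahead(a: BinlogOffset, b: BinlogOffset) -> bool:
    """Return True if `a` looks later than `b` by file number, then position."""
    return (binlog_sequence(a.file), a.position) > (binlog_sequence(b.file), b.position)


# Saved state, recorded against the node the connector was originally reading.
saved = BinlogOffset("mysql-bin.000166", 138596491)

# Hypothetical current offset on a different node after failover. Its binlog
# numbering is independent of the first node's, so the two are not comparable.
after_failover = BinlogOffset("mysql-bin.000089", 4096)

# The naive comparison still produces an answer, and it is meaningless here.
print(naive_is_ahead(saved, after_failover))
```

The comparison reports the saved state as "ahead" of the new node's log even though the two offsets belong to unrelated binlog sequences, which is the situation in which records can be skipped.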
Consequences
Data loss: If Airbyte believes it has already read past a binlog position that it has not actually reached, it skips records.
Duplicates: If it resumes from the wrong position after a restart, rows may be replayed.
Broken syncs: State files point to binlogs that no longer exist or are misordered.
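One way to detect the "binlog no longer exists" case before resuming is to check the saved file name against the files the server currently reports. The sketch below assumes the caller has already collected the `Log_name` values returned by MySQL's `SHOW BINARY LOGS` statement; the function name `saved_file_is_available` is hypothetical and this is not something the connector is known to do in this form.

```python
from typing import Iterable


def saved_file_is_available(saved_file: str, available_files: Iterable[str]) -> bool:
    """Return True if the saved binlog file is still present on the server.

    `available_files` is assumed to hold the Log_name column from
    SHOW BINARY LOGS, run against the node the connector will read from.
    """
    return saved_file in set(available_files)


# Example: the saved file has been purged, so resuming from it is unsafe.
on_server = ["mysql-bin.000170", "mysql-bin.000171"]
print(saved_file_is_available("mysql-bin.000166", on_server))
```

Note that a name match alone does not prove the file has the same contents on a different node, since the same file name can appear on several nodes; the check only catches purged or missing files.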