Flink CDC 2.1.1: very slow synchronization of large tables with a VARCHAR primary key #801
Unanswered
a120610114 asked this question in Q&A
Replies: 2 comments 19 replies
-
@wuchong, could you help answer this?
4 replies
-
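Currently, tables with VARCHAR keys are slow mainly because the job stalls in the upfront split computation: all splits must be computed before reading of any split begins. One optimization idea is asynchronous splitting: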
15 replies
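The asynchronous splitting idea in the reply above is only named, not shown. As a rough illustration under stated assumptions, the sketch below runs a splitter thread that pushes each chunk into a bounded `BlockingQueue` as soon as its boundary is known, while a reader consumes chunks concurrently instead of waiting for the complete split list. `AsyncSplitDemo`, `produce`, `consume` and the in-memory `TreeSet` of keys are hypothetical stand-ins, not Flink CDC classes; a real splitter would issue one boundary query per chunk instead of iterating a set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch of asynchronous splitting; not Flink CDC code. */
public class AsyncSplitDemo {

    /** A chunk covering [start, end); null means unbounded on that side. */
    record Chunk(String start, String end) {}

    /** Sentinel telling the reader that splitting is finished (compared by reference). */
    static final Chunk DONE = new Chunk("<done>", "<done>");

    /** Splitter: emits each chunk as soon as its end boundary is known. */
    static void produce(TreeSet<String> keys, int chunkSize, BlockingQueue<Chunk> out)
            throws InterruptedException {
        String start = null;               // first chunk is unbounded below
        int n = 0;
        for (String k : keys) {
            if (n == chunkSize) {          // k becomes the exclusive end of this chunk
                out.put(new Chunk(start, k));
                start = k;
                n = 0;
            }
            n++;
        }
        out.put(new Chunk(start, null));   // last chunk is unbounded above
        out.put(DONE);
    }

    /** Reader: handles chunks while the splitter is still running. */
    static List<Chunk> consume(BlockingQueue<Chunk> in) throws InterruptedException {
        List<Chunk> read = new ArrayList<>();
        for (Chunk c = in.take(); c != DONE; c = in.take()) {
            read.add(c);                   // a real reader would run the chunk SELECT here
        }
        return read;
    }

    static List<Chunk> run(TreeSet<String> keys, int chunkSize) throws Exception {
        // Bounded queue: the splitter can only run a few chunks ahead of the reader.
        BlockingQueue<Chunk> queue = new LinkedBlockingQueue<>(4);
        Thread splitter = new Thread(() -> {
            try {
                produce(keys, chunkSize, queue);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        splitter.start();
        List<Chunk> read = consume(queue);
        splitter.join();
        return read;
    }

    public static void main(String[] args) throws Exception {
        TreeSet<String> keys = new TreeSet<>(List.of("a", "b", "c", "d", "e", "f", "g"));
        run(keys, 3).forEach(System.out::println);
    }
}
```

The sketch does not handle a failing splitter (the reader would block forever waiting for the sentinel); a real implementation would presumably also need to propagate splitter errors to the reader.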
-
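While using Flink CDC 2.1.1 for data synchronization, I found that syncing large tables (about 30 million rows) whose primary key is of VARCHAR type is extremely slow and frequently fails with errors. Checking the logs, I found entries like these: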



fb72562c-845f-11ea-88a6-b8599fe5d1ea:1-244504311, row=0, event=0} for split MySqlSnapshotSplit{tableId=wxqyh_learnonline.tb_qy_examination_exam_user_ref, splitId='wxqyh_learnonline.tb_qy_examination_exam_user_ref:2031', splitKeyType=[`id` VARCHAR(32) NOT NULL], splitStart=[b6547e5906a64ae6a5b062492f1b4b91], splitEnd=[b66b1de7513043379a327af9086c8f80], highWatermark=null}
2022-01-07 00:28:28,429 INFO com.ververica.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Snapshot step 2 - Snapshotting data
2022-01-07 00:28:28,429 INFO com.ververica.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Exporting data from split 'wxqyh_learnonline.tb_qy_examination_exam_user_ref:2031' of table wxqyh_learnonline.tb_qy_examination_exam_user_ref
2022-01-07 00:28:28,429 INFO com.ververica.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - For split 'wxqyh_learnonline.tb_qy_examination_exam_user_ref:2031' of table wxqyh_learnonline.tb_qy_examination_exam_user_ref using select statement: 'SELECT * FROM `wxqyh_learnonline`.`tb_qy_examination_exam_user_ref` WHERE `id` >= ? AND NOT (`id` = ?) AND `id` <= ?'
2022-01-07 00:28:31,217 INFO com.ververica.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Finished exporting 8093 records for split 'wxqyh_learnonline.tb_qy_examination_exam_user_ref:2031', total duration '00:00:02.788'
2022-01-07 00:28:31,222 INFO com.ververica.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Snapshot step 3 - Determining high watermark {ts_sec=0, file=mysql-bin.001188, pos=53630482, gtids=1935d4a6-2e71-11e9-9330-6c92bf5f0aed:450812989-484900678, 4ddb2c3f-6f05-11e9-8a9c-6c0b84d5a828:1-411556268,
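This puzzled me, and the source code does the same thing.
It splits directly on the VARCHAR column and then queries each split; when I ran this statement against the database myself, it was very slow.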
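For reference, the per-split statement in the log is a half-open range scan: for a single key column, `id >= ? AND NOT (id = ?) AND id <= ?` (bound to splitStart, splitEnd, splitEnd) admits exactly the keys with `splitStart <= id < splitEnd`. The sketch below rebuilds that statement and evaluates the predicate in memory. `ChunkQueryDemo`, `buildChunkQuery` and `admitted` are hypothetical names, not Flink CDC APIs, and Java's `String.compareTo` is only assumed to agree with the MySQL column collation for lowercase hex keys like the ones in the log.

```java
/** Hypothetical helpers illustrating the logged per-split query; not Flink CDC code. */
public class ChunkQueryDemo {

    /** Rebuilds the shape of the logged SELECT for a middle split. */
    static String buildChunkQuery(String db, String table, String keyCol) {
        String k = "`" + keyCol + "`";
        return "SELECT * FROM `" + db + "`.`" + table + "` WHERE "
                + k + " >= ? AND NOT (" + k + " = ?) AND " + k + " <= ?";
    }

    /**
     * Evaluates "key >= start AND NOT (key = end) AND key <= end" for one key.
     * Uses Java's UTF-16 string order, assumed to match the DB collation
     * only for simple lowercase hex keys.
     */
    static boolean admitted(String key, String start, String end) {
        return key.compareTo(start) >= 0
                && !key.equals(end)
                && key.compareTo(end) <= 0;
    }

    public static void main(String[] args) {
        String start = "b6547e5906a64ae6a5b062492f1b4b91"; // splitStart from the log
        String end = "b66b1de7513043379a327af9086c8f80";   // splitEnd from the log
        System.out.println(buildChunkQuery("wxqyh_learnonline",
                "tb_qy_examination_exam_user_ref", "id"));
        System.out.println(admitted(start, start, end)); // start is inclusive
        System.out.println(admitted(end, start, end));   // end is exclusive
    }
}
```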
Is this behavior expected? What is the design rationale behind it? Is there any way to speed up syncing large tables with VARCHAR keys?
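On the design question: as far as I understand Flink CDC 2.x (not re-verified against the 2.1.1 source here), split boundaries for a non-numeric key cannot be computed arithmetically, so each boundary is found with a query roughly like `SELECT MAX(id) FROM (SELECT id FROM t WHERE id >= ? ORDER BY id LIMIT chunkSize) T`, starting from the previous boundary. The hypothetical in-memory simulation below (`SequentialSplitDemo`, with a `TreeSet` standing in for the table) shows why this is strictly sequential: one round trip per split, and each depends on the result of the one before.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical simulation of sequential boundary discovery; not Flink CDC code. */
public class SequentialSplitDemo {

    /** A split covering [start, end); null means unbounded on that side. */
    record Chunk(String start, String end) {}

    static int roundTrips = 0; // counts simulated boundary queries

    /**
     * Stand-in for one database round trip, roughly:
     * SELECT MAX(id) FROM (SELECT id FROM t WHERE id >= ? ORDER BY id LIMIT n) T
     */
    static String nextChunkMax(TreeSet<String> keys, String from, int chunkSize) {
        roundTrips++;
        String last = null;
        int n = 0;
        for (String k : keys.tailSet(from, true)) {
            last = k;
            if (++n == chunkSize) break;
        }
        return last;
    }

    /** Assumes a non-empty set of unique keys and chunkSize >= 2. */
    static List<Chunk> split(TreeSet<String> keys, int chunkSize) {
        List<Chunk> chunks = new ArrayList<>();
        String max = keys.last();
        String start = null;                                // first split: unbounded below
        String end = nextChunkMax(keys, keys.first(), chunkSize);
        while (end.compareTo(max) < 0) {                    // stop once the max key is reached
            chunks.add(new Chunk(start, end));
            start = end;                                    // next query needs this result
            end = nextChunkMax(keys, start, chunkSize);
        }
        chunks.add(new Chunk(start, null));                 // last split: unbounded above
        return chunks;
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(List.of("a", "b", "c", "d", "e", "f", "g"));
        split(keys, 3).forEach(System.out::println);
        System.out.println("boundary queries: " + roundTrips);
    }
}
```

Because each step advances by about `chunkSize - 1` keys, a table of about 30 million rows with a chunk size of 8096 (assumed here to be the default of `scan.incremental.snapshot.chunk.size`) needs roughly 3,700 sequential boundary queries, which fits the reply's point that the upfront splitting dominates.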