Using Flink CDC to Capture MySQL Data Changes and Publish Processed Results to a Kinesis Data Stream

Background

In various scenarios, customers need to capture changes from relational databases, perform data transformations, and stream the resulting data downstream for further use. Some common scenarios include:

  • Database Migration --- a migration to a new database that involves schema changes can adopt this solution; both databases can run concurrently to avoid downtime during the switch.
  • Data Lake Ingestion --- a real-time processing pipeline that aggregates data and loads it into the data lake.
  • Database Synchronization --- use this solution to keep data in different databases in sync.

Architecture

  • Flink CDC can capture changes from databases including MySQL, PostgreSQL, and SQL Server.
  • Flink running on Kinesis Data Analytics performs join, map, filter, and similar operations to transform the data into the desired form.
  • The resulting data can be sunk to destinations such as Kinesis Data Streams for further processing (a job skeleton follows the diagram below).

![Architecture Diagram](img/Architecture Diagram.png)
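The sketch below shows how these pieces typically fit together in code: a Flink CDC MySqlSource emitting change events as JSON, and a Kinesis Data Streams sink. It is a minimal outline, not the repository's exact code; the hostname, credentials, database/table names, AWS region, and stream name are all placeholder assumptions, and the project may use a different sink connector (for example the older FlinkKinesisProducer).

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kinesis.sink.KinesisStreamsSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;

import java.util.Properties;

public class CdcToKinesisJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // CDC sources commit binlog offsets on checkpoints, so checkpointing is required.
        env.enableCheckpointing(5000);

        // MySQL CDC source; host, credentials, and table names below are placeholders.
        MySqlSource<String> source = MySqlSource.<String>builder()
                .hostname("your-mysql-host")
                .port(3306)
                .databaseList("shop")
                .tableList("shop.Order", "shop.Customer")
                .username("flink_user")
                .password("flink_pw")
                .deserializer(new JsonDebeziumDeserializationSchema()) // change events as JSON strings
                .build();

        DataStream<String> changes = env.fromSource(
                source, WatermarkStrategy.noWatermarks(), "MySQL CDC Source");

        // Kinesis Data Streams sink; region and stream name are placeholders.
        Properties sinkProps = new Properties();
        sinkProps.setProperty("aws.region", "us-east-1");

        KinesisStreamsSink<String> sink = KinesisStreamsSink.<String>builder()
                .setKinesisClientProperties(sinkProps)
                .setStreamName("cdc-output-stream")
                .setSerializationSchema(new SimpleStringSchema())
                .setPartitionKeyGenerator(record -> String.valueOf(record.hashCode()))
                .build();

        // Transformations (join, map, filter, ...) would sit between source and sink.
        changes.sinkTo(sink);

        env.execute("flink-cdc-mysql");
    }
}
```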

Prerequisites

  • A MySQL instance reachable from the Flink application, with binary logging (binlog) enabled, which Flink CDC requires
  • A Kinesis Data Stream to receive the output
  • Java and Maven installed to build the project
  • An AWS account with permission to create a Kinesis Data Analytics application and upload to S3

Setup

  1. Fill in the MySQL and Kinesis configuration in the code with your actual values (hostnames, credentials, stream name).
  2. Run mvn package to build the Maven project.
  3. Upload the flink-cdc-mysql-1.0-SNAPSHOT.jar file under the target/ folder to S3.
  4. Start a Kinesis Data Analytics application pointing at the S3 path that contains the jar file.
  5. After inserting an order record into the Order table with a valid customerId, the Flink application joins the order record with the matching customer record, and the enriched result appears in the Kinesis Data Stream (a join sketch follows this list).
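Step 5 implies a streaming enrichment join between the Order and Customer change streams. Below is a minimal sketch of one common way to express such a join in Flink, assuming the CDC JSON has already been parsed into Order and Customer POJOs; the class and field names here are hypothetical, not taken from the repository.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class EnrichmentSketch {

    // Minimal POJOs standing in for rows parsed out of the CDC JSON events.
    public static class Order { public long orderId; public long customerId; }
    public static class Customer { public long customerId; public String name; }

    // Keyed by customerId: remembers the latest customer row and enriches orders.
    public static class JoinOrderWithCustomer
            extends KeyedCoProcessFunction<Long, Order, Customer, String> {

        private transient ValueState<Customer> customerState;

        @Override
        public void open(Configuration parameters) {
            customerState = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("customer", Customer.class));
        }

        @Override
        public void processElement1(Order order, Context ctx, Collector<String> out)
                throws Exception {
            Customer customer = customerState.value();
            if (customer != null) {
                out.collect("orderId=" + order.orderId + ", customer=" + customer.name);
            }
            // Orders that arrive before their customer record could be buffered in state.
        }

        @Override
        public void processElement2(Customer customer, Context ctx, Collector<String> out)
                throws Exception {
            customerState.update(customer); // keep the latest customer row per key
        }
    }

    // Wiring: both streams are keyed by customerId before the join.
    public static DataStream<String> enrich(DataStream<Order> orders,
                                            DataStream<Customer> customers) {
        return orders
                .connect(customers)
                .keyBy(o -> o.customerId, c -> c.customerId)
                .process(new JoinOrderWithCustomer());
    }
}
```

Keeping the latest customer row in keyed state lets every incoming order be enriched without re-querying the database; handling orders that arrive before their customer record (noted in a comment above) would require buffering them in state as well.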
