cdc-debezium-engine-release

What is this?

A lightweight Java application that streams CDC (change data capture) events in real time from various databases to a target platform.

The implementation is based on the Debezium Engine.

Currently supported target platforms:

  • Databricks
  • Apache Spark

Note: The pipelines that merge CDC events into target tables are developed separately.
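
For context, a Debezium Engine based application typically embeds the engine as in the minimal sketch below. This is an illustrative sketch, not this project's code: the class name, property values, and the println consumer are placeholders (the real application wires in the Databricks/Spark targets and the DBFS-backed offset and schema-history stores shown later in this README).

import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EngineSketch {
    public static void main(String[] args) throws Exception {
        // Connector properties; the real application reads these from config.ini.
        Properties props = new Properties();
        props.setProperty("name", "mssql");
        props.setProperty("connector.class", "io.debezium.connector.sqlserver.SqlServerConnector");
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");
        props.setProperty("topic.prefix", "example");
        // ...plus database.hostname, database.port, database.user, database.password, etc.

        // Build the embedded engine and hand every change event to a consumer callback.
        DebeziumEngine<ChangeEvent<String, String>> engine =
                DebeziumEngine.create(Json.class)
                        .using(props)
                        .notifying(record -> {
                            // A real sink would push the event to the target platform here.
                            System.out.println(record.value());
                        })
                        .build();

        // The engine is a Runnable; run it on its own thread until it is closed.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(engine);

        Thread.sleep(60_000); // let the demo stream for a minute
        engine.close();       // stop streaming
        executor.shutdown();
    }
}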

Installation

Databricks

Authenticate to the workspace where the library needs to be deployed.

PROFILE=dev
databricks auth login --host <workspace_url> -p ${PROFILE}

Create a deployment volume in the root catalog for the environment. This must be done by a user who has all privileges on the catalog.

databricks volumes create <CATALOG_NAME> default libraries MANAGED

Download the JAR and run the following command to deploy it to the Databricks workspace:

TAG=2.1.4-1.2.2
CATALOG=dev
PROFILE=dev
databricks fs cp justmodeling-pitcrew-cdc-${TAG}-jar-with-dependencies.jar dbfs:/Volumes/${CATALOG}/default/libraries/justmodeling-pitcrew-cdc-0.1.0-jar-with-dependencies.jar --overwrite -p ${PROFILE}

Usage

Create a config.ini file as in the example below.

[DEBEZIUM-CONFIG]
schema.history.internal=com.justmodeling.pitcrewcdc.databricks.DbfsSchemaHistory
offset.storage=com.justmodeling.pitcrewcdc.databricks.DbfsOffsetBackingStore
connector.class=io.debezium.connector.sqlserver.SqlServerConnector
schema.history.internal.file.filename=/dbfs/cdc-checkpoint/XXX/XXX/schema.dat
offset.storage.file.filename=/dbfs/cdc-checkpoint/XXX/XXX/offsets.dat
database.hostname=SECRET|SCOPE|HOSTNAME
database.port=1433
database.user=SECRET|SCOPE|USERNAME
database.password=SECRET|SCOPE|PASSWORD
topic.prefix=<TOPIC-NAME>
snapshot.mode=initial
snapshot.isolation.mode=snapshot
tombstones.on.delete=false
name=mssql

[DATABRICKS-CONFIG]
cdc.persist.catalog=ENV|CATALOG|CATALOG-NAME
cdc.persist.database=<DB-NAME>
cdc.persist.table=<TABLE-NAME>

Run the main class from the JAR with the configuration file, config.ini:

java -cp justmodeling-pitcrew-cdc-${TAG}-jar-with-dependencies.jar com.justmodeling.pitcrewcdc.databricks.MSSQLChangeCapture config.ini

Optionally, instead of hard-coding the catalog or credentials, you can use Databricks secrets to pass values into config.ini. If you have a secret "my_username" stored under the scope "mysecret", configure config.ini like below:

database.user=SECRET|mysecret|my_username

This instructs the engine to look up the secret "my_username" under the scope "mysecret". Environment variables can be injected in a similar way:

cdc.persist.catalog=ENV|catalog_name|us_dev

This instructs the engine to look up the environment variable "catalog_name"; if it is not set, the value defaults to "us_dev".
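
For illustration, a resolver for these placeholders could look like the sketch below. The class name and the secretLookup hook are hypothetical and not part of this project's API; on Databricks the lookup would delegate to the workspace secrets utility for the given scope and key.

import java.util.function.BiFunction;

// Hypothetical sketch of how the SECRET|... and ENV|... placeholders could be resolved.
public class PlaceholderResolver {

    // secretLookup abstracts the Databricks secret lookup (scope, key) -> value.
    private final BiFunction<String, String, String> secretLookup;

    public PlaceholderResolver(BiFunction<String, String, String> secretLookup) {
        this.secretLookup = secretLookup;
    }

    public String resolve(String value) {
        String[] parts = value.split("\\|", 3);
        if (parts.length == 3 && "SECRET".equals(parts[0])) {
            // SECRET|<scope>|<key>: fetch secret <key> from scope <scope>.
            return secretLookup.apply(parts[1], parts[2]);
        }
        if (parts.length == 3 && "ENV".equals(parts[0])) {
            // ENV|<variable>|<default>: read the environment variable, fall back to the default.
            String env = System.getenv(parts[1]);
            return env != null ? env : parts[2];
        }
        return value; // plain value, use as-is
    }
}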

Connectors

MSSQL

Debezium SQLServer Connector

SQL Server Always On

The SQL Server connector can capture changes from an Always On read-only replica.

Prerequisites

  • Change data capture is configured and enabled on the primary node. SQL Server does not support CDC directly on replicas.

  • The configuration option database.applicationIntent is set to ReadOnly. This is required by SQL Server (see the config example after this list). When Debezium detects this configuration option, it responds by taking the following actions:

    • Enable snapshot isolation on the database:
    ALTER DATABASE <database_name>
    SET ALLOW_SNAPSHOT_ISOLATION ON
    • Sets snapshot.isolation.mode to snapshot, which is the only transaction isolation mode supported for read-only replicas.

    • Commits the (read-only) transaction in every execution of the streaming query loop, which is necessary to get the latest view of CDC data.
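
For reference, a minimal sketch of how this option could be added to config.ini when capturing from a read-only replica, assuming it is passed through like any other connector property in the [DEBEZIUM-CONFIG] section:

[DEBEZIUM-CONFIG]
database.applicationIntent=ReadOnly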

MongoDB

Debezium MongoDB Connector

MySQL

Debezium MySQL Connector

PostgreSQL

Debezium PostgreSQL Connector

Oracle

Debezium Oracle Connector

About

This is a repo for publishing releases of the cdc-debezium-engine project. The original repository is https://github.com/just-modeling/cdc-debezium-engine
