Synchronize production NoSQL and SQL data to standalone instances for data scientists and other purposes. `Sync` is a **Go-based** tool that synchronizes data from a **MongoDB replica set**, a **MongoDB sharded cluster**, or a production SQL instance to a **standalone instance**. It supports both initial and incremental synchronization, with change stream monitoring.

> [!NOTE]
> `Sync` now supports MongoDB, MySQL, PostgreSQL, MariaDB, and Redis. Support for Elasticsearch is planned next.

## What is the problem
Let’s assume you have a mid- to large-sized SaaS platform or service with multiple tech teams and stakeholders. Different teams have different requirements for analyzing production data independently. However, the tech team doesn’t want to give all these stakeholders direct access to the production databases because of security and stability concerns.

Create standalone databases outside of your production database servers with the …
- MongoDB (Sharded clusters, Replica sets)
- MySQL
- MariaDB
- PostgreSQL (PostgreSQL version 10+ with logical replication enabled)
- Redis (Standalone, Sentinel; does not support cluster mode)
## High Level Design Diagram
- MongoDB: Bulk synchronization of data from the MongoDB cluster or MongoDB replica set to the standalone MongoDB instance.
- MySQL/MariaDB: Initial synchronization using batch inserts (default batch size: 100 rows) from the source to the target if the target table is empty.
- PostgreSQL: Initial synchronization using batch inserts (default batch size: 100 rows) from the source to the target using logical replication slots and the pgoutput plugin.
- Redis: Supports full data synchronization for standalone Redis and Sentinel setups using Redis Streams and Keyspace Notifications.
- **Change Stream & Incremental Updates**:
- MongoDB: Watches for real-time changes (insert, update, replace, delete) in the cluster's collections and reflects them in the standalone instance.
- MySQL/MariaDB: Uses binlog replication events to capture and apply incremental changes to the target.
- PostgreSQL: Uses WAL (Write-Ahead Log) with the pgoutput plugin to capture and apply incremental changes to the target.
- Redis: Uses Redis Streams and Keyspace Notifications to capture and sync incremental changes in real time.
- **Batch Processing & Concurrency**:
Handles synchronization in batches for optimized performance and supports parallel synchronization for multiple collections/tables.
- **Restart Resilience**:
Stores MongoDB resume tokens, MySQL binlog positions, PostgreSQL replication positions, and Redis stream offsets in configurable state files, allowing the tool to resume synchronization from the last known position after a restart.
- **Note for Redis**: `Sync` cannot resume Redis synchronization from the last synced state after an interruption. If `Sync` is interrupted or crashes, it restarts by running the initial sync method, which retrieves all keys and syncs them to the target database. This is due to limitations in Redis Streams and Keyspace Notifications: Keyspace Notifications are delivered over Pub/Sub, which is fire-and-forget, so changes made while `Sync` is not listening are never delivered to it. Because the tool cannot reliably determine the last synced state, it performs a full resync to ensure data consistency.
- **Grafana Integration**:
- For data visualization, this tool integrates with **Grafana** using data from **GCP Logging** and **GCP BigQuery**.
- When **`enable_table_row_count_monitoring`** is enabled, the tool records data changes, including table row counts, in **GCP Logging**.
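The initial sync for MySQL/MariaDB and PostgreSQL is described as batch inserts with a default batch size of 100 rows. Below is a minimal Go sketch of that batching idea, not `Sync`'s actual code. It splits rows into batches and builds one multi-row parameterized `INSERT` per batch. The names `buildBatchInserts` and `Batch`, the MySQL-style `?` placeholders (PostgreSQL would use `$1, $2, …`), and the in-memory row representation are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultBatchSize mirrors the README's stated default of 100 rows per batch.
const defaultBatchSize = 100

// Batch is one multi-row INSERT statement plus its flattened arguments.
type Batch struct {
	SQL  string
	Args []any
}

// buildBatchInserts (hypothetical helper) splits rows into batches of at most
// batchSize rows and builds a MySQL-style parameterized INSERT for each batch.
// Table and column names are assumed to be trusted identifiers; they are not
// quoted or escaped here.
func buildBatchInserts(table string, columns []string, rows [][]any, batchSize int) []Batch {
	if batchSize <= 0 {
		batchSize = defaultBatchSize
	}
	// One "(?, ?, ...)" group per row.
	placeholders := make([]string, len(columns))
	for i := range placeholders {
		placeholders[i] = "?"
	}
	rowGroup := "(" + strings.Join(placeholders, ", ") + ")"
	prefix := fmt.Sprintf("INSERT INTO %s (%s) VALUES ", table, strings.Join(columns, ", "))

	var batches []Batch
	for start := 0; start < len(rows); start += batchSize {
		end := start + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		groups := make([]string, 0, end-start)
		args := make([]any, 0, (end-start)*len(columns))
		for _, row := range rows[start:end] {
			groups = append(groups, rowGroup)
			args = append(args, row...)
		}
		batches = append(batches, Batch{SQL: prefix + strings.Join(groups, ", "), Args: args})
	}
	return batches
}

func main() {
	rows := make([][]any, 250)
	for i := range rows {
		rows[i] = []any{i, fmt.Sprintf("user-%d", i)}
	}
	for _, b := range buildBatchInserts("users", []string{"id", "name"}, rows, defaultBatchSize) {
		fmt.Println(len(b.Args)/2, "rows in batch")
	}
}
```

With 250 rows this yields three statements (100, 100, and 50 rows). Larger batches mean fewer round trips, but each statement gets bigger, so the batch size is a throughput/memory trade-off.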
- For PostgreSQL sources:
- A PostgreSQL instance with logical replication enabled and a replication slot created.
- A target PostgreSQL instance with write permissions.
- For Redis sources:
- Redis standalone or Sentinel setup with Redis version >= 5.0.
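- Redis Streams and Keyspace Notifications enabled (Keyspace Notifications are disabled by default; enable them with the `notify-keyspace-events` configuration parameter).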
- A target Redis instance with write permissions.
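You may want to verify the Redis prerequisites before starting a sync. The Go sketch below is hypothetical and not part of `Sync`. It assumes you have already read two values with a Redis client of your choice: the `redis_version` field from `INFO server` and the value returned by `CONFIG GET notify-keyspace-events`. It then checks that the version is at least 5.0 and that notifications will actually be emitted. The README does not say which event classes `Sync` needs, so the check only requires `K` or `E` plus at least one event-class flag.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// versionAtLeast reports whether a dotted Redis version string such as
// "6.2.14" is greater than or equal to major.minor.
func versionAtLeast(version string, major, minor int) (bool, error) {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected version format %q", version)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	mnr, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, err
	}
	return maj > major || (maj == major && mnr >= minor), nil
}

// keyspaceNotificationsEnabled checks a notify-keyspace-events value.
// Redis delivers events only if the value contains K (keyspace) or
// E (keyevent) AND at least one event-class flag.
func keyspaceNotificationsEnabled(flags string) bool {
	hasChannel := strings.ContainsAny(flags, "KE")
	hasClass := strings.ContainsAny(flags, "g$lshzxetdmnA")
	return hasChannel && hasClass
}

func main() {
	ok, err := versionAtLeast("6.2.14", 5, 0)
	fmt.Println("version ok:", ok, err)
	fmt.Println("KEA enabled:", keyspaceNotificationsEnabled("KEA"))
	fmt.Println("empty enabled:", keyspaceNotificationsEnabled(""))
}
```

Note that `K` or `E` alone, without any event class, delivers nothing. That is why both conditions are required.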
## Quick start
```yaml
sync_configs:
  # …
        tables:
          - source_table: "users"
            target_table: "users"

  - type: "redis"
    enable: true
    source_connection: "redis://localhost:6379/0"
    target_connection: "redis://localhost:6379/1"
    redis_position_path: "/tmp/state/redis_position"
    mappings:
      - source_database: "db0"
        target_database: "db1"
        tables:
          - source_table: "source_stream" # Redis Stream Name
            target_table: ""
```
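In the example configuration, `source_connection` and `target_connection` are Redis URLs whose path selects the logical database (`/0` and `/1`). The following Go sketch uses only the standard library to show how such a URL breaks down into a host address and a database index. It is illustrative only. The helper `parseRedisURL` is hypothetical, and `Sync` (or the Redis client it uses) may parse these strings differently, for example when handling passwords or TLS (`rediss://`).

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
	"strings"
)

// RedisTarget is the host and logical database selected by a Redis URL.
type RedisTarget struct {
	Addr string // host:port
	DB   int    // logical database index; 0 if the path is empty
}

// parseRedisURL is a hypothetical helper that splits a URL such as
// "redis://localhost:6379/1" into its address and database index.
func parseRedisURL(raw string) (RedisTarget, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return RedisTarget{}, err
	}
	if u.Scheme != "redis" && u.Scheme != "rediss" {
		return RedisTarget{}, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	addr := u.Host
	if u.Port() == "" {
		addr = net.JoinHostPort(u.Hostname(), "6379") // default Redis port
	}
	db := 0
	if p := strings.Trim(u.Path, "/"); p != "" {
		if db, err = strconv.Atoi(p); err != nil {
			return RedisTarget{}, fmt.Errorf("invalid database index %q", p)
		}
	}
	return RedisTarget{Addr: addr, DB: db}, nil
}

func main() {
	src, _ := parseRedisURL("redis://localhost:6379/0")
	dst, _ := parseRedisURL("redis://localhost:6379/1")
	fmt.Printf("source %s db=%d, target %s db=%d\n", src.Addr, src.DB, dst.Addr, dst.DB)
}
```

Because the example source and target share one host and differ only in the database index, they are separate keyspaces on the same Redis server.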
## Real-Time Synchronization
- MongoDB: Uses Change Streams from replica sets or sharded clusters for incremental updates.
- MySQL/MariaDB: Uses binlog replication to apply incremental changes to the target.
- PostgreSQL: Uses WAL (Write-Ahead Log) with the pgoutput plugin to apply incremental changes to the target.
- Redis: Uses Redis Streams and Keyspace Notifications to sync changes in real time.
- **Note for Redis**: If `Sync` is interrupted, it restarts Redis synchronization with an initial sync of all keys to the target. This ensures data consistency but can increase synchronization time after interruptions.

On restart, the tool resumes from the stored state (resume token for MongoDB, binlog position for MySQL/MariaDB, replication slot for PostgreSQL).
## Availability
- MongoDB: MongoDB Change Streams require a replica set or sharded cluster. See [Convert Standalone to Replica Set](https://www.mongodb.com/docs/manual/tutorial/convert-standalone-to-replica-set/).
- MySQL/MariaDB: MySQL/MariaDB binlog-based incremental sync requires ROW or MIXED binlog format for proper event capturing.
- PostgreSQL: PostgreSQL incremental sync requires logical replication enabled with a replication slot.
- Redis: Redis sync supports standalone and Sentinel setups but does not support Redis Cluster mode. Redis does not support resuming from the last synced state after a crash or interruption.
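These availability rules can be checked before a sync starts. The Go preflight sketch below is hypothetical and not part of `Sync`. It assumes you have already read two values from the source yourself: the MySQL/MariaDB `binlog_format` variable (for example via `SHOW VARIABLES LIKE 'binlog_format'`) and the `redis_mode` field from Redis `INFO server`, which reports `standalone`, `sentinel`, or `cluster`. The sketch encodes only the rules stated above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkMySQLBinlogFormat enforces the rule that binlog-based incremental sync
// needs ROW or MIXED format; anything else (e.g. STATEMENT) is rejected.
func checkMySQLBinlogFormat(format string) error {
	switch strings.ToUpper(strings.TrimSpace(format)) {
	case "ROW", "MIXED":
		return nil
	default:
		return fmt.Errorf("binlog_format is %q; incremental sync needs ROW or MIXED", format)
	}
}

// checkRedisMode rejects Redis Cluster, based on the redis_mode field
// reported by INFO server.
func checkRedisMode(mode string) error {
	if strings.EqualFold(strings.TrimSpace(mode), "cluster") {
		return errors.New("redis_mode is cluster; only standalone and Sentinel setups are supported")
	}
	return nil
}

func main() {
	fmt.Println(checkMySQLBinlogFormat("ROW"))
	fmt.Println(checkMySQLBinlogFormat("STATEMENT"))
	fmt.Println(checkRedisMode("cluster"))
}
```

Failing fast on these checks surfaces a misconfigured source at startup rather than as missing incremental changes later.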