Commit 67ce44a

Update documentation
1 parent 115ebad commit 67ce44a

1 file changed

+15
-5
lines changed

content/integrate/redis-data-integration/data-pipelines/prepare-dbs/mongodb.md

Lines changed: 15 additions & 5 deletions
@@ -23,13 +23,22 @@ This guide describes the steps required to prepare a MongoDB database as a sourc
2323
- **User privileges:** You must have a MongoDB user with sufficient privileges to read the oplog and collections, and to use change streams.
2424
- **Network access:** The RDI Collector must be able to connect to all MongoDB nodes in your deployment.
2525

26+
{{< note >}}The MongoDB connector cannot monitor changes on a standalone MongoDB server, because standalone servers do not have an oplog. The connector works if the standalone server is converted to a replica set with one member.{{< /note >}}
27+
2628
## 1. Configure Oplog Size
27-
Ensure the oplog is large enough to retain changes for the duration of your RDI pipeline's snapshot and streaming operations.
28-
Follow the MongoDB [documentation](https://www.mongodb.com/docs/manual/tutorial/change-oplog-size/) to check and resize (if necessary) the oplog size.
29+
The Debezium MongoDB connector relies on the oplog to capture changes from a replica set. The oplog is a fixed-size capped collection: when it reaches its maximum size, it overwrites the oldest entries. If the connector is stopped and restarted, it attempts to resume from its last recorded position in the oplog. If that position has been overwritten, the connector may fail to start and report an invalid resume token error.
30+
31+
To prevent this, ensure the oplog retains enough history for Debezium to resume streaming after interruptions. You can do this by:
32+
33+
- **Increasing the oplog size:** Set the oplog size based on your workload, ensuring it can store more than the peak number of oplog entries generated per hour.
34+
- **Setting a minimum oplog retention period (MongoDB 4.4+):** Configure MongoDB to retain oplog entries for a minimum number of hours, guaranteeing availability even if the oplog reaches its maximum size. This is generally preferred, but for high-throughput clusters nearing capacity, you may need to increase the oplog size instead.
35+
36+
For detailed guidance, see the Debezium [oplog configuration documentation](https://debezium.io/documentation/reference/stable/connectors/mongodb.html#mongodb-optimal-oplog-config).
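
As a rough sketch of the two options above: MongoDB's `replSetResizeOplog` admin command takes a `size` in megabytes and, on MongoDB 4.4+, an optional `minRetentionHours`. The helper name `buildResizeOplogCommand` and the sample throughput figures below are illustrative assumptions, not part of RDI or Debezium. The function only builds the command document; you would measure your own peak oplog volume and then pass the result to `db.adminCommand(...)` in `mongosh` on each replica set member.

```javascript
// Hypothetical helper: builds the document for MongoDB's replSetResizeOplog
// admin command. The sizing inputs are assumptions you must measure yourself.
function buildResizeOplogCommand({ peakMBPerHour, hoursToRetain, minRetentionHours }) {
  // MongoDB does not accept oplog sizes below 990 MB.
  const MIN_OPLOG_MB = 990;
  const sizeMB = Math.max(MIN_OPLOG_MB, Math.ceil(peakMBPerHour * hoursToRetain));

  const command = { replSetResizeOplog: 1, size: sizeMB };
  if (minRetentionHours !== undefined) {
    // Only supported on MongoDB 4.4 and later.
    command.minRetentionHours = minRetentionHours;
  }
  return command;
}

// Example with assumed numbers: 500 MB of oplog per peak hour, kept for 24 hours.
const resizeCommand = buildResizeOplogCommand({
  peakMBPerHour: 500,
  hoursToRetain: 24,
  minRetentionHours: 24,
});
console.log(JSON.stringify(resizeCommand));
// In mongosh, on each replica set member: db.adminCommand(resizeCommand)
```

Sizing from peak volume rather than average volume leaves headroom for bursts, which is when a stopped connector is most likely to lose its resume position.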
2937

3038
## 2. Create a MongoDB User for RDI
3139
Create a user with the following roles on the source database:
32-
- readAnyDatabase
40+
- read
41+
- readAnyDatabase (optional alternative if you don't want to grant the `read` role on each database)
3342
- clusterMonitor
3443

3544
Example:
@@ -39,7 +48,8 @@ db.createUser({
3948
user: "rdi_user",
4049
pwd: "rdi_password",
4150
roles: [
42-
{ role: "readAnyDatabase", db: "admin" },
51+
{ role: "read", db: "your_database" }, // You can add multiple read roles, one per database.
52+
// { role: "readAnyDatabase", db: "admin" }, // Use this role instead if you don't want to grant the `read` role on each database.
4353
{ role: "clusterMonitor", db: "admin" }
4454
]
4555
});
@@ -56,7 +66,7 @@ Example (Sharded Cluster):
5666
```
5767
mongodb://${SOURCE_DB_USERNAME}:${SOURCE_DB_PASSWORD}@host:30000
5868
```
59-
- For Atlas, adjust the connection string accordingly.
69+
- For Atlas, adjust the connection string accordingly (see example below).
6070
- Set replicaSet and authSource as appropriate for your deployment.
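
To illustrate how `replicaSet` and `authSource` fit into a connection string, here is a minimal sketch. The helper `buildMongoUri`, the host names, and the replica set name `rs0` are hypothetical and only show the URI layout; in an RDI configuration you would normally keep the `${SOURCE_DB_USERNAME}` and `${SOURCE_DB_PASSWORD}` placeholders rather than embedding real credentials.

```javascript
// Hypothetical helper: assembles a standard mongodb:// connection string.
function buildMongoUri({ username, password, hosts, replicaSet, authSource }) {
  // Percent-encode credentials so characters such as '@' or ':' do not break the URI.
  const credentials = `${encodeURIComponent(username)}:${encodeURIComponent(password)}`;

  const params = new URLSearchParams();
  if (replicaSet) params.set("replicaSet", replicaSet);
  if (authSource) params.set("authSource", authSource);
  const query = params.toString();

  return `mongodb://${credentials}@${hosts.join(",")}/${query ? "?" + query : ""}`;
}

// Example with assumed hosts and an assumed replica set name.
const uri = buildMongoUri({
  username: "rdi_user",
  password: "p@ss:word",
  hosts: ["host1:27017", "host2:27017"],
  replicaSet: "rs0",
  authSource: "admin",
});
console.log(uri);
// mongodb://rdi_user:p%40ss%3Aword@host1:27017,host2:27017/?replicaSet=rs0&authSource=admin
```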
6171

6272
## 4. Enable Change Streams and Pre/Post Images (Only if Using a Custom Key)
