Commit 3d4d5e9
Add syncer error recovery troubleshooting documentation
- Add troubleshooting section for unrecoverable syncer errors
- Document exit code 4 behavior and DMC response
- Provide recovery procedures for regular and Active-Active databases
- Include REST API and crdb-cli recovery methods
- Add clear examples with placeholder values

Resolves DOC-1554
1 parent 73fda7e commit 3d4d5e9

1 file changed: +48 -4 lines changed

content/operate/rs/databases/active-active/syncer.md

Lines changed: 48 additions & 4 deletions
@@ -28,19 +28,19 @@ When a new primary is appointed, the replication ID changes, but a partial sync
 
 
 In a partial sync, the backlog of operations since the offset are transferred as raw operations.
-In a full sync, the data from the primary is transferred to the replica as an RDB file which is followed by a partial sync.
+In a full sync, the data from the primary is transferred to the replica as an RDB file which is followed by a partial sync.
 
 Partial synchronization requires a backlog large enough to store the data operations until connection is restored. See [replication backlog]({{< relref "/operate/rs/databases/active-active/manage#replication-backlog" >}}) for more info on changing the replication backlog size.
 
 ### Syncer in Active-Active replication
 
 In the case of an Active-Active database:
 
-- Multiple past replication IDs and offsets are stored to allow for multiple syncs
-- The [Active-Active replication backlog]({{< relref "/operate/rs/databases/active-active/manage#replication-backlog" >}}) is also sent to the replica during a full sync.
+- Multiple past replication IDs and offsets are stored to allow for multiple syncs
+- The [Active-Active replication backlog]({{< relref "/operate/rs/databases/active-active/manage#replication-backlog" >}}) is also sent to the replica during a full sync.
 
 {{< warning >}}
-Full sync triggers heavy data transfers between geo-replicated instances of an Active-Active database.
+Full sync triggers heavy data transfers between geo-replicated instances of an Active-Active database.
 {{< /warning >}}
 
 An Active-Active database uses partial synchronization in the following situations:
@@ -53,4 +53,48 @@ An Active-Active database uses partial synchronization in the following situatio
 
 {{< note >}}
 Synchronization of data from the primary shard to the replica shard is always a full synchronization.
+{{< /note >}}
+
+## Troubleshooting syncer errors
+
+### Unrecoverable syncer errors
+
+Some syncer errors are unrecoverable and cause the syncer to exit with exit code 4. When this occurs, the Database Management Component (DMC) automatically sets the `crdt_sync` or `replica_sync` value to `stopped`.
+
+### Recovery procedures
+
+To re-enable the syncer after an unrecoverable error:
+
+#### For regular databases
+
+Use the cluster REST API to enable sync:
+
+```sh
+curl -v -k -u <username>:<password> -X PUT \
+-H "Content-Type: application/json" \
+-d '{"sync":"enabled"}' \
+http://<cluster-endpoint>:8080/v1/bdbs/<bdb_id>
+```
+
+#### For Active-Active databases (CRDB)
+
+For Active-Active databases, you have two options:
+
+1. **Call the API on all participating clusters:**
+
+```sh
+curl -v -k -u <username>:<password> -X PUT \
+-H "Content-Type: application/json" \
+-d '{"sync":"enabled"}' \
+http://<cluster-endpoint>:8080/v1/bdbs/<bdb_id>
+```
+
+2. **Use crdb-cli (recommended):**
+
+```sh
+crdb-cli crdb update --crdb-guid <crdb-guid> --force
+```
+
+{{< note >}}
+Replace `<username>`, `<password>`, `<cluster-endpoint>`, `<bdb_id>`, and `<crdb-guid>` with your actual values.
 {{< /note >}}
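
The first Active-Active option added in this commit, calling the API on every participating cluster, could be scripted roughly as follows. This is a minimal sketch, not part of the commit: the helper names (`bdb_url`, `enable_sync`, `enable_sync_all`), the `endpoint:bdb_id` argument format, and the `RS_USER`, `RS_PASSWORD`, and `DRY_RUN` variables are all hypothetical. It reuses the URL and payload from the documented `curl` example, and it takes a database ID per cluster on the assumption that the ID may not be the same on every cluster.

```sh
# Hypothetical helpers; only the URL shape and the JSON payload come from
# the curl example in the diff above.

# Build the bdb URL for one cluster endpoint ($1) and database ID ($2).
bdb_url() {
    printf 'http://%s:8080/v1/bdbs/%s\n' "$1" "$2"
}

# Re-enable sync on one cluster. With DRY_RUN=1 (the default in this sketch)
# the curl command is only printed and no request is sent.
enable_sync() {
    url=$(bdb_url "$1" "$2")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf 'curl -k -u <username>:<password> -X PUT -H "Content-Type: application/json" -d %s %s\n' \
            "'{\"sync\":\"enabled\"}'" "$url"
    else
        # RS_USER and RS_PASSWORD are assumed to be set by the caller.
        curl -k -u "$RS_USER:$RS_PASSWORD" -X PUT \
            -H "Content-Type: application/json" \
            -d '{"sync":"enabled"}' \
            "$url"
    fi
}

# Apply enable_sync to every "endpoint:bdb_id" argument, stopping at the
# first failure so a partially recovered database is easy to spot.
enable_sync_all() {
    for pair in "$@"; do
        enable_sync "${pair%%:*}" "${pair#*:}" || return 1
    done
}
```

For example, `DRY_RUN=1 enable_sync_all cluster1.example.com:1 cluster2.example.com:2` prints one command per cluster for review. Setting `DRY_RUN=0` sends the requests. The `crdb-cli` command remains the option the commit marks as recommended.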
