Commit b429de0

anton-bobkov, CyberROFL, blinkov
authored
Added the English version of the docs for asynchronous replication (#9609)
Co-authored-by: Ilnaz Nizametdinov <i.nizametdinov@gmail.com> Co-authored-by: Ivan Blinkov <ivan@blinkov.ru>
1 parent e8e060a commit b429de0

11 files changed

+376
-5
lines changed
Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
1+
# Asynchronous replication
2+
3+
Asynchronous replication synchronizes data between {{ ydb-short-name }} [databases](glossary.md#database) in near real time. It can also be used to migrate data between databases with minimal downtime for the applications that interact with them. The source and target databases can be located in the same {{ ydb-short-name }} [cluster](glossary.md#cluster) or in different clusters.
4+
5+
## Overview {#how-it-works}
6+
7+
Asynchronous replication is based on [Change Data Capture](cdc.md) and operates on logical data. The following diagram illustrates the replication process:
8+
9+
```mermaid
10+
sequenceDiagram
11+
participant dst as Target
12+
participant src as Source
13+
14+
dst-->src: Initialization
15+
dst->>dst: Creating an asynchronous replication instance
16+
dst->>src: Creating changefeeds
17+
dst->>dst: Creating replica objects
18+
19+
dst-->src: Initial table scan
20+
loop
21+
dst->>src: Request to get data
22+
src->>dst: Data
23+
end
24+
25+
dst-->src: Change data replication
26+
loop
27+
dst->>src: Request to get data
28+
src->>dst: Data
29+
end
30+
```
31+
32+
As shown in the diagram above, asynchronous replication involves two databases:
33+
34+
1. **Source**. A database with [replicated objects](glossary.md#replicated-object).
35+
2. **Target**. A database where an [asynchronous replication instance](glossary.md#async-replication-instance) and [replica objects](glossary.md#replica-object) will be created.
36+
37+
Asynchronous replication consists of the following stages:
38+
39+
* [Initialization](#init)
40+
* [Initial table scan](#initial-scan)
41+
* [Change data replication](#replication-of-changes)
42+
43+
### Initialization {#init}
44+
45+
Initialization of asynchronous replication includes the following steps:
46+
47+
* Creating an asynchronous replication instance on the target database using the [CREATE ASYNC REPLICATION](../yql/reference/syntax/create-async-replication.md) YQL expression.
48+
* Establishing a connection with the source database. The target database connects to the source using the [connection parameters](../yql/reference/syntax/create-async-replication.md#params) specified during the creation of the asynchronous replication instance.
49+
50+
{% note info %}
51+
52+
The user account that is used to connect to the source database must have the following [permissions](../security/short-access-control-notation.md#access-rights):
53+
54+
* Read permissions for schema objects and directory objects
55+
* Create, update, delete, and read permissions for changefeeds
56+
57+
{% endnote %}
58+
59+
* The following objects are created for replicated objects on the source:
60+
* [changefeeds](glossary.md#changefeed) on the source
61+
* [replica objects](glossary.md#replica-object) on the target
62+
63+
{% note info %}
64+
65+
Replicas are created under the user account that was used to create the asynchronous replication instance.
66+
67+
{% endnote %}
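
The initialization steps above can be sketched in YQL as follows. This is a minimal sketch rather than a definitive setup: the secret name `my_secret`, the placeholder token value, the instance name `my_replication`, the table names, and the connection string are all hypothetical, and the statements are assumed to be run on the target database.

```yql
-- Hypothetical names and values throughout; run on the target database.

-- 1. Store the access token for the source database in a secret.
CREATE OBJECT my_secret (TYPE SECRET) WITH value="<access_token>";

-- 2. Create the asynchronous replication instance. The target then connects
--    to the source using the connection parameters given in WITH (...).
CREATE ASYNC REPLICATION my_replication
FOR original_table AS replica_table
WITH (
    CONNECTION_STRING = 'grpcs://example.com:2135/?database=/Root/another_database',
    TOKEN_SECRET_NAME = 'my_secret'
);
```

The changefeed on the source and the replica object on the target are created by the replication instance itself, so the sketch contains no explicit statements for them.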
68+
69+
### Initial table scan {#initial-scan}
70+
71+
During the [initial table scan](cdc.md#initial-scan), the source data is exported to changefeeds. The target runs [consumers](topic.md#consumer) that read the source data from the changefeeds and write it to replicas.
72+
73+
You can get the progress of the initial table scan from the [description](../reference/ydb-cli/commands/scheme-describe.md) of the asynchronous replication instance.
74+
75+
### Change data replication {#replication-of-changes}
76+
77+
After the initial table scan is completed, the consumers read the change data and write it to replicas.
78+
79+
Each change data block has its *creation time* ($created\_at$). Consumers track the *reception time* of the change data ($received\_at$). Thus, you can use the following formula to calculate the *replication lag*:
80+
81+
$$
82+
replication\_lag = received\_at - created\_at
83+
$$
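
As a worked illustration with hypothetical timestamps, suppose a change data block is created at 12:00:00.000 and the consumer receives it at 12:00:01.500. The formula above then gives:

$$
replication\_lag = 12{:}00{:}01.500 - 12{:}00{:}00.000 = 1.5\ \text{s}
$$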
84+
85+
You can also get the replication lag from the [description](../reference/ydb-cli/commands/scheme-describe.md) of the asynchronous replication instance.
86+
87+
## Restrictions {#restrictions}
88+
89+
* The set of replicated objects is immutable and is generated when {{ ydb-short-name }} creates an asynchronous replication instance.
90+
* {{ ydb-short-name }} supports the following types of replicated objects:
91+
92+
* [row-based tables](datamodel/table.md#row-oriented-tables)
93+
* [directories](datamodel/dir.md)
94+
95+
{{ ydb-short-name }} will replicate all row-based tables that are located in the given directories and subdirectories at the time the asynchronous replication instance is created.
96+
97+
* During asynchronous replication, you cannot [add or delete columns](../yql/reference/syntax/alter_table/columns.md) in the source tables.
98+
* During asynchronous replication, replicas are available only for reading.
99+
100+
## Error handling during asynchronous replication {#error-handling}
101+
102+
Possible errors during asynchronous replication can be grouped into the following classes:
103+
104+
* **Temporary failures**, such as transport errors, system overload, etc. Requests will be resent until they are processed successfully.
105+
* **Critical errors**, such as access violation errors, schema errors, etc. Replication will be aborted, and the [description](../reference/ydb-cli/commands/scheme-describe.md) of the asynchronous replication instance will include the text of the error.
106+
107+
{% note warning %}
108+
109+
Currently, asynchronous replication that is aborted due to a critical error cannot be resumed. In this case, you must [drop](../yql/reference/syntax/drop-async-replication.md) the asynchronous replication instance and [create](../yql/reference/syntax/create-async-replication.md) a new one.
110+
111+
{% endnote %}
112+
113+
For more information about error classes and how to address them, refer to [Error handling](../reference/ydb-sdk/error_handling.md).
114+
115+
## Asynchronous replication completion {#done}
116+
117+
Completing asynchronous replication can be the final step of migrating data from one database to another. In this case, the client stops writing data to the source, waits until the replication lag reaches zero, and completes replication. After replication is completed, replicas become available for both reading and writing. You can then switch the load from the source database to the target database to finish the migration.
118+
119+
{% note info %}
120+
121+
You cannot resume completed asynchronous replication.
122+
123+
{% endnote %}
124+
125+
{% note warning %}
126+
127+
{{ ydb-short-name }} currently supports only **forced** completion of asynchronous replication, when no additional checks are performed for data consistency, replication lag, etc.
128+
129+
{% endnote %}
130+
131+
To complete asynchronous replication, use the [ALTER ASYNC REPLICATION](../yql/reference/syntax/alter-async-replication.md) YQL expression.
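
The completion step can be sketched as the following statement. The instance name `my_replication` is a placeholder. The sketch assumes that writes to the source have already stopped and that the replication lag reported in the instance description has reached zero. Because only forced completion is supported, the statement itself does not verify either condition.

```yql
-- Hypothetical instance name. Assumes writes to the source have stopped
-- and the replication lag is zero; forced completion performs no such checks.
ALTER ASYNC REPLICATION my_replication SET (STATE = "DONE", FAILOVER_MODE = "FORCE");
```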
132+
133+
## Dropping an asynchronous replication instance {#drop}
134+
135+
When you drop an asynchronous replication instance:
136+
137+
* Changefeeds are deleted in the source tables.
138+
* The source tables are unlocked, and you can add and delete columns again.
139+
* Optionally, all replicas are deleted.
140+
* The asynchronous replication instance is deleted.
141+
142+
To drop an asynchronous replication instance, use the [DROP ASYNC REPLICATION](../yql/reference/syntax/drop-async-replication.md) YQL expression.

ydb/docs/en/core/concepts/glossary.md

Lines changed: 17 additions & 1 deletion
@@ -181,7 +181,23 @@ A **consumer** is an entity that reads messages from a topic.
181181

182182
### Change data capture {#cdc}
183183

184-
**Change data capture** or **CDC** is a mechanism that allows subscribing to a stream of changes to a given [table](#table). Technically, it is implemented on top of [topics](#topic). It is described in more detail in a separate article [{#T}](cdc.md).
184+
**Change data capture** or **CDC** is a mechanism that allows subscribing to a **stream of changes** to a given [table](#table). Technically, it is implemented on top of [topics](#topic). It is described in more detail in a separate article [{#T}](cdc.md).
185+
186+
#### Changefeed {#changefeed}
187+
188+
**Changefeed** or **stream of changes** is an ordered list of changes in a given [table](#table) published via a [topic](#topic).
189+
190+
### Asynchronous replication instance {#async-replication-instance}
191+
192+
**Asynchronous replication instance** is a named entity that stores [asynchronous replication](async-replication.md) settings (connection properties, a list of replicated objects, etc.). It can also be used to retrieve the status of asynchronous replication, such as the [initial synchronization process](async-replication.md#initial-scan), [replication lag](async-replication.md#replication-of-changes), [errors](async-replication.md#error-handling), and more.
193+
194+
#### Replicated object {#replicated-object}
195+
196+
**Replicated object** is an object, for example, a table, that is asynchronously replicated to the target database.
197+
198+
#### Replica object {#replica-object}
199+
200+
**Replica object** is a mirror copy of the replicated object, automatically created by an [asynchronous replication instance](#async-replication-instance). Replica objects are typically read-only.
185201

186202
### YQL {#yql}
187203

ydb/docs/en/core/concepts/toc_i.yaml

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ items:
2222
href: cluster/common_scheme_ydb.md
2323
- name: Disk subsystem of a cluster
2424
href: cluster/distributed_storage.md
25+
- { name: Asynchronous replication, href: async-replication.md, when: feature_async_replication }
2526
- name: Federated query
2627
include: { path: federated_query/toc_p.yaml, mode: link }
2728
- name: Query optimizer
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
# ALTER ASYNC REPLICATION
2+
3+
The `ALTER ASYNC REPLICATION` statement modifies the status and parameters of an [asynchronous replication instance](../../../concepts/async-replication.md).
4+
5+
## Syntax {#syntax}
6+
7+
```yql
8+
ALTER ASYNC REPLICATION <name> SET (option = value [, ...])
9+
```
10+
11+
### Parameters {#params}
12+
13+
* `name` — the name of the asynchronous replication instance.
14+
* `SET (option = value [, ...])` — asynchronous replication parameters:
15+
16+
* `STATE` — the state of asynchronous replication. This parameter can only be used in combination with the `FAILOVER_MODE` parameter (see below). Valid values are:
17+
18+
* `DONE` — [completion of the asynchronous replication process](../../../concepts/async-replication.md#done).
19+
20+
* `FAILOVER_MODE` — the mode for changing the replication state. This parameter can only be used in combination with the `STATE` parameter. Valid values are:
21+
22+
* `FORCE` — forced failover.
23+
24+
## Examples {#examples}
25+
26+
The following statement forces the asynchronous replication process to complete:
27+
28+
```yql
29+
ALTER ASYNC REPLICATION my_replication SET (STATE = "DONE", FAILOVER_MODE = "FORCE");
30+
```
31+
32+
## See also
33+
34+
* [CREATE ASYNC REPLICATION](create-async-replication.md)
35+
* [DROP ASYNC REPLICATION](drop-async-replication.md)
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
1+
# CREATE ASYNC REPLICATION
2+
3+
The `CREATE ASYNC REPLICATION` statement creates an [asynchronous replication instance](../../../concepts/async-replication.md).
4+
5+
## Syntax {#syntax}
6+
7+
```yql
8+
CREATE ASYNC REPLICATION <name>
9+
FOR <remote_path> AS <local_path> [, <another_remote_path> AS <another_local_path>]
10+
WITH (option = value [, ...])
11+
```
12+
13+
### Parameters {#params}
14+
15+
* `name` — the name of the asynchronous replication instance.
16+
* `remote_path` — a relative or absolute path to a table or directory in the source database.
17+
* `local_path` — a relative or absolute path to a target table or directory in the local database.
18+
* `WITH (option = value [, ...])` — asynchronous replication parameters:
19+
20+
21+
* `CONNECTION_STRING` — a [connection string](../../../concepts/connect.md#connection_string) for the source database (mandatory).
22+
* Authentication details for the source database (mandatory) depending on the authentication method:
23+
24+
* [Access token](../../../recipes/ydb-sdk/auth-access-token.md):
25+
26+
* `TOKEN_SECRET_NAME` — the name of the [secret](../../../concepts/datamodel/secrets.md) that contains the token.
27+
28+
* [Login and password](../../../recipes/ydb-sdk/auth-static.md):
29+
30+
* `USER` — a database user name.
31+
* `PASSWORD_SECRET_NAME` — the name of the [secret](../../../concepts/datamodel/secrets.md) that contains the password for the source database user.
32+
33+
## Examples {#examples}
34+
35+
{% note tip %}
36+
37+
Before creating an asynchronous replication instance, you must [create](create-object-type-secret.md) a secret with authentication credentials for the source database or ensure that you have access to an existing secret.
38+
39+
{% endnote %}
40+
41+
The following statement creates an asynchronous replication instance to synchronize the `original_table` source table in the `/Root/another_database` database to the `replica_table` target table in the local database:
42+
43+
```yql
44+
CREATE ASYNC REPLICATION my_replication_for_single_table
45+
FOR original_table AS replica_table
46+
WITH (
47+
CONNECTION_STRING = 'grpcs://example.com:2135/?database=/Root/another_database',
48+
TOKEN_SECRET_NAME = 'my_secret'
49+
);
50+
```
51+
52+
The statement above uses the token from the `my_secret` secret for authentication and the `grpcs://example.com:2135` [endpoint](../../../concepts/connect.md#endpoint) to connect to the `/Root/another_database` database.
53+
54+
The following statement creates an asynchronous replication instance to replicate the source tables `original_table_1` and `original_table_2` to the target tables `replica_table_1` and `replica_table_2`:
55+
56+
```yql
57+
CREATE ASYNC REPLICATION my_replication_for_multiple_tables
58+
FOR original_table_1 AS replica_table_1, original_table_2 AS replica_table_2
59+
WITH (
60+
CONNECTION_STRING = 'grpcs://example.com:2135/?database=/Root/another_database',
61+
TOKEN_SECRET_NAME = 'my_secret'
62+
);
63+
```
64+
65+
The following statement creates an asynchronous replication instance for the objects in the `original_dir` directory:
66+
67+
```yql
68+
CREATE ASYNC REPLICATION my_replication_for_dir
69+
FOR original_dir AS replica_dir
70+
WITH (
71+
CONNECTION_STRING = 'grpcs://example.com:2135/?database=/Root/another_database',
72+
TOKEN_SECRET_NAME = 'my_secret'
73+
);
74+
```
75+
76+
The following statement creates an asynchronous replication instance for the objects in the `/Root/another_database` database:
77+
78+
```yql
79+
CREATE ASYNC REPLICATION my_replication_for_database
80+
FOR `/Root/another_database` AS `/Root/my_database`
81+
WITH (
82+
CONNECTION_STRING = 'grpcs://example.com:2135/?database=/Root/another_database',
83+
TOKEN_SECRET_NAME = 'my_secret'
84+
);
85+
```
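
The examples above authenticate with an access token. For login and password authentication, a statement might look like the following sketch, which uses the `USER` and `PASSWORD_SECRET_NAME` parameters described above. The instance name, the user name `replication_user`, and the secret name `my_password_secret` are hypothetical, and the secret is assumed to already hold the password of that user in the source database.

```yql
-- Hypothetical names; the secret must already exist and contain the password.
CREATE ASYNC REPLICATION my_replication_with_password
FOR original_table AS replica_table
WITH (
    CONNECTION_STRING = 'grpcs://example.com:2135/?database=/Root/another_database',
    USER = 'replication_user',
    PASSWORD_SECRET_NAME = 'my_password_secret'
);
```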
86+
87+
## See also
88+
89+
* [ALTER ASYNC REPLICATION](alter-async-replication.md)
90+
* [DROP ASYNC REPLICATION](drop-async-replication.md)
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
# CREATE OBJECT (TYPE SECRET)
2+
3+
{% note warning %}
4+
5+
The syntax for managing secrets will change in future {{ ydb-full-name }} releases.
6+
7+
{% endnote %}
8+
9+
The `CREATE OBJECT (TYPE SECRET)` statement creates a [secret](../../../concepts/datamodel/secrets.md).
10+
11+
## Syntax {#syntax}
12+
13+
```yql
14+
CREATE OBJECT <secret_name> (TYPE SECRET) WITH value="<secret_value>";
15+
```
16+
17+
### Parameters
18+
19+
* `secret_name` - the name of the secret.
20+
* `secret_value` - the contents of the secret.
21+
22+
## Example {#examples}
23+
24+
The following statement creates a secret named `MySecretName` with `MySecretData` as its value:
25+
26+
```yql
27+
CREATE OBJECT MySecretName (TYPE SECRET) WITH value="MySecretData";
28+
```
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
1+
# DROP ASYNC REPLICATION
2+
3+
The `DROP ASYNC REPLICATION` statement deletes an [asynchronous replication](../../../concepts/async-replication.md) instance. When an asynchronous replication instance is [deleted](../../../concepts/async-replication.md#drop), the following objects are also deleted:
4+
5+
* automatically created [streams of changes](../../../concepts/glossary.md#changefeed)
6+
* [replicas](../../../concepts/glossary.md#replica-object) (optionally)
7+
8+
## Syntax {#syntax}
9+
10+
```yql
11+
DROP ASYNC REPLICATION <name> [CASCADE]
12+
```
13+
14+
### Parameters
15+
16+
* `name` — the name of the asynchronous replication instance.
17+
* `CASCADE` — cascaded deletion of the replicas that were created for a given asynchronous replication instance.
18+
19+
## Examples {#examples}
20+
21+
This section contains examples of YQL statements that drop the asynchronous replication instance created with the following expression:
22+
23+
```yql
24+
CREATE ASYNC REPLICATION my_replication
25+
FOR original_table AS replica_table
26+
WITH (
27+
CONNECTION_STRING = 'grpcs://example.com:2135/?database=/Root/another_database',
28+
TOKEN_SECRET_NAME = 'my_secret'
29+
);
30+
```
31+
32+
The following statement drops an asynchronous replication instance and the automatically created stream of changes for the `original_table` table, but the `replica_table` table is not deleted:
33+
34+
```yql
35+
DROP ASYNC REPLICATION my_replication;
36+
```
37+
38+
The following statement drops an asynchronous replication instance, the automatically created stream of changes for the `original_table` table, and the `replica_table` table:
39+
40+
```yql
41+
DROP ASYNC REPLICATION my_replication CASCADE;
42+
```
43+
44+
## See also
45+
46+
* [CREATE ASYNC REPLICATION](create-async-replication.md)
47+
* [ALTER ASYNC REPLICATION](alter-async-replication.md)

ydb/docs/en/core/yql/reference/yql-core/syntax/index.md

Lines changed: 8 additions & 0 deletions
@@ -97,3 +97,11 @@
9797

9898
{% endif %}
9999

100+
{% if feature_async_replication %}
101+
102+
* [CREATE ASYNC REPLICATION](create-async-replication.md)
103+
* [ALTER ASYNC REPLICATION](alter-async-replication.md)
104+
* [DROP ASYNC REPLICATION](drop-async-replication.md)
105+
106+
{% endif %}
107+
