
Commit 7e569ef

Merge pull request #1554 from stoyanr/RDSC-3533-update-rdi-docs-2
RDSC-3533 Update RDI documentation, part 2
2 parents ac84656 + 9ba0156 commit 7e569ef

3 files changed: +155, -149 lines


content/integrate/redis-data-integration/architecture.md

Lines changed: 37 additions & 32 deletions
```diff
@@ -93,23 +93,29 @@ RDI supports the following database sources using [Debezium Server](https://debe
 
 ## How RDI is deployed
 
-RDI is designed with two *planes* that provide its services.
+RDI is designed with three *planes* that provide its services.
+
 The *control plane* contains the processes that keep RDI active.
 It includes:
 
-- An *operator* process that schedules the CDC collector and the
-  stream processor to implement the two phases of the pipeline
-  lifecycle (initial cache loading and change streaming)
-- A [Prometheus](https://prometheus.io/)
-  endpoint to supply metrics about RDI
-- A REST API to control the VM.
+- An *API server* process that exposes a REST API to observe and control RDI.
+- An *operator* process that manages the *data plane* processes.
+- A *metrics exporter* process that reads metrics from the RDI database
+  and exports them as [Prometheus](https://prometheus.io/) metrics.
+
+The *data plane* contains the processes that actually move the data.
+It includes the *CDC collector* and the *stream processor* that implement
+the two phases of the pipeline lifecycle (initial cache loading and change streaming).
 
 The *management plane* provides tools that let you interact
-with the control plane. Use the CLI tool to install and administer RDI
-and to deploy and manage a pipeline. Use the pipeline editor
-(included in Redis Insight) to design or edit a pipeline. The
-diagram below shows the components of the control and management
-planes and the connections between them:
+with the control plane.
+
+- Use the CLI tool to install and administer RDI and to deploy
+  and manage a pipeline.
+- Use the pipeline editor included in Redis Insight to design
+  or edit a pipeline.
+
+The diagram below shows all RDI components and the interactions between them:
 
 {{< image filename="images/rdi/ingest/ingest-control-plane.webp" >}}
 
```

```diff
@@ -118,11 +124,11 @@ deploy RDI.
 
 ### RDI on your own VMs
 
-For this deployment, you must provide two VMs. The
-collector and stream processor are active on one VM while the other is a standby to provide high availability. The operators run on both VMs and use an algorithm to decide which is the active one (the "leader").
-Both the active VM and the standby
-need access to the authentication secrets that RDI uses to encrypt network
-traffic. The diagram below shows this configuration:
+For this deployment, you must provide two VMs. The collector and stream processor
+are active on one VM, while on the other they are in standby to provide high availability.
+The two operators running on both VMs use a leader election algorithm to decide which
+VM is the active one (the "leader").
+The diagram below shows this configuration:
 
 {{< image filename="images/rdi/ingest/ingest-active-passive-vms.webp" >}}
 
```

```diff
@@ -136,27 +142,26 @@ on [Kubernetes (K8s)](https://kubernetes.io/), including Red Hat
 [OpenShift](https://docs.openshift.com/). This creates:
 
 - A K8s [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) named `rdi`.
-- [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) for the
+  You can also use a different namespace name if you prefer.
+- [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) and
+  [services](https://kubernetes.io/docs/concepts/services-networking/service/) for the
   [RDI operator]({{< relref "/integrate/redis-data-integration/architecture#how-rdi-is-deployed" >}}),
   [metrics exporter]({{< relref "/integrate/redis-data-integration/observability" >}}), and API server.
-- A [service account](https://kubernetes.io/docs/concepts/security/service-accounts/) along with a
-  [role](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#restrictions-on-role-creation-or-update)
-  and [role binding](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#rolebinding-and-clusterrolebinding) for the RDI operator.
-- A [Configmap](https://kubernetes.io/docs/concepts/configuration/configmap/)
-  for the different components with RDI Redis database details.
+- A [service account](https://kubernetes.io/docs/concepts/security/service-accounts/)
+  and [RBAC resources](https://kubernetes.io/docs/reference/access-authn-authz/rbac) for the RDI operator.
+- A [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/) with RDI database details.
 - [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/)
-  with the RDI Redis database credentials and TLS certificates.
+  with the RDI database credentials and TLS certificates.
+- Other optional K8s resources such as [ingresses](https://kubernetes.io/docs/concepts/services-networking/ingress/)
+  that can be enabled depending on your K8s environment and needs.
 
 See [Install on Kubernetes]({{< relref "/integrate/redis-data-integration/installation/install-k8s" >}})
 for more information.
 
 ### Secrets and security considerations
 
-RDI encrypts all network connections with
-[TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) or
-[mTLS](https://en.wikipedia.org/wiki/Mutual_authentication#mTLS).
-The credentials for the connections are saved as secrets and you
-can choose how to provide these secrets to RDI. Note that RDI stores
-all state and configuration data inside the Redis Enterprise cluster
-and does not store any other data on your RDI VMs or anywhere else
-outside the cluster.
+The credentials for the database connections, as well as the certificates
+for [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) and
+[mTLS](https://en.wikipedia.org/wiki/Mutual_authentication#mTLS) are saved in K8s secrets.
+RDI stores all state and configuration data inside the Redis Enterprise cluster
+and does not store any other data on your RDI VMs or anywhere else outside the cluster.
```
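
The revised "Secrets and security considerations" text above stores connection credentials and TLS/mTLS certificates in K8s secrets. As a purely hypothetical illustration (the secret name and key names below are invented for this sketch, not the actual names RDI uses), such a secret could look like:

```yaml
# Hypothetical sketch of a K8s Secret holding database credentials and a CA
# certificate, as described above. Names below are NOT the real RDI names.
apiVersion: v1
kind: Secret
metadata:
  name: source-db          # invented name for illustration
  namespace: rdi           # the namespace the RDI installer creates
type: Opaque
stringData:
  username: myuser
  password: mypassword
  cacert: |                # CA certificate for TLS, in PEM format
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
```

See the RDI "Set secrets" documentation for the supported way to provide these values.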

content/integrate/redis-data-integration/data-pipelines/data-pipelines.md

Lines changed: 47 additions & 30 deletions
````diff
@@ -69,11 +69,10 @@ The sections below describe the two types of configuration file in more detail.
 
 ## The `config.yaml` file
 
 Here is an example of a `config.yaml` file. Note that the values of the
-form "`${name}`" refer to environment variables that you should set with the
-[`redis-di set-secret`]({{< relref "/integrate/redis-data-integration/reference/cli/redis-di-set-secret" >}})
-command. In particular, you should normally use environment variables as shown to set the source
-and target username and password rather than storing them in plain text in this
-file (see [Set secrets]({{< relref "/integrate/redis-data-integration/data-pipelines/deploy#set-secrets" >}}) for more information).
+form "`${name}`" refer to secrets that you should set as described in
+[Set secrets]({{< relref "/integrate/redis-data-integration/data-pipelines/deploy#set-secrets" >}}).
+In particular, you should normally use secrets as shown to set the source
+and target username and password rather than storing them in plain text in this file.
 
 ```yaml
 sources:
````
```diff
@@ -212,30 +211,43 @@ to identify the source (in the example we have a source
 called `mysql` but you can choose any name you like). The example
 configuration contains the following data:
 
-- `type`: The type of collector to use for the pipeline. Currently, the only type we support is `cdc`.
-- `connection`: The connection details for the source database: hostname, port, schema/db name, database credentials and
-  [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security)/
-  [mTLS](https://en.wikipedia.org/wiki/Mutual_authentication#mTLS) secrets.
-- `tables`: The dataset you want to collect from the source. This subsection
-  specifies:
-  - `snapshot_sql`: A query that selects the tables to include in the dataset
-    (the default is to include all tables if you don't specify a query here).
+- `type`: The type of collector to use for the pipeline.
+  Currently, the only types we support are `cdc` and `external`.
+  If the source type is set to `external`, no collector resources will be created by the operator,
+  and all other source sections should be empty or not specified at all.
+- `connection`: The connection details for the source database: `type`, `host`, `port`,
+  and credentials (`username` and `password`).
+  - `type` is the source database type, one of `mariadb`, `mysql`, `oracle`, `postgresql`, or `sqlserver`.
+  - If you use [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security)
+    or [mTLS](https://en.wikipedia.org/wiki/Mutual_authentication#mTLS) to connect
+    to the source database, you may need to specify additional properties in the
+    `advanced` section with references to the corresponding certificates depending
+    on the source database type. Note that these properties **must** be references to
+    secrets that you should set as described in [Set secrets]({{< relref "/integrate/redis-data-integration/data-pipelines/deploy#set-secrets" >}}).
+- `databases`: List of all databases to collect data from for source database types
+  that support multiple databases, such as `mysql` and `mariadb`.
+- `schemas`: List of all schemas to collect data from for source database types
+  that support multiple schemas, such as `oracle`, `postgresql`, and `sqlserver`.
+- `tables`: List of all tables to collect data from. Each table is identified by its
+  full name, including a database or schema prefix. If there is a single
+  database or schema, this prefix can be omitted.
+  For each table, you can specify:
   - `columns`: A list of the columns you are interested in (the default is to
-    include all columns if you don't supply a list)
+    include all columns)
   - `keys`: A list of columns to create a composite key if your table
     doesn't already have a [`PRIMARY KEY`](https://www.w3schools.com/sql/sql_primarykey.asp) or
     [`UNIQUE`](https://www.w3schools.com/sql/sql_unique.asp) constraint.
+  - `snapshot_sql`: A query to be used when performing the initial snapshot.
+    By default, a query that contains all listed columns of all listed tables will be used.
 - `advanced`: These optional properties configure other Debezium-specific features.
   The available sub-sections are:
-  - `sink`: All advanced properties for writing to RDI (TLS, memory threshold, etc).
+  - `source`: Properties for reading from the source database.
+    See the Debezium [Source connectors](https://debezium.io/documentation/reference/stable/connectors/)
+    pages for more information about the properties available for each database type.
+  - `sink`: Properties for writing to Redis streams in the RDI database.
     See the Debezium [Redis stream properties](https://debezium.io/documentation/reference/stable/operations/debezium-server.html#_redis_stream)
     page for the full set of available properties.
-  - `source`: All advanced connector properties (for example, RAC nodes).
-    See [Database-specific connection properties](#db-connect-props) below and also
-    see the
-    Debezium [Connectors](https://debezium.io/documentation/reference/stable/connectors/)
-    pages for more information about the properties available for each database type.
-  - `quarkus`: All advanced properties for Debezium server, such as the log level. See the
+  - `quarkus`: Properties for the Debezium server, such as the log level. See the
     Quarkus [Configuration options](https://quarkus.io/guides/all-config)
     docs for the full set of available properties.
 
```

```diff
@@ -244,10 +256,16 @@ configuration contains the following data:
 Use this section to provide the connection details for the target Redis
 database(s). As with the sources, you should start each target section
 with a unique name that you are free to choose (here, we have used
-`my-redis` as an example). In the `connection` section, you can supply the
-`type` of target database, which will generally be `redis` along with the
-`host` and `port` of the server. You can also supply connection credentials
-and TLS/mTLS secrets here if you use them.
+`target` as an example). In the `connection` section, you can specify the
+`type` of the target database, which must be `redis`, along with
+connection details such as `host`, `port`, and credentials (`username` and `password`).
+If you use [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security)
+or [mTLS](https://en.wikipedia.org/wiki/Mutual_authentication#mTLS) to connect
+to the target database, you must specify the CA certificate (for TLS),
+and the client certificate and private key (for mTLS) in `cacert`, `cert`, and `key`.
+Note that these certificates **must** be references to secrets
+that you should set as described in [Set secrets]({{< relref "/integrate/redis-data-integration/data-pipelines/deploy#set-secrets" >}})
+(it is not possible to include these certificates as plain text in the file).
 
 {{< note >}}If you specify `localhost` as the address of either the source or target server during
 installation then the connection will fail if the actual IP address changes for the local
```
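
The target description above could be sketched as the following `targets` section. Again this is illustrative, not authoritative: the host and port are invented, and the `${...}` values are secret references, which the text says are mandatory for the certificates:

```yaml
# Illustrative sketch of a targets section, per the description above.
# The host is invented; ${...} values are secret references, not plain text.
targets:
  target:                        # any target name you like
    connection:
      type: redis                # must be redis
      host: redis.example.com    # invented host
      port: 12000
      username: ${TARGET_DB_USERNAME}
      password: ${TARGET_DB_PASSWORD}
      cacert: ${TARGET_DB_CACERT}   # CA certificate (TLS)
      cert: ${TARGET_DB_CERT}       # client certificate (mTLS)
      key: ${TARGET_DB_KEY}         # client private key (mTLS)
```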
```diff
@@ -400,10 +418,9 @@ When your configuration is ready, you must deploy it to start using the pipeline
 [Deploy a pipeline]({{< relref "/integrate/redis-data-integration/data-pipelines/deploy" >}})
 to learn how to do this.
 
-## Ingest pipeline lifecycle
+## Pipeline lifecycle
 
-Once you have created the configuration for a pipeline, it goes through the
-following phases:
+A pipeline goes through the following phases:
 
 1. *Deploy* - when you deploy the pipeline, RDI first validates it before use.
 Then, the [operator]({{< relref "/integrate/redis-data-integration/architecture#how-rdi-is-deployed">}}) creates and configures the collector and stream processor that will run the pipeline.
```
```diff
@@ -415,8 +432,8 @@ hours to complete if you have a lot of data.
 the source data. Whenever a change is committed to the source, the collector captures
 it and adds it to the target through the pipeline. This phase continues indefinitely
 unless you change the pipeline configuration.
-1. *Update* - If you update the pipeline configuration, the operator starts applying it
-to the processor and the collector. Note that the changes only affect newly-captured
+1. *Update* - If you update the pipeline configuration, the operator applies it
+to the collector and the stream processor. Note that the changes only affect newly-captured
 data unless you reset the pipeline completely. Once RDI has accepted the updates, the
 pipeline returns to the CDC phase with the new configuration.
 1. *Reset* - There are circumstances where you might want to rebuild the dataset
```
