Commit 92823ca

DOC-5282 small changes suggested by AI
1 parent 75ac236 commit 92823ca

File tree

1 file changed: +11 -10 lines changed
  • content/integrate/redis-data-integration/data-pipelines


content/integrate/redis-data-integration/data-pipelines/_index.md

Lines changed: 11 additions & 10 deletions
@@ -19,11 +19,12 @@ type: integration
 weight: 30
 ---
 
-RDI implements
-[change data capture](https://en.wikipedia.org/wiki/Change_data_capture) (CDC)
-with *pipelines*. (See the
+RDI uses *pipelines* to implement
+[change data capture](https://en.wikipedia.org/wiki/Change_data_capture) (CDC). (See the
 [architecture overview]({{< relref "/integrate/redis-data-integration/architecture#overview" >}})
 for an introduction to pipelines.)
+The sections below explain how pipelines work and give an overview of how to configure and
+deploy them.
 
 ## How a pipeline works
 
@@ -39,26 +40,26 @@ However, you can also provide your own custom transformation [jobs](#job-files)
 for each source table, using your own data mapping and key pattern. You specify these
 jobs declaratively with YAML configuration files that require no coding.
 
-The data tranformation involves two separate stages:
+Data transformation involves two stages:
 
 1. The data ingested during CDC is automatically transformed to an intermediate JSON
 change event format.
-1. This JSON change event data gets passed on to your custom transformation for further
+1. RDI passes this JSON change event data to your custom transformation for further
 processing.
 
 The diagram below shows the flow of data through the pipeline:
 
 {{< image filename="/images/rdi/ingest/RDIPipeDataflow.webp" >}}
 
-You can provide a job file for each source table for which you want to specify a custom
+You can provide a job file for each source table that needs a custom
 transformation. You can also add a *default job file* for any tables that don't have their own.
 You must specify the full name of the source table in the job file (or the special
 name "*" in the default job) and you
 can also include filtering logic to skip data that matches a particular condition.
 As part of the transformation, you can specify any of the following data types
 to store the data in Redis:
 
-- [JSON objects]({{< relref "/develop/data-types/json" >}})
+- [JSON]({{< relref "/develop/data-types/json" >}})
 - [Hashes]({{< relref "/develop/data-types/hashes" >}})
 - [Sets]({{< relref "/develop/data-types/sets" >}})
 - [Streams]({{< relref "/develop/data-types/streams" >}})
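
Note: to make the job-file discussion in this hunk concrete, here is a rough sketch of what such a file might contain. It is an illustration only; the table name, key pattern, and exact field names are assumptions rather than part of this commit, so check the RDI job file reference for the real schema.

```yaml
# Illustrative job file for one source table (field names are assumptions).
source:
  schema: public          # schema of the source table (assumed example)
  table: employee         # full name of the source table
transform:
  - uses: filter          # optional filtering logic to skip matching rows
    with:
      language: sql
      expression: age >= 18
output:
  - uses: redis.write     # write the transformed record to the Redis target
    with:
      data_type: json     # one of the supported types (json, hash, set, stream, ...)
      key:
        expression: concat(['emp:', id])   # custom key pattern (assumed syntax)
        language: jmespath
```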
@@ -73,8 +74,8 @@ After you deploy a pipeline, it goes through the following phases:
 Then, the [operator]({{< relref "/integrate/redis-data-integration/architecture#how-rdi-is-deployed">}}) creates and configures the collector and stream processor that will run the pipeline.
 1. *Snapshot* - The collector starts the pipeline by creating a snapshot of the full
 dataset. This involves reading all the relevant source data, transforming it and then
-writing it into the Redis target. You should expect this phase to take minutes or
-hours to complete if you have a lot of data.
+writing it into the Redis target. This phase typically takes minutes to
+hours if you have a lot of data.
 1. *CDC* - Once the snapshot is complete, the collector starts listening for updates to
 the source data. Whenever a change is committed to the source, the collector captures
 it and adds it to the target through the pipeline. This phase continues indefinitely
@@ -115,7 +116,7 @@ structure of the configuration:
 The main configuration for the pipeline is in the `config.yaml` file.
 This specifies the connection details for the source database (such
 as host, username, and password) and also the queries that RDI will use
-to extract the required data. You should place job configurations in the `Jobs`
+to extract the required data. You should place job files in the `Jobs`
 folder if you want to specify your own data transformations.
 
 See
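
Note: as a companion to the `config.yaml` description in this hunk, a minimal sketch of its source and target connection sections might look like the following. The section layout and field names are assumptions for illustration; the configuration reference linked from the page has the authoritative schema.

```yaml
# Illustrative config.yaml fragment (section and field names are assumptions).
sources:
  inventory-db:                    # logical name for the source database
    connection:
      type: postgresql
      host: source-db.example.com
      port: 5432
      database: inventory
      user: rdi_user
      password: <source-db-password>
targets:
  target:                          # logical name for the Redis target
    connection:
      type: redis
      host: redis-target.example.com
      port: 12000
```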

0 commit comments
