
Commit c399067

DOC-5282 started restructuring job file docs
1 parent 5c61408 commit c399067

File tree

3 files changed: +137 -128 lines changed


content/integrate/redis-data-integration/data-pipelines/data-pipelines.md

Lines changed: 6 additions & 123 deletions
@@ -281,129 +281,12 @@ sudo service k3s restart
 
 ## Job files
 
-You can optionally supply one or more job files that specify how you want to
-transform the captured data before writing it to the target.
-Each job file contains a YAML
-configuration that controls the transformation for a particular table from the source
-database. You can also add a `default-job.yaml` file to provide
-a default transformation for tables that don't have a specific job file of their own.
-
-The job files have a structure like the following example. This configures a default
-job that:
-
-- Writes the data to a Redis hash
-- Adds a field `app_code` to the hash with a value of `foo`
-- Adds a prefix of `aws` and a suffix of `gcp` to the key
-
-```yaml
-source:
-  table: "*"
-  row_format: full
-transform:
-  - uses: add_field
-    with:
-      fields:
-        - field: after.app_code
-          expression: "`foo`"
-          language: jmespath
-output:
-  - uses: redis.write
-    with:
-      data_type: hash
-      key:
-        expression: concat(['aws', '#', table, '#', keys(key)[0], '#', values(key)[0], '#gcp'])
-        language: jmespath
-```
-
-The main sections of these files are:
-
-- `source`: This is a mandatory section that specifies the data items that you want to
-  use. You can add the following properties here:
-  - `server_name`: Logical server name (optional).
-  - `db`: Database name (optional)
-  - `schema`: Database schema (optional)
-  - `table`: Database table name. This refers to a table name you supplied in `config.yaml`. The default
-    job doesn't apply to a specific table, so use "*" in place of the table name for this job only.
-  - `row_format`: Format of the data to be transformed. This can take the values `data_only` (default) to
-    use only the payload data, or `full` to use the complete change record. See the `transform` section below
-    for details of the extra data you can access when you use the `full` option.
-  - `case_insensitive`: This applies to the `server_name`, `db`, `schema`, and `table` properties
-    and is set to `true` by default. Set it to `false` if you need to use case-sensitive values for these
-    properties.
-
-- `transform`: This is an optional section describing the transformation that the pipeline
-  applies to the data before writing it to the target. The `uses` property specifies a
-  *transformation block* that will use the parameters supplied in the `with` section. See the
-  [data transformation reference]({{< relref "/integrate/redis-data-integration/reference/data-transformation" >}})
-  for more details about the supported transformation blocks, and also the
-  [JMESPath custom functions]({{< relref "/integrate/redis-data-integration/reference/jmespath-custom-functions" >}}) reference. You can test your transformation logic using the [dry run]({{< relref "/integrate/redis-data-integration/reference/api-reference/#tag/secure/operation/job_dry_run_api_v1_pipelines_jobs_dry_run_post" >}}) feature in the API.
-
-  {{< note >}}If you set `row_format` to `full` under the `source` settings, you can access extra data from the
-  change record in the transformation:
-  - Use the `key` object to access the attributes of the key. For example, `key.id` will give you the value of the `id` column as long as it is part of the primary key.
-  - Use `before.<FIELD_NAME>` to get the value of a field *before* it was updated in the source database
-  - Use `after.<FIELD_NAME>` to get the value of a field *after* it was updated in the source database
-  - Use `after.<FIELD_NAME>` when adding new fields during transformations
-
-  See [Row Format]({{< relref "/integrate/redis-data-integration/data-pipelines/transform-examples/redis-row-format#full" >}}) for a more detailed explanation of the full format.
-  {{< /note >}}
-
-- `output`: This is a mandatory section to specify the data structure(s) that
-  RDI will write to
-  the target along with the text pattern for the key(s) that will access it.
-  Note that you can map one record to more than one key in Redis or nest
-  a record as a field of a JSON structure (see
-  [Data denormalization]({{< relref "/integrate/redis-data-integration/data-pipelines/data-denormalization" >}})
-  for more information about nesting). You can add the following properties in the `output` section:
-  - `uses`: This must have the value `redis.write` to specify writing to a Redis data
-    structure. You can add more than one block of this type in the same job.
-  - `with`:
-    - `connection`: Connection name as defined in `config.yaml` (by default, the connection named `target` is used).
-    - `data_type`: Target data structure when writing data to Redis. The supported types are `hash`, `json`, `set`,
-      `sorted_set`, `stream` and `string`.
-    - `key`: This lets you override the default key for the data structure with custom logic:
-      - `expression`: Expression to generate the key.
-      - `language`: Expression language, which must be `jmespath` or `sql`.
-    - `expire`: Positive integer value indicating a number of seconds for the key to expire.
-      If you don't specify this property, the key will never expire.
-
-{{< note >}}In a job file, the `transform` section is optional, but if you don't specify
-a `transform`, you must specify custom key logic in `output.with.key`. You can include
-both of these sections if you want both a custom transform and a custom key.{{< /note >}}
-
-Another example below shows how you can rename the `fname` field to `first_name` in the table `emp`
-using the
-[`rename_field`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/rename_field" >}}) block. It also demonstrates how you can set the key of this record instead of relying on
-the default logic. (See the
-[Transformation examples]({{< relref "/integrate/redis-data-integration/data-pipelines/transform-examples" >}})
-section for more examples of job files.)
-
-```yaml
-source:
-  server_name: redislabs
-  schema: dbo
-  table: emp
-transform:
-  - uses: rename_field
-    with:
-      from_field: fname
-      to_field: first_name
-output:
-  - uses: redis.write
-    with:
-      connection: target
-      key:
-        expression: concat(['emp:fname:',fname,':lname:',lname])
-        language: jmespath
-```
-
-See the
-[RDI configuration file]({{< relref "/integrate/redis-data-integration/reference/config-yaml-reference" >}})
-reference for full details about the
-available source, transform, and target configuration options and see
-also the
-[data transformation reference]({{< relref "/integrate/redis-data-integration/reference/data-transformation" >}})
-for details of all the available transformation blocks.
+You can use one or more job files to configure which fields from the source tables
+you want to use, and which data structure you want to write to the target. You
+can also optionally specify a transformation to apply to the data before writing it
+to the target. See the
+[Job files]({{< relref "/integrate/redis-data-integration/data-pipelines/transform-examples" >}})
+section for full details of the file format and examples of common tasks for job files.
 
 ## Source preparation
 

content/integrate/redis-data-integration/data-pipelines/deploy.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ linkTitle: Deploy
 summary: Redis Data Integration keeps Redis in sync with the primary database in near
   real time.
 type: integration
-weight: 2
+weight: 10
 ---
 
 The sections below explain how to deploy a pipeline after you have created the required
Lines changed: 130 additions & 4 deletions
@@ -1,18 +1,144 @@
 ---
-Title: Transformation examples
+Title: Job files
 aliases: /integrate/redis-data-integration/ingest/data-pipelines/transform-examples/
 alwaysopen: false
 categories:
 - docs
 - integrate
 - rs
 - rdi
-description: Explore some examples of common RDI transformations
+description: Learn how to configure job files for data transformation.
 group: di
 hideListLinks: false
-linkTitle: Transformation examples
+linkTitle: Job files
 summary: Redis Data Integration keeps Redis in sync with the primary database in near
   real time.
 type: integration
-weight: 30
+weight: 5
 ---
+
+You can optionally supply one or more job files that specify how you want to
+transform the captured data before writing it to the target.
+Each job file contains a YAML
+configuration that controls the transformation for a particular table from the source
+database. You can also add a `default-job.yaml` file to provide
+a default transformation for tables that don't have a specific job file of their own.
+
+The job files have a structure like the following example. This configures a default
+job that:
+
+- Writes the data to a Redis hash
+- Adds a field `app_code` to the hash with a value of `foo`
+- Adds a prefix of `aws` and a suffix of `gcp` to the key
+
+```yaml
+source:
+  table: "*"
+  row_format: full
+transform:
+  - uses: add_field
+    with:
+      fields:
+        - field: after.app_code
+          expression: "`foo`"
+          language: jmespath
+output:
+  - uses: redis.write
+    with:
+      data_type: hash
+      key:
+        expression: concat(['aws', '#', table, '#', keys(key)[0], '#', values(key)[0], '#gcp'])
+        language: jmespath
+```
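+
+As a concrete illustration of the key expression, a row in a hypothetical table named `employee`
+whose primary key is the column `id` with value `42` would be written under the key
+`aws#employee#id#42#gcp` (the table and column names here are illustrative, not part of the job above).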
+
+The main sections of these files are:
+
+- `source`: This is a mandatory section that specifies the data items that you want to
+  use. You can add the following properties here:
+  - `server_name`: Logical server name (optional).
+  - `db`: Database name (optional).
+  - `schema`: Database schema (optional).
+  - `table`: Database table name. This refers to a table name you supplied in `config.yaml`. The default
+    job doesn't apply to a specific table, so use `"*"` in place of the table name for this job only.
+  - `row_format`: Format of the data to be transformed. This can take the values `data_only` (default) to
+    use only the payload data, or `full` to use the complete change record. See the `transform` section below
+    for details of the extra data you can access when you use the `full` option.
+  - `case_insensitive`: This applies to the `server_name`, `db`, `schema`, and `table` properties
+    and is set to `true` by default. Set it to `false` if you need to use case-sensitive values for these
+    properties.
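+
+  As a sketch, a `source` section that matches a single table case-sensitively might look like
+  this (the database, schema, and table names are hypothetical):
+
+  ```yaml
+  source:
+    db: chinook
+    schema: public
+    table: customer
+    case_insensitive: false
+  ```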
+
+- `transform`: This is an optional section describing the transformation that the pipeline
+  applies to the data before writing it to the target. The `uses` property specifies a
+  *transformation block* that will use the parameters supplied in the `with` section. See the
+  [data transformation reference]({{< relref "/integrate/redis-data-integration/reference/data-transformation" >}})
+  for more details about the supported transformation blocks, and also the
+  [JMESPath custom functions]({{< relref "/integrate/redis-data-integration/reference/jmespath-custom-functions" >}}) reference. You can test your transformation logic using the [dry run]({{< relref "/integrate/redis-data-integration/reference/api-reference/#tag/secure/operation/job_dry_run_api_v1_pipelines_jobs_dry_run_post" >}}) feature in the API.
+
+  {{< note >}}If you set `row_format` to `full` under the `source` settings, you can access extra data from the
+  change record in the transformation:
+  - Use the `key` object to access the attributes of the key. For example, `key.id` will give you the value of the `id` column as long as it is part of the primary key.
+  - Use `before.<FIELD_NAME>` to get the value of a field *before* it was updated in the source database.
+  - Use `after.<FIELD_NAME>` to get the value of a field *after* it was updated in the source database.
+  - Use `after.<FIELD_NAME>` when adding new fields during transformations.
+
+  See [Row Format]({{< relref "/integrate/redis-data-integration/data-pipelines/transform-examples/redis-row-format#full" >}}) for a more detailed explanation of the full format.
+  {{< /note >}}
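+
+  For example, this sketch of a `transform` keeps the pre-update value of a hypothetical
+  `salary` column in a new field (it assumes `row_format: full` in the `source` section):
+
+  ```yaml
+  transform:
+    - uses: add_field
+      with:
+        fields:
+          - field: after.previous_salary
+            expression: before.salary
+            language: jmespath
+  ```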
+
+- `output`: This is a mandatory section that specifies the data structure(s) that
+  RDI will write to the target, along with the text pattern for the key(s) that will access it.
+  Note that you can map one record to more than one key in Redis or nest
+  a record as a field of a JSON structure (see
+  [Data denormalization]({{< relref "/integrate/redis-data-integration/data-pipelines/data-denormalization" >}})
+  for more information about nesting). You can add the following properties in the `output` section:
+  - `uses`: This must have the value `redis.write` to specify writing to a Redis data
+    structure. You can add more than one block of this type in the same job.
+  - `with`:
+    - `connection`: Connection name as defined in `config.yaml` (by default, the connection named `target` is used).
+    - `data_type`: Target data structure when writing data to Redis. The supported types are `hash`, `json`, `set`,
+      `sorted_set`, `stream`, and `string`.
+    - `key`: This lets you override the default key for the data structure with custom logic:
+      - `expression`: Expression to generate the key.
+      - `language`: Expression language, which must be `jmespath` or `sql`.
+    - `expire`: Positive integer value indicating the number of seconds for the key to expire.
+      If you don't specify this property, the key will never expire.
+
+{{< note >}}In a job file, the `transform` section is optional, but if you don't specify
+a `transform`, you must specify custom key logic in `output.with.key`. You can include
+both of these sections if you want both a custom transform and a custom key.{{< /note >}}
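+
+For example, the following sketch is a complete job with no `transform` section: it writes rows
+from a hypothetical `invoice` table as JSON, under a custom key built with an SQL expression,
+and expires each key after one day (`invoice_id` is an assumed column name):
+
+```yaml
+source:
+  table: invoice
+output:
+  - uses: redis.write
+    with:
+      connection: target
+      data_type: json
+      key:
+        expression: "'invoice:' || invoice_id"
+        language: sql
+      expire: 86400
+```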
+
+Another example below shows how you can rename the `fname` field to `first_name` in the table `emp`
+using the
+[`rename_field`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/rename_field" >}}) block. It also demonstrates how you can set the key of this record instead of relying on
+the default logic.
+
+```yaml
+source:
+  server_name: redislabs
+  schema: dbo
+  table: emp
+transform:
+  - uses: rename_field
+    with:
+      from_field: fname
+      to_field: first_name
+output:
+  - uses: redis.write
+    with:
+      connection: target
+      key:
+        expression: concat(['emp:fname:',fname,':lname:',lname])
+        language: jmespath
+```
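+
+Assuming sample values of `John` for `fname` and `Doe` for `lname`, this job writes the record
+under the key `emp:fname:John:lname:Doe`.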
+
+See the
+[RDI configuration file]({{< relref "/integrate/redis-data-integration/reference/config-yaml-reference" >}})
+reference for full details about the
+available source, transform, and target configuration options, and see
+also the
+[data transformation reference]({{< relref "/integrate/redis-data-integration/reference/data-transformation" >}})
+for details of all the available transformation blocks.
+
+## Examples
+
+The pages listed below show examples of typical job files for different use cases.
