Commit 9b2206d

DOC-4549 rework custom snapshot SQL section
1 parent a9a9cf4 commit 9b2206d

1 file changed (+95 -77 lines changed)

content/integrate/redis-data-integration/reference/config-yaml-reference.md

Lines changed: 95 additions & 77 deletions
````diff
@@ -8,6 +8,50 @@ categories: ["redis-di"]
 aliases: /integrate/redis-data-integration/ingest/reference/config-yaml-reference/
 ---
 
+This document describes the options in RDI's `config.yaml` file in detail. See
+[Configure data pipelines]({{< relref "/integrate/redis-data-integration/data-pipelines/data-pipelines" >}})
+for more information about the role `config.yaml` plays in defining a pipeline.
+
+## Note about fully-qualified table names {#fully-qualified-table-name}
+
+Throughout this document, we use the format `<databaseName>.<tableName>` to refer to a fully-qualified table name. This is the format MySQL uses; for Oracle,
+SQLServer, and PostgreSQL, use `<schemaName>.<tableName>` instead.
+
+{{< note >}}You can specify the fully-qualified table name `<databaseName>.<tableName>` as
+a regular expression instead of providing the full name of the `databaseName` and `tableName`.
+{{< /note >}}
+
+The example below uses the MySQL format to specify the columns to capture and the custom message keys
+for the `chinook.customer` and `chinook.employee` tables:
+
+```yaml
+tables:
+  # Sync specific tables, capturing only the listed columns:
+  chinook.customer:
+    columns:
+      - ID
+      - FirstName
+      - LastName
+      - Company
+      - Address
+      - Email
+    keys:
+      - FirstName
+      - LastName
+  chinook.employee:
+    columns:
+      - ID
+      - FirstName
+      - LastName
+      - ReportsTo
+      - Address
+      - City
+      - State
+    keys:
+      - FirstName
+      - LastName
+```
+
 ## Top level objects
 
 These objects define the sections at the root level of `config.yaml`.
````
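The hunk above writes its table keys in the MySQL `<databaseName>.<tableName>` form. For PostgreSQL, Oracle, or SQLServer, the same settings use `<schemaName>.<tableName>` keys instead. A minimal sketch of that variant follows; the `public` schema and the shortened column list are assumptions for illustration, not part of this commit:

```yaml
tables:
  # PostgreSQL, Oracle, and SQLServer use <schemaName>.<tableName>.
  # "public" is an assumed schema name for illustration only.
  public.customer:
    columns:
      - ID
      - FirstName
      - LastName
      - Email
    keys:
      # Every column listed under keys must also appear under columns.
      - FirstName
      - LastName
```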
````diff
@@ -74,62 +118,81 @@ See the Debezium documentation for more information about the specific connector
 | `schema.exclude.list` | `string` | Oracle, PostgreSQL, SQLServer | An optional, comma-separated list of regular expressions that match names of schemas for which you do not want to capture changes. The connector captures changes in any schema whose name is not included in `schema.exclude.list`. Do not specify the `schemas` section if you are using the `schema.exclude.list` property to filter out schemas. |
 | `table.exclude.list` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | An optional, comma-separated list of regular expressions that match fully-qualified table identifiers for the tables that you want to exclude from being captured. The connector captures all tables that are not included in `table.exclude.list`. Do not specify the `tables` block in the configuration if you are using the `table.exclude.list` property to filter out tables. |
 | `column.exclude.list` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | An optional, comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form `schemaName.tableName.columnName` (`databaseName.tableName.columnName` for MariaDB and MySQL). Do not specify the `columns` block in the configuration if you are using the `column.exclude.list` property to filter out columns. |
-| `snapshot.select.statement.overrides` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer |Specifies the table rows to include in a snapshot. Use this property if you want a snapshot to include only a subset of the rows in a table. This property affects snapshots only. It does not apply to events that the connector reads from the log. |
+| `snapshot.select.statement.overrides` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | Specifies the table rows to include in a snapshot. Use this property if you want a snapshot to include only a subset of the rows in a table. This property affects snapshots only. It does not apply to events that the connector reads from the log. See [Using custom queries in the initial snapshot](#custom-initial-query) below for more information. |
 | `lob.enabled` | `string` | Oracle | Enables capturing and serialization of large object (CLOB, NCLOB, and BLOB) column values in change events.<br/>Default: `false` |
 | `unavailable.value.placeholder` | Special | Oracle | Specifies the constant that the connector provides to indicate that the original value is unchanged and not provided by the database.<br/>Default: `__debezium_unavailable_value` |
````
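As a rough sketch of how the exclude-list properties in the table above fit together, the fragment below excludes one table and one column, reusing the `chinook` names from this commit. It assumes these properties sit under `advanced.source`, like the snapshot override in the `chinook.customer` example, and that no `tables` or `columns` block is configured, as the table requires. Each value is a regular expression, so the dots are escaped to match literally:

```yaml
advanced:
  source:
    # Capture every table except chinook.invoice.
    # Do not also define a `tables` block when using table.exclude.list.
    table.exclude.list: 'chinook\.invoice'
    # Drop the Email column from customer change events.
    # Do not also define a `columns` block when using column.exclude.list.
    column.exclude.list: 'chinook\.customer\.Email'
```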
````diff
 
-### Using queries in the initial snapshot (relevant for MySQL, Oracle, PostgreSQL and SQLServer)
+### Using custom queries in the initial snapshot {#custom-initial-query}
 
-- In case you want a snapshot to include only a subset of the rows in a table, you need to add the property `snapshot.select.statement.overrides` and add a comma-separated list of [fully-qualified table names](#fully-qualified-table-name). The list should include every table for which you want to add a SELECT statement.
+{{< note >}}This section is relevant only for MySQL, Oracle, PostgreSQL, and SQLServer.
+{{< /note >}}
 
-- **For each table in the list above, add a further configuration property** that specifies the `SELECT` statement for the connector to run on the table when it takes a snapshot.
+By default, the initial snapshot captures all rows from each table.
+If you want the snapshot to include only a subset of the rows in a table, you can use a
+custom `SELECT` statement to override the default and select only the rows you are interested in.
+To do this, first specify the tables whose `SELECT` statement you want to override by adding a `snapshot.select.statement.overrides` property to the `source` section under `advanced`, with a comma-separated list of [fully-qualified table names](#fully-qualified-table-name) as its value.
 
-The specified `SELECT` statement determines the subset of table rows to include in the snapshot.
+Then, for each table in the `snapshot.select.statement.overrides` list, add another configuration property that specifies the custom `SELECT` statement for that table.
+The format of the property name depends on the database you are using:
 
-Use the following format to specify the name of this `SELECT` statement property:
+- For Oracle, SQLServer, and PostgreSQL, use `snapshot.select.statement.overrides.<SCHEMA_NAME>.<TABLE_NAME>`.
+- For MySQL, use `snapshot.select.statement.overrides.<DATABASE_NAME>.<TABLE_NAME>`.
 
-- Oracle, SQLServer, PostrgreSQL: `snapshot.select.statement.overrides: <SCHEMA_NAME>.<TABLE_NAME>`
-- MySQL: `snapshot.select.statement.overrides: <DATABASE_NAME>.<TABLE_NAME>`
+For example, with PostgreSQL, you would have a configuration like the following:
 
-- Add the list of columns you want to include in the `SELECT` statement using fully-qualified names. Each column should be specified in the configuration as shown below:
+```yaml
+source:
+  snapshot.select.statement.overrides: myschema.mytable
+  snapshot.select.statement.overrides.myschema.mytable: |
+    SELECT ...
+```
 
-```yaml
-tables:
-schema_name.table_name: # For MySQL: use database_name.table_name
-columns:
-- column_name1 # Each column on a new line
-- column_name2
-- column_name3
-```
+For MySQL, you would have:
+
+```yaml
+source:
+  snapshot.select.statement.overrides: mydatabase.mytable
+  snapshot.select.statement.overrides.mydatabase.mytable: |
+    SELECT ...
+```
 
-- To capture all columns from a table, use empty curly braces `{}` instead of listing individual columns:
+You must also add each table under `sources.tables`, using its fully-qualified name, and list the columns you want to include in the custom `SELECT` statement. Specify each column on its own line, as shown below:
+
+```yaml
+tables:
+  schema_name.table_name: # For MySQL: use database_name.table_name
+    columns:
+      - column_name1 # Each column on a new line
+      - column_name2
+      - column_name3
+```
+
+If you want to capture all columns from a table, you can use empty curly braces `{}` instead of listing all the individual columns:
 
 ```yaml
 tables:
   schema_name.table_name: {} # Captures all columns
 ```
````
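Putting these steps together, a PostgreSQL configuration that overrides the snapshot query for two tables might look roughly like the sketch below. The `public` schema, the `orders` and `customers` tables, their columns, and the `WHERE` conditions are all hypothetical, and placing `tables` next to `advanced` simply mirrors the `chinook.customer` example in this commit:

```yaml
tables:
  # Hypothetical PostgreSQL tables in the "public" schema.
  public.orders:
    columns:
      - order_id
      - customer_id
      - status
  public.customers: {} # Capture all columns

advanced:
  source:
    # Every table whose snapshot query is overridden, comma-separated.
    snapshot.select.statement.overrides: public.orders,public.customers
    # One property per listed table:
    # snapshot.select.statement.overrides.<SCHEMA_NAME>.<TABLE_NAME>
    snapshot.select.statement.overrides.public.orders: |
      SELECT order_id, customer_id, status
      FROM public.orders
      WHERE status <> 'cancelled'
    snapshot.select.statement.overrides.public.customers: |
      SELECT *
      FROM public.customers
      WHERE country = 'US'
```

The selected columns for `public.orders` match its `columns` list, and `public.customers` uses `{}` because its query returns every column.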
````diff
 
-### Example
-
-To select the columns `CustomerId`, `FirstName` and `LastName` from `customer` table and join it with `invoice` table in order to get customers with total invoices greater than 8000, we need to add the following properties to the `config.yaml` file:
+The example configuration below selects the columns `CustomerId`, `FirstName`, and `LastName` from the `customer` table and joins it with the `invoice` table to select only the customers who have at least one invoice with a total greater than 8000:
 
 ```yaml
 tables:
   chinook.customer:
     columns:
-- CustomerID
-- FirstName
-- LastName
+      - CustomerID
+      - FirstName
+      - LastName
 
 advanced:
-source:
-snapshot.select.statement.overrides: chinook.customer
-snapshot.select.statement.overrides.chinook.customer: |
-SELECT c.CustomerId, c.FirstName, c.LastName
-FROM chinook.customer c
-INNER JOIN chinook.invoice inv
-ON c.CustomerId = inv.CustomerId
-WHERE inv.total > 8000
+  source:
+    snapshot.select.statement.overrides: chinook.customer
+    snapshot.select.statement.overrides.chinook.customer: |
+      SELECT c.CustomerId, c.FirstName, c.LastName
+      FROM chinook.customer c
+      INNER JOIN chinook.invoice inv
+        ON c.CustomerId = inv.CustomerId
+      WHERE inv.total > 8000
 ```
 
 ### Form custom message key(s) for change event records
````
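One caveat about the `chinook.customer` query in the diff above: an `INNER JOIN` returns one row per matching invoice, so a customer with several invoices over 8000 appears several times in the snapshot result. A variant that returns each customer once, sketched here with `SELECT DISTINCT` (a substitution for illustration, not part of the original example), could look like this:

```yaml
advanced:
  source:
    snapshot.select.statement.overrides: chinook.customer
    # DISTINCT collapses customers that match on several invoices.
    snapshot.select.statement.overrides.chinook.customer: |
      SELECT DISTINCT c.CustomerId, c.FirstName, c.LastName
      FROM chinook.customer c
      INNER JOIN chinook.invoice inv
        ON c.CustomerId = inv.CustomerId
      WHERE inv.total > 8000
```

The `tables` entry for `chinook.customer` would stay as in the original example, since the selected columns do not change.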
````diff
@@ -154,52 +217,7 @@ advanced:
 - When specifying columns in the `keys` field, ensure that these same columns are also listed under the `columns` field in your configuration.
 - There is no limit to the number of columns that can be used to create custom message keys. However, it’s best to use the minimum required number of columns to specify a unique key.
 
-### Fully-qualified table name
-
-In this document we refer to the fully-qualified table name as `<databaseName>.<tableName>`. This format is for MySQL database. For Oracle, SQLServer and Postgresql databases use `<schemaName>`.`<tableName>` instead.
-
-| Database Type | Fully-qualified Table Name |
-| -- | -- |
-| Oracle, SQLServer, PostrgreSQL | `<schemaName>.<tableName>` |
-| MySQL | `<databaseName>.<tableName>` |
-
-{{< note >}}You can specify the fully-qualified table name `<databaseName>.<tableName>` as
-a regular expression instead of providing the full name of the `databaseName` and `tableName`.
-{{< /note >}}
-
-### Examples
 
-- The primary key of the tables `customer` and `employee` is `ID`.
-
-To establish custom messages keys based on `FirstName` and `LastName` for the tables `customer` and `employee`, add the following block to the `config.yaml` file:
-
-```yaml
-tables:
-# Sync a specific table with all its columns:
-chinook.customer:
-columns:
-- ID
-- FirstName
-- LastName
-- Company
-- Address
-- Email
-keys:
-- FirstName
-- LastName
-chinook.employee:
-columns:
-- ID
-- FirstName
-- LastName
-- ReportsTo
-- Address
-- City
-- State
-keys:
-- FirstName
-- LastName
-```
 
 ## `processors`: RDI processors {#processors}
 
````
0 commit comments
