Commit b465a22

feat(firestore-bigquery-export): materialized views (#2262)
* feat(firestore-bigquery-export): support materialized views
* chore(firestore-bigquery-export): bump version and changelog
* chore(firestore-bigquery-export): format and fix snapshot test
1 parent 99fed38 commit b465a22

12 files changed: +3781 -14758 lines changed

firestore-bigquery-export/CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+## Version 0.1.57
+
+feat - add basic materialized views support
+
 ## Version 0.1.56
 
 feat - improve sync strategy by immediately writing to BQ, and using cloud tasks only as a last resort

firestore-bigquery-export/PREINSTALL.md

Lines changed: 75 additions & 0 deletions
@@ -69,6 +69,81 @@ Prior to sending the document change to BigQuery, you have an opportunity to tra
 
 The response should be identical in structure.
 
+#### Materialized Views
+
+This extension supports both regular views and materialized views in BigQuery. While regular views compute their results each time they're queried, materialized views store their query results, providing faster access at the cost of additional storage.
+
+There are two types of materialized views available:
+
+1. **Non-incremental Materialized Views**: These views support more complex queries including filtering on aggregated fields, but require complete recomputation during refresh.
+
+2. **Incremental Materialized Views**: These views update more efficiently by processing only new or changed records, but come with query restrictions. Most notably, they don't allow filtering or partitioning on aggregated fields in their defining SQL, among other limitations.
+
+**Important Considerations:**
+- Neither type of materialized view in this extension currently supports partitioning or clustering
+- Both types allow you to configure refresh intervals and maximum staleness settings during extension installation or configuration
+- Once created, a materialized view's SQL definition cannot be modified. If you reconfigure the extension to change either the view type (incremental vs non-incremental) or the SQL query, the extension will drop the existing materialized view and recreate it
+- Carefully consider your use case before choosing materialized views:
+  - They incur additional storage costs as they cache query results
+  - Non-incremental views may have higher processing costs during refresh
+  - Incremental views have more query restrictions but are more efficient to update
+
+Example of a non-incremental materialized view SQL definition generated by the extension:
+```sql
+CREATE MATERIALIZED VIEW `my_project.my_dataset.my_table_raw_latest`
+OPTIONS (
+  allow_non_incremental_definition = true,
+  enable_refresh = true,
+  refresh_interval_minutes = 60,
+  max_staleness = INTERVAL "4:0:0" HOUR TO SECOND
+)
+AS (
+  WITH latests AS (
+    SELECT
+      document_name,
+      MAX_BY(document_id, timestamp) AS document_id,
+      MAX(timestamp) AS timestamp,
+      MAX_BY(event_id, timestamp) AS event_id,
+      MAX_BY(operation, timestamp) AS operation,
+      MAX_BY(data, timestamp) AS data,
+      MAX_BY(old_data, timestamp) AS old_data,
+      MAX_BY(extra_field, timestamp) AS extra_field
+    FROM `my_project.my_dataset.my_table_raw_changelog`
+    GROUP BY document_name
+  )
+  SELECT *
+  FROM latests
+  WHERE operation != "DELETE"
+)
+```
+
+Example of an incremental materialized view SQL definition generated by the extension:
+```sql
+CREATE MATERIALIZED VIEW `my_project.my_dataset.my_table_raw_latest`
+OPTIONS (
+  enable_refresh = true,
+  refresh_interval_minutes = 60,
+  max_staleness = INTERVAL "4:0:0" HOUR TO SECOND
+)
+AS (
+  SELECT
+    document_name,
+    MAX_BY(document_id, timestamp) AS document_id,
+    MAX(timestamp) AS timestamp,
+    MAX_BY(event_id, timestamp) AS event_id,
+    MAX_BY(operation, timestamp) AS operation,
+    MAX_BY(data, timestamp) AS data,
+    MAX_BY(old_data, timestamp) AS old_data,
+    MAX_BY(extra_field, timestamp) AS extra_field
+  FROM
+    `my_project.my_dataset.my_table_raw_changelog`
+  GROUP BY
+    document_name
+)
+```
+
+Please review [BigQuery's documentation on materialized views](https://cloud.google.com/bigquery/docs/materialized-views-intro) to fully understand the implications for your use case.
+
 #### Using Customer Managed Encryption Keys
 
 By default, BigQuery encrypts your content stored at rest. BigQuery handles and manages this default encryption for you without any additional actions on your part.
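
In the two example definitions added to this file, the `OPTIONS` clauses differ only in `allow_non_incremental_definition = true`. The following is a minimal sketch of how such a clause could be assembled from the extension parameters. The function and type names are hypothetical and are not taken from the extension's source, and leaving out unset options is an assumption of this sketch rather than documented behavior.

```typescript
// Hypothetical helper: not the extension's actual implementation.
type MaterializedViewType =
  | "materialized_incremental"
  | "materialized_non_incremental";

interface MaterializedViewSettings {
  viewType: MaterializedViewType;
  // Mirrors the REFRESH_INTERVAL_MINUTES parameter.
  refreshIntervalMinutes?: number;
  // Mirrors the MAX_STALENESS parameter, e.g. 'INTERVAL "4:0:0" HOUR TO SECOND'.
  maxStaleness?: string;
}

// Builds the OPTIONS (...) clause of a CREATE MATERIALIZED VIEW statement.
function buildOptionsClause(settings: MaterializedViewSettings): string {
  const options: string[] = [];
  if (settings.viewType === "materialized_non_incremental") {
    // Only the non-incremental example carries this option.
    options.push("allow_non_incremental_definition = true");
  }
  options.push("enable_refresh = true");
  // Assumption: options the user did not set are simply omitted.
  if (settings.refreshIntervalMinutes !== undefined) {
    options.push(`refresh_interval_minutes = ${settings.refreshIntervalMinutes}`);
  }
  if (settings.maxStaleness !== undefined) {
    options.push(`max_staleness = ${settings.maxStaleness}`);
  }
  return `OPTIONS (\n  ${options.join(",\n  ")}\n)`;
}
```

Calling `buildOptionsClause` with `viewType: "materialized_non_incremental"`, `refreshIntervalMinutes: 60` and the staleness interval from the examples yields the four options of the non-incremental definition. BigQuery may reject some option combinations, so its documentation remains the authority on what is valid.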

firestore-bigquery-export/README.md

Lines changed: 81 additions & 0 deletions
@@ -77,6 +77,81 @@ Prior to sending the document change to BigQuery, you have an opportunity to tra
 
 The response should be identical in structure.
 
+#### Materialized Views
+
+This extension supports both regular views and materialized views in BigQuery. While regular views compute their results each time they're queried, materialized views store their query results, providing faster access at the cost of additional storage.
+
+There are two types of materialized views available:
+
+1. **Non-incremental Materialized Views**: These views support more complex queries including filtering on aggregated fields, but require complete recomputation during refresh.
+
+2. **Incremental Materialized Views**: These views update more efficiently by processing only new or changed records, but come with query restrictions. Most notably, they don't allow filtering or partitioning on aggregated fields in their defining SQL, among other limitations.
+
+**Important Considerations:**
+- Neither type of materialized view in this extension currently supports partitioning or clustering
+- Both types allow you to configure refresh intervals and maximum staleness settings during extension installation or configuration
+- Once created, a materialized view's SQL definition cannot be modified. If you reconfigure the extension to change either the view type (incremental vs non-incremental) or the SQL query, the extension will drop the existing materialized view and recreate it
+- Carefully consider your use case before choosing materialized views:
+  - They incur additional storage costs as they cache query results
+  - Non-incremental views may have higher processing costs during refresh
+  - Incremental views have more query restrictions but are more efficient to update
+
+Example of a non-incremental materialized view SQL definition generated by the extension:
+```sql
+CREATE MATERIALIZED VIEW `my_project.my_dataset.my_table_raw_latest`
+OPTIONS (
+  allow_non_incremental_definition = true,
+  enable_refresh = true,
+  refresh_interval_minutes = 60,
+  max_staleness = INTERVAL "4:0:0" HOUR TO SECOND
+)
+AS (
+  WITH latests AS (
+    SELECT
+      document_name,
+      MAX_BY(document_id, timestamp) AS document_id,
+      MAX(timestamp) AS timestamp,
+      MAX_BY(event_id, timestamp) AS event_id,
+      MAX_BY(operation, timestamp) AS operation,
+      MAX_BY(data, timestamp) AS data,
+      MAX_BY(old_data, timestamp) AS old_data,
+      MAX_BY(extra_field, timestamp) AS extra_field
+    FROM `my_project.my_dataset.my_table_raw_changelog`
+    GROUP BY document_name
+  )
+  SELECT *
+  FROM latests
+  WHERE operation != "DELETE"
+)
+```
+
+Example of an incremental materialized view SQL definition generated by the extension:
+```sql
+CREATE MATERIALIZED VIEW `my_project.my_dataset.my_table_raw_latest`
+OPTIONS (
+  enable_refresh = true,
+  refresh_interval_minutes = 60,
+  max_staleness = INTERVAL "4:0:0" HOUR TO SECOND
+)
+AS (
+  SELECT
+    document_name,
+    MAX_BY(document_id, timestamp) AS document_id,
+    MAX(timestamp) AS timestamp,
+    MAX_BY(event_id, timestamp) AS event_id,
+    MAX_BY(operation, timestamp) AS operation,
+    MAX_BY(data, timestamp) AS data,
+    MAX_BY(old_data, timestamp) AS old_data,
+    MAX_BY(extra_field, timestamp) AS extra_field
+  FROM
+    `my_project.my_dataset.my_table_raw_changelog`
+  GROUP BY
+    document_name
+)
+```
+
+Please review [BigQuery's documentation on materialized views](https://cloud.google.com/bigquery/docs/materialized-views-intro) to fully understand the implications for your use case.
+
 #### Using Customer Managed Encryption Keys
 
 By default, BigQuery encrypts your content stored at rest. BigQuery handles and manages this default encryption for you without any additional actions on your part.
@@ -187,6 +262,12 @@ essential for the script to insert data into an already partitioned table.)
 
 * Maximum number of synced documents per second: This parameter will set the maximum number of synchronised documents per second with BQ. Please note, any other external updates to a BigQuery table will be included within this quota. Ensure that you have set a low enough number to compensate. Defaults to 10.
 
+* View Type: Select the type of view to create in BigQuery. A regular view is a virtual table defined by a SQL query. A materialized view persists the results of a query for faster access, with either incremental or non-incremental updates. Please note that materialized views in this extension come with several important caveats and limitations - carefully review the pre-install documentation before selecting these options to ensure they are appropriate for your use case.
+
+* Maximum Staleness Duration: For materialized views only: Specifies the maximum staleness acceptable for the materialized view. Should be specified as an INTERVAL value following BigQuery SQL syntax. This parameter will only take effect if View Type is set to a materialized view option.
+
+* Refresh Interval (Minutes): For materialized views only: Specifies how often the materialized view should be refreshed, in minutes. This parameter will only take effect if View Type is set to a materialized view option.
+
 * Backup Collection Name: This (optional) parameter will allow you to specify a collection to which failed BigQuery updates will be written.
 
 * Transform function URL: Specify a function URL to call that will transform the payload that will be written to BigQuery. See the pre-install documentation for more details.
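
The documentation in this file states that changing the view type or the SQL query causes the extension to drop and recreate the materialized view. A minimal sketch of that decision follows. All names are hypothetical, and the whitespace normalization is an assumption of this sketch, not something the commit confirms.

```typescript
// Hypothetical types and helpers: not the extension's actual implementation.
type MaterializedKind =
  | "materialized_incremental"
  | "materialized_non_incremental";

interface MaterializedViewDefinition {
  kind: MaterializedKind;
  query: string;
}

// Assumption: formatting-only differences should not trigger a recreate,
// so runs of whitespace are collapsed before comparing.
function normalizeSql(sql: string): string {
  return sql.replace(/\s+/g, " ").trim();
}

// True when the existing view must be dropped and recreated, because a
// materialized view's SQL definition cannot be modified in place.
function needsRecreate(
  existing: MaterializedViewDefinition,
  desired: MaterializedViewDefinition
): boolean {
  return (
    existing.kind !== desired.kind ||
    normalizeSql(existing.query) !== normalizeSql(desired.query)
  );
}
```

Under these assumptions, switching from `materialized_incremental` to `materialized_non_incremental` with an otherwise identical query returns `true`, while reindenting the same query returns `false`.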

firestore-bigquery-export/extension.yaml

Lines changed: 45 additions & 1 deletion
@@ -13,7 +13,7 @@
 # limitations under the License.
 
 name: firestore-bigquery-export
-version: 0.1.56
+version: 0.1.57
 specVersion: v1beta
 
 displayName: Stream Firestore to BigQuery
@@ -336,6 +336,50 @@ params:
     default: 100
     required: false
 
+  - param: VIEW_TYPE
+    label: View Type
+    description: >-
+      Select the type of view to create in BigQuery. A regular view is a virtual
+      table defined by a SQL query. A materialized view persists the results of
+      a query for faster access, with either incremental or non-incremental
+      updates. Please note that materialized views in this extension come with
+      several important caveats and limitations - carefully review the
+      pre-install documentation before selecting these options to ensure they
+      are appropriate for your use case.
+    type: select
+    options:
+      - label: View
+        value: view
+      - label: Materialized View (Incremental)
+        value: materialized_incremental
+      - label: Materialized View (Non-incremental)
+        value: materialized_non_incremental
+    default: view
+    required: true
+
+  - param: MAX_STALENESS
+    label: Maximum Staleness Duration
+    description: >-
+      For materialized views only: Specifies the maximum staleness acceptable
+      for the materialized view. Should be specified as an INTERVAL value
+      following BigQuery SQL syntax. This parameter will only take effect if
+      View Type is set to a materialized view option.
+    type: string
+    example: INTERVAL "8:0:0" HOUR TO SECOND
+    required: false
+
+  - param: REFRESH_INTERVAL_MINUTES
+    label: Refresh Interval (Minutes)
+    description: >-
+      For materialized views only: Specifies how often the materialized view
+      should be refreshed, in minutes. This parameter will only take effect if
+      View Type is set to a materialized view option.
+    type: number
+    example: 60
+    validationRegex: ^[1-9][0-9]*$
+    validationErrorMessage: Must be a positive integer
+    required: false
+
   - param: BACKUP_COLLECTION
     label: Backup Collection Name
     description: >-
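
The `VIEW_TYPE` option values and the `REFRESH_INTERVAL_MINUTES` validation regex declared in this hunk can be exercised with a small sketch like the one below. The option values and the regex are copied from the YAML; the function names are hypothetical and do not come from the extension's source.

```typescript
// Values copied from the VIEW_TYPE options in extension.yaml.
const VIEW_TYPE_VALUES = [
  "view",
  "materialized_incremental",
  "materialized_non_incremental",
] as const;
type ViewTypeParam = (typeof VIEW_TYPE_VALUES)[number];

// Regex copied from validationRegex for REFRESH_INTERVAL_MINUTES:
// one non-zero digit followed by any number of digits.
const REFRESH_INTERVAL_PATTERN = /^[1-9][0-9]*$/;

// Hypothetical type guard for a raw VIEW_TYPE parameter value.
function isViewTypeParam(value: string): value is ViewTypeParam {
  return (VIEW_TYPE_VALUES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical check that a raw REFRESH_INTERVAL_MINUTES value is a
// positive integer, matching the validationErrorMessage.
function isValidRefreshInterval(raw: string): boolean {
  return REFRESH_INTERVAL_PATTERN.test(raw);
}
```

With this regex, `"60"` passes while `"0"`, `"-5"`, `"1.5"` and the empty string are rejected, which is consistent with the "Must be a positive integer" message.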

firestore-bigquery-export/firestore-bigquery-change-tracker/package-lock.json

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.

firestore-bigquery-export/functions/__tests__/__snapshots__/config.test.ts.snap

Lines changed: 3 additions & 0 deletions
@@ -23,13 +23,16 @@ Object {
   "location": "us-central1",
   "maxDispatchesPerSecond": 10,
   "maxEnqueueAttempts": 3,
+  "maxStaleness": undefined,
+  "refreshIntervalMinutes": undefined,
   "tableId": "my_table",
   "timePartitioning": null,
   "timePartitioningField": undefined,
   "timePartitioningFieldType": undefined,
   "timePartitioningFirestoreField": undefined,
   "transformFunction": "",
   "useNewSnapshotQuerySyntax": false,
+  "viewType": "view",
   "wildcardIds": false,
 }
 `;
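
The three snapshot fields added here (`maxStaleness`, `refreshIntervalMinutes`, `viewType`) are consistent with a config loader that falls back to `"view"` and leaves the two materialized-view settings undefined when their parameters are unset. A minimal sketch under that assumption is shown below. The function name, the environment-map argument, and the numeric parsing of the refresh interval are all hypothetical, since the commit's config code is not shown in this diff.

```typescript
// Hypothetical shape covering only the three fields added to the snapshot.
interface ViewConfigSketch {
  viewType: string;
  maxStaleness: string | undefined;
  refreshIntervalMinutes: number | undefined;
}

// Reads the parameters declared in extension.yaml from an environment-like map.
// Assumption: empty or missing values are treated as "not set".
function readViewConfig(
  env: Record<string, string | undefined>
): ViewConfigSketch {
  const rawInterval = env["REFRESH_INTERVAL_MINUTES"];
  return {
    // extension.yaml declares `default: view` for VIEW_TYPE.
    viewType: env["VIEW_TYPE"] || "view",
    maxStaleness: env["MAX_STALENESS"] || undefined,
    // Assumption: the interval is stored as a number; the snapshot only
    // shows the unset (undefined) case.
    refreshIntervalMinutes: rawInterval ? parseInt(rawInterval, 10) : undefined,
  };
}
```

Calling `readViewConfig({})` produces the same three values as the snapshot: `viewType` is `"view"` and both materialized-view settings are `undefined`.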
