Lambda pre-aggregations follow the [Lambda architecture](https://en.wikipedia.org/wiki/Lambda_architecture) design to union real-time and batch data. Cube acts as the serving layer: it uses pre-aggregations as the batch layer and either source data or other pre-aggregations, usually [streaming][streaming-pre-agg], as the speed layer.

<WarningBox>

Lambda pre-aggregations only work with Cube Store.

Additionally, we’re going to remove support for external storage other than Cube Store later this year. [Cube Store will replace Redis](https://cube.dev/blog/replacing-redis-with-cube-store) and will therefore be a required component to run Cube, even without pre-aggregations.

</WarningBox>

## Use cases

Below are the most common use cases for lambda pre-aggregations.

### Batch and source data

Batch data comes from a pre-aggregation, and real-time data comes from the data source.

<div style="text-align: center">
  <img alt="Lambda pre-aggregation batch and source diagram" />
</div>

First, create a pre-aggregation that will contain your batch data. In the following example, it is called **batch**. It must have a `timeDimension`, which Cube uses to union batch data with source data.

Use the `buildRangeStart` and `buildRangeEnd` properties of the pre-aggregation to define the time window covered by your batch data.

Next, create a lambda pre-aggregation: set its type to `rollupLambda`, list the rollups you would like to use in the `rollups` property, and set `unionWithSourceData: true` to use source data as the real-time layer.

Make sure that the lambda pre-aggregation definition comes first when defining your pre-aggregations.

```js
// Both entries belong in the `preAggregations` block of the `Users` cube.
lambda: {
  type: `rollupLambda`,
  // Use source data as the real-time layer
  unionWithSourceData: true,
  rollups: [Users.batch]
},
batch: {
  measures: [Users.count],
  dimensions: [Users.name],
  // Required: Cube uses the time dimension to union batch and source data
  timeDimension: Users.createdAt,
  granularity: `day`,
  buildRangeStart: {
    sql: `SELECT '2020-01-01'`
  },
  buildRangeEnd: {
    sql: `SELECT '2022-05-30'`
  }
}
```

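To build intuition for what the union does, here is a minimal plain-JavaScript model of it. This is only an illustration and not how Cube implements lambda pre-aggregations: the function name `unionBatchWithSource`, the row shape, and the rule that batch rows are used up to and including the end of the build range are all assumptions made for this sketch.

```javascript
// Illustrative only: a toy model of the union, not Cube's implementation.
// Assumption: rows look like { day: 'YYYY-MM-DD', count: number } and the
// batch rollup is authoritative up to and including `batchEnd`.
function unionBatchWithSource(batchRows, sourceRows, batchEnd) {
  // ISO-formatted dates compare correctly as plain strings
  const fromBatch = batchRows.filter((row) => row.day <= batchEnd);
  const fromSource = sourceRows.filter((row) => row.day > batchEnd);
  return [...fromBatch, ...fromSource];
}

const batchRows = [
  { day: '2022-05-29', count: 10 },
  { day: '2022-05-30', count: 12 },
];
const sourceRows = [
  { day: '2022-05-30', count: 12 }, // already covered by the batch layer
  { day: '2022-05-31', count: 7 },  // only available from the source
];

const result = unionBatchWithSource(batchRows, sourceRows, '2022-05-30');
console.log(result.map((row) => row.day));
```

In this model, the time dimension is what makes the split possible, which is why the batch rollup must define `timeDimension`.
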
### Batch and streaming data

In this scenario, batch data comes from one pre-aggregation and real-time data comes from a [streaming pre-aggregation][streaming-pre-agg].

<div style="text-align: center">
  <img alt="Lambda pre-aggregation batch and streaming diagram" />
</div>

You can use lambda pre-aggregations to combine data from multiple pre-aggregations, where one pre-aggregation holds batch data and another holds streaming data.

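As a rough sketch of this setup, the definitions might look like the following. The `UsersStream` cube, its `streaming` rollup, and the `ksql` data source name are hypothetical and only illustrate the shape. The `batch` rollup mirrors the one from the batch and source data example, and cube-level properties such as `sql`, measures, and dimensions are left out for brevity, so this is a fragment rather than a complete data model.

```javascript
// Hypothetical cube backed by a streaming data source such as ksqlDB.
// `UsersStream` and the `ksql` data source name are assumptions for this sketch.
cube(`UsersStream`, {
  dataSource: `ksql`,

  preAggregations: {
    // Streaming rollup: no `refreshKey` is defined
    streaming: {
      measures: [UsersStream.count],
      dimensions: [UsersStream.name],
      timeDimension: UsersStream.createdAt,
      granularity: `day`
    }
  }
});

cube(`Users`, {
  preAggregations: {
    // The lambda definition comes first
    lambda: {
      type: `rollupLambda`,
      // No `unionWithSourceData` here: the streaming rollup is the real-time layer
      rollups: [Users.batch, UsersStream.streaming]
    },
    batch: {
      measures: [Users.count],
      dimensions: [Users.name],
      timeDimension: Users.createdAt,
      granularity: `day`,
      buildRangeStart: {
        sql: `SELECT '2020-01-01'`
      },
      buildRangeEnd: {
        sql: `SELECT '2022-05-30'`
      }
    }
  }
});
```

Both rollups in this sketch use the same measures, dimensions, time dimension, and granularity, on the assumption that the rollups being unioned need matching shapes.
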
docs/content/Caching/Using-Pre-Aggregations.mdx (16 additions, 0 deletions)

@@ -486,6 +486,22 @@ When using cloud storage, it is important to correctly configure any data
retention policies to clean up the data in the export bucket as Cube.js does not
currently manage this. For most use-cases, 1 day is sufficient.

## Streaming pre-aggregations

Streaming pre-aggregations differ from traditional pre-aggregations in how they are updated. Traditional pre-aggregations follow the “pull” model: Cube **pulls updates** from the data source based on some cadence and/or condition. Streaming pre-aggregations follow the “push” model: Cube **subscribes to updates** from the data source and always keeps the pre-aggregation up to date.

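The difference between the two models can be sketched with a toy example in plain Node.js. Nothing here is Cube's API: `ToySource`, `refresh`, and the counters are hypothetical names that exist only to contrast a snapshot refreshed on demand with a subscription that is updated on every new row.

```javascript
const { EventEmitter } = require('events');

// Toy stand-in for a data source; none of this is Cube's API.
class ToySource extends EventEmitter {
  constructor() {
    super();
    this.rows = [];
  }

  insert(row) {
    this.rows.push(row);
    this.emit('row', row); // notify subscribers immediately
  }
}

const source = new ToySource();

// Pull model: the rollup is a snapshot that only changes when refresh() runs.
let pulledCount = 0;
const refresh = () => {
  pulledCount = source.rows.length;
};

// Push model: the rollup subscribes once and is updated on every new row.
let pushedCount = 0;
source.on('row', () => {
  pushedCount += 1;
});

source.insert({ name: 'Ada' });
source.insert({ name: 'Grace' });

const beforeRefresh = { pulledCount, pushedCount };
refresh();
const afterRefresh = { pulledCount, pushedCount };
console.log(beforeRefresh, afterRefresh);
```

Until `refresh()` runs, the pulled snapshot is stale while the pushed counter is already current, which mirrors why streaming pre-aggregations do not depend on a refresh cadence.
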
You don’t need to define `refreshKey` for streaming pre-aggregations. Whether a pre-aggregation is streaming is determined by its data source.

Currently, Cube supports only one streaming data source: [ksqlDB](/config/databases/ksqldb). All pre-aggregations whose data source is ksqlDB are streaming.

We’re working on supporting streaming pre-aggregations for the following data sources:

- Materialize
- Flink SQL
- Spark Streaming

Please [let us know](https://cube.dev/contact) if you are interested in early access to any of these drivers or would like Cube to support another SQL streaming engine.