
Commit 924b3ac

keydunov, rpaik, and hassankhan authored
docs: add lambda pre-aggs and ksqldb (cube-js#5321)
* docs: add lambda pre-aggs and ksqldb
* Update docs/content/Caching/Lambda-Pre-Aggregations.mdx

  Co-authored-by: Ray Paik <ray@cube.dev>
* Update docs/content/Caching/Lambda-Pre-Aggregations.mdx

  Co-authored-by: Ray Paik <ray@cube.dev>
* Update docs/content/Caching/Using-Pre-Aggregations.mdx

  Co-authored-by: Ray Paik <ray@cube.dev>
* Update docs/content/Caching/Lambda-Pre-Aggregations.mdx

  Co-authored-by: Ray Paik <ray@cube.dev>
* Update docs/content/Caching/Lambda-Pre-Aggregations.mdx

  Co-authored-by: Hassan Khan <hassan@cube.dev>
* Update docs/content/Caching/Lambda-Pre-Aggregations.mdx

  Co-authored-by: Hassan Khan <hassan@cube.dev>

Co-authored-by: Ray Paik <ray@cube.dev>
Co-authored-by: Hassan Khan <hassan@cube.dev>
1 parent eabbdc2 commit 924b3ac

File tree

7 files changed

+208
-26
lines changed

Lines changed: 107 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,107 @@
1+
---
2+
title: Lambda Pre-Aggregations
3+
permalink: /caching/pre-aggregations/lambda-pre-aggregations
4+
category: Caching
5+
menuOrder: 4
6+
---
7+
8+
Lambda pre-aggregations follow the [Lambda architecture](https://en.wikipedia.org/wiki/Lambda_architecture) design to union real-time and batch data. Cube acts as the serving layer: it uses pre-aggregations as the batch layer and either source data or other pre-aggregations, usually [streaming][streaming-pre-agg] ones, as the speed layer.
9+
10+
<WarningBox>
11+
12+
Lambda pre-aggregations only work with Cube Store.
13+
14+
Additionally, we’re going to remove support for external storages other than Cube Store later this year. [Cube Store will replace Redis](https://cube.dev/blog/replacing-redis-with-cube-store) and will therefore be a required component to run Cube, even without pre-aggregations.
15+
16+
</WarningBox>
17+
18+
## Use cases
19+
20+
Below are the most common use cases for lambda pre-aggregations.
21+
22+
### Batch and source data
23+
24+
In this scenario, batch data comes from a pre-aggregation and real-time data comes from the data source.
25+
26+
<div style="text-align: center">
27+
<img
28+
alt="Lambda pre-aggregation batch and source diagram"
29+
src="https://raw.githubusercontent.com/cube-js/cube.js/master/docs/content/Caching/lambda-batch-source.png"
30+
style="border: none"
31+
width="100%"
32+
/>
33+
</div>
34+
35+
First, create a pre-aggregation that will contain your batch data. In the following example, it is called **batch**. It must define a `timeDimension`, which Cube uses to union batch data with source data.
36+
37+
You control the batch part of your data with the `buildRangeStart` and `buildRangeEnd` properties of the pre-aggregation, which define the time window covered by the batch data.
38+
39+
Next, create a lambda pre-aggregation. To do that, define a pre-aggregation with the type `rollupLambda`, list the rollups you would like to use in the `rollups` property, and set `unionWithSourceData: true` to use source data as the real-time layer.
40+
41+
Make sure that the lambda pre-aggregation definition comes first among your pre-aggregation definitions.
42+
43+
44+
```js
lambda: {
  type: `rollupLambda`,
  unionWithSourceData: true,
  rollups: [Users.batch]
},
batch: {
  measures: [Users.count],
  dimensions: [Users.name],
  timeDimension: Users.createdAt,
  granularity: `day`,
  buildRangeStart: {
    sql: `SELECT '2020-01-01'`
  },
  buildRangeEnd: {
    sql: `SELECT '2022-05-30'`
  }
}
```
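
To make the union concrete, here is a purely conceptual sketch in plain JavaScript. It is not Cube's implementation: the function `unionBatchWithSource`, the row shape, and the inclusive handling of the cutoff date are all assumptions made for illustration. It only shows the idea of serving rows inside the build range from the batch layer and newer rows from the source.

```js
// Conceptual illustration only: not Cube's implementation.
// Rows are plain objects with an ISO date string in `createdAt` (assumed shape).
function unionBatchWithSource(batchRows, sourceRows, buildRangeEnd) {
  const cutoff = new Date(buildRangeEnd).getTime();
  const time = (row) => new Date(row.createdAt).getTime();

  // Batch layer: rows inside the pre-aggregated window (cutoff assumed inclusive).
  const fromBatch = batchRows.filter((row) => time(row) <= cutoff);
  // Speed layer: rows newer than the window, read from the source.
  const fromSource = sourceRows.filter((row) => time(row) > cutoff);

  return [...fromBatch, ...fromSource];
}

const batchRows = [
  { createdAt: "2022-05-29", name: "Ada", count: 3 },
  { createdAt: "2022-05-30", name: "Ada", count: 1 },
];
const sourceRows = [
  { createdAt: "2022-05-30", name: "Ada", count: 1 }, // already covered by the batch
  { createdAt: "2022-05-31", name: "Ada", count: 2 },
];

const result = unionBatchWithSource(batchRows, sourceRows, "2022-05-30");
console.log(result.map((row) => row.createdAt));
```

Splitting on the time dimension is what keeps a row from being counted twice when it exists in both layers, which is why the batch pre-aggregation needs a `timeDimension`.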
63+
64+
### Batch and streaming data
65+
66+
In this scenario, batch data comes from one pre-aggregation and real-time data comes from a [streaming pre-aggregation][streaming-pre-agg].
67+
68+
<div style="text-align: center">
69+
<img
70+
alt="Lambda pre-aggregation batch and streaming diagram"
71+
src="https://raw.githubusercontent.com/cube-js/cube.js/master/docs/content/Caching/lambda-batch-streaming.png"
72+
style="border: none"
73+
width="100%"
74+
/>
75+
</div>
76+
77+
78+
You can use lambda pre-aggregations to combine data from multiple pre-aggregations, where one pre-aggregation contains batch data and another contains streaming data.
79+
80+
```js
batchStreamingLambda: {
  type: `rollupLambda`,
  rollups: [Users.batch, streaming]
},
batch: {
  type: `rollup`,
  measures: [Users.count],
  dimensions: [Users.name],
  timeDimension: Users.createdAt,
  granularity: `day`,
  buildRangeStart: {
    sql: `SELECT '2020-01-01'`
  },
  buildRangeEnd: {
    sql: `SELECT '2022-05-30'`
  }
},
streaming: {
  type: `rollup`,
  measures: [StreamingUsers.count],
  dimensions: [StreamingUsers.name],
  timeDimension: StreamingUsers.createdAt,
  granularity: `day`
}
```
106+
107+
[streaming-pre-agg]: /caching/using-pre-aggregations#streaming-pre-aggregations

docs/content/Caching/Using-Pre-Aggregations.mdx

Lines changed: 16 additions & 0 deletions
@@ -486,6 +486,22 @@ When using cloud storage, it is important to correctly configure any data
486486
retention policies to clean up the data in the export bucket as Cube.js does not
487487
currently manage this. For most use-cases, 1 day is sufficient.
488488

489+
## Streaming pre-aggregations
490+
491+
Streaming pre-aggregations differ from traditional pre-aggregations in how they are updated. Traditional pre-aggregations follow the “pull” model: Cube **pulls updates** from the data source based on some cadence and/or condition. Streaming pre-aggregations follow the “push” model: Cube **subscribes to updates** from the data source and always keeps the pre-aggregation up to date.
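
The difference between the two models can be sketched in plain JavaScript. This is a toy illustration rather than Cube or ksqlDB code: `FakeSource`, `PulledCopy`, and `PushedCopy` are hypothetical names, and the in-memory source stands in for a real database.

```js
// Hypothetical in-memory stand-in for a data source; not a Cube API.
class FakeSource {
  constructor() {
    this.rows = [];
    this.listeners = [];
  }
  subscribe(listener) {
    this.listeners.push(listener);
  }
  insert(row) {
    this.rows.push(row);
    // Notify every subscriber as soon as the row arrives.
    this.listeners.forEach((listener) => listener(row));
  }
}

// Pull model: the copy is only as fresh as the last refresh() call.
class PulledCopy {
  constructor(source) {
    this.source = source;
    this.rows = [];
  }
  refresh() {
    this.rows = [...this.source.rows];
  }
}

// Push model: the copy subscribes once and is updated on every insert.
class PushedCopy {
  constructor(source) {
    this.rows = [...source.rows];
    source.subscribe((row) => this.rows.push(row));
  }
}

const source = new FakeSource();
const pulled = new PulledCopy(source);
const pushed = new PushedCopy(source);

source.insert({ name: "Ada" });
console.log(pulled.rows.length, pushed.rows.length); // 0 1
pulled.refresh();
console.log(pulled.rows.length, pushed.rows.length); // 1 1
```

In the pull model something has to decide when to call `refresh()`, which is the role a refresh cadence or condition plays; in the push model no such trigger is needed.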
492+
493+
You don’t need to define `refreshKey` for streaming pre-aggregations. Whether a pre-aggregation is streaming or not is determined by its data source.
494+
495+
Currently, Cube supports only one streaming data source: [ksqlDB](/config/databases/ksqldb). All pre-aggregations whose data source is ksqlDB are streaming.
496+
497+
We’re working on supporting streaming pre-aggregations for the following data sources:
498+
499+
- Materialize
500+
- Flink SQL
501+
- Spark Streaming
502+
503+
Please [let us know](https://cube.dev/contact) if you are interested in early access to any of these drivers or would like Cube to support any other SQL streaming engine.
504+
489505
[ref-caching-in-mem-default-refresh-key]: /caching#default-refresh-keys
490506
[ref-config-db]: /config/databases
491507
[ref-config-driverfactory]: /config#driver-factory

docs/content/Configuration/Connecting-to-the-Database.mdx

Lines changed: 30 additions & 26 deletions
@@ -23,16 +23,6 @@ Choose a data store to get started with below.
2323
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/redshift.svg"
2424
title="Amazon Redshift"
2525
/>
26-
<GridItem
27-
url="databases/clickhouse"
28-
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/clickhouse.svg"
29-
title="ClickHouse"
30-
/>
31-
<GridItem
32-
url="databases/firebolt"
33-
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/firebolt.svg"
34-
title="Firebolt"
35-
/>
3626
<GridItem
3727
url="databases/google-bigquery"
3828
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/bigquery.svg"
@@ -43,6 +33,21 @@ Choose a data store to get started with below.
4333
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/snowflake.svg"
4434
title="Snowflake"
4535
/>
36+
<GridItem
37+
url="databases/databricks/jdbc"
38+
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/databricks.svg"
39+
title="Databricks"
40+
/>
41+
<GridItem
42+
url="databases/clickhouse"
43+
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/clickhouse.svg"
44+
title="ClickHouse"
45+
/>
46+
<GridItem
47+
url="databases/firebolt"
48+
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/firebolt.svg"
49+
title="Firebolt"
50+
/>
4651
</Grid>
4752

4853
## Query Engines
@@ -53,11 +58,6 @@ Choose a data store to get started with below.
5358
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/athena.svg"
5459
title="Amazon Athena"
5560
/>
56-
<GridItem
57-
url="databases/databricks/jdbc"
58-
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/databricks.svg"
59-
title="Databricks"
60-
/>
6161
<GridItem
6262
url="databases/hive-sparksql"
6363
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/hive.svg"
@@ -73,6 +73,11 @@ Choose a data store to get started with below.
7373
## Transactional Databases
7474

7575
<Grid imageSize={[56, 56]}>
76+
<GridItem
77+
url="databases/postgres"
78+
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/postgres.svg"
79+
title="Postgres"
80+
/>
7681
<GridItem
7782
url="databases/mssql"
7883
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/mssql.svg"
@@ -88,11 +93,6 @@ Choose a data store to get started with below.
8893
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/oracle.svg"
8994
title="Oracle"
9095
/>
91-
<GridItem
92-
url="databases/postgres"
93-
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/postgres.svg"
94-
title="Postgres"
95-
/>
9696
<GridItem
9797
url="databases/sqlite"
9898
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/docs/content/Configuration/Databases/sqlite.svg"
@@ -110,13 +110,13 @@ Choose a data store to get started with below.
110110
/>
111111
</Grid>
112112

113-
## Streaming & Real-Time Databases
113+
## Streaming
114114

115115
<Grid imageSize={[56, 56]}>
116116
<GridItem
117-
url="databases/druid"
118-
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/druid.svg"
119-
title="Druid"
117+
url="databases/ksqldb"
118+
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/ksqldb.svg"
119+
title="ksqlDB"
120120
/>
121121
<GridItem
122122
url="databases/materialize"
@@ -125,8 +125,7 @@ Choose a data store to get started with below.
125125
/>
126126
</Grid>
127127

128-
## NoSQL & Document Databases
129-
128+
## NoSQL & Other Data Sources
130129
<Grid imageSize={[56, 56]}>
131130
<GridItem
132131
url="databases/elasticsearch"
@@ -138,6 +137,11 @@ Choose a data store to get started with below.
138137
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/mongodb.svg"
139138
title="MongoDB"
140139
/>
140+
<GridItem
141+
url="databases/druid"
142+
imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/druid.svg"
143+
title="Druid"
144+
/>
141145
</Grid>
142146

143147
## Multiple Databases
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
---
2+
title: ksqlDB
3+
permalink: /config/databases/ksqldb
4+
---
5+
6+
<WarningBox>
7+
The ksqlDB driver is in preview. Please <a href="https://cube.dev/contact">contact us</a> if you need help running it in production.
8+
</WarningBox>
9+
10+
## Prerequisites
11+
12+
- Hostname for the ksqlDB server
13+
- Username and password to connect to the ksqlDB server
14+
15+
If you are using Confluent Cloud, you need to generate an API key and use the **key as the username** and the **secret as the password**.
16+
17+
## Setup
18+
19+
### <--{"id" : "Setup"}--> Manual
20+
21+
Add the following to a `.env` file in your Cube.js project:
22+
23+
```bash
CUBEJS_DB_TYPE=ksql
CUBEJS_DB_URL=https://xxxxxx-xxxxx.us-west4.gcp.confluent.cloud:443
CUBEJS_DB_USER=username
CUBEJS_DB_PASS=password
```
29+
30+
## Environment Variables
31+
32+
| Environment Variable | Description                                                             | Possible Values                | Required |
| -------------------- | ----------------------------------------------------------------------- | ------------------------------ | :------: |
| `CUBEJS_DB_URL`      | The host URL for ksqlDB, including the port                             | A valid database host URL      |          |
| `CUBEJS_DB_USER`     | The username used to connect to ksqlDB. API key for Confluent Cloud.    | A valid username or API key    |          |
| `CUBEJS_DB_PASS`     | The password used to connect to ksqlDB. API secret for Confluent Cloud. | A valid password or API secret |          |
37+
38+
## Pre-Aggregations Support
39+
40+
ksqlDB supports only [streaming pre-aggregations](/caching/using-pre-aggregations#streaming-pre-aggregations).
Lines changed: 15 additions & 0 deletions