docs: add lambda pre-aggs and ksqldb (cube-js#5321)

keydunov · rpaik · hassankhan · web-flow · commit 924b3ac7e159 · 2022-09-20T16:03:32.000-07:00
* docs: add lambda pre-aggs and ksqldb

* Update docs/content/Caching/Lambda-Pre-Aggregations.mdx

Co-authored-by: Ray Paik <ray@cube.dev>

* Update docs/content/Caching/Lambda-Pre-Aggregations.mdx

Co-authored-by: Ray Paik <ray@cube.dev>

* Update docs/content/Caching/Using-Pre-Aggregations.mdx

Co-authored-by: Ray Paik <ray@cube.dev>

* Update docs/content/Caching/Lambda-Pre-Aggregations.mdx

Co-authored-by: Ray Paik <ray@cube.dev>

* Update docs/content/Caching/Lambda-Pre-Aggregations.mdx

Co-authored-by: Hassan Khan <hassan@cube.dev>

* Update docs/content/Caching/Lambda-Pre-Aggregations.mdx

Co-authored-by: Hassan Khan <hassan@cube.dev>

Co-authored-by: Ray Paik <ray@cube.dev>
Co-authored-by: Hassan Khan <hassan@cube.dev>
diff --git a/docs/content/Caching/Lambda-Pre-Aggregations.mdx b/docs/content/Caching/Lambda-Pre-Aggregations.mdx
@@ -0,0 +1,107 @@
+---
+title: Lambda Pre-Aggregations
+permalink: /caching/pre-aggregations/lambda-pre-aggregations
+category: Caching
+menuOrder: 4
+---
+
+Lambda pre-aggregations follow the [Lambda architecture](https://en.wikipedia.org/wiki/Lambda_architecture) design to union real-time and batch data. Cube acts as the serving layer: it uses pre-aggregations as the batch layer, and either source data or other pre-aggregations (usually [streaming][streaming-pre-agg] ones) as the speed layer.
+
+<WarningBox>
+
+Lambda pre-aggregations only work with Cube Store. 
+
+Additionally, we’re going to remove support for external storage other than Cube Store later this year. [Cube Store will replace Redis](https://cube.dev/blog/replacing-redis-with-cube-store) and will therefore be a required component for running Cube, even without pre-aggregations.
+
+</WarningBox>
+
+## Use cases
+
+The sections below cover the most common use cases for lambda pre-aggregations.
+
+### Batch and source data
+
+In this scenario, batch data comes from a pre-aggregation and real-time data comes from the data source.
+
+<div style="text-align: center">
+  <img
+    alt="Lambda pre-aggregation batch and source diagram"
+    src="https://raw.githubusercontent.com/cube-js/cube.js/master/docs/content/Caching/lambda-batch-source.png"
+    style="border: none"
+    width="100%"
+  />
+</div>
+
+First, create a pre-aggregation that will contain your batch data. In the following example, it is called `batch`. It must define a `timeDimension`, which Cube uses to union batch data with source data.
+
+Use the `buildRangeStart` and `buildRangeEnd` properties of the pre-aggregation to define the time window covered by your batch data.
+
+Next, create the lambda pre-aggregation: define a pre-aggregation with the type `rollupLambda`, list the rollups you would like to use in the `rollups` property, and set `unionWithSourceData: true` to use source data as the real-time layer.
+
+Please make sure that the lambda pre-aggregation definition comes first when defining your pre-aggregations.
+
+
+```js
+lambda: {
+  type: `rollupLambda`,
+  unionWithSourceData: true,
+  rollups: [Users.batch]
+},
+batch: {
+  measures: [Users.count],
+  dimensions: [Users.name],
+  timeDimension: Users.createdAt,
+  granularity: `day`,
+  buildRangeStart: {
+    sql: `SELECT '2020-01-01'`
+  },
+  buildRangeEnd: {
+    sql: `SELECT '2022-05-30'`
+  }
+}
+```
+
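+For context, the fragment above would sit inside the `preAggregations` block of a cube. The following is a minimal sketch of such a cube, not a definitive setup: the `users` table and its `name` and `created_at` columns are assumptions made for illustration.
+
+```js
+// Sketch only: the table and column names below are hypothetical.
+cube(`Users`, {
+  sql: `SELECT * FROM users`,
+
+  measures: {
+    count: {
+      type: `count`
+    }
+  },
+
+  dimensions: {
+    name: {
+      sql: `name`,
+      type: `string`
+    },
+    createdAt: {
+      sql: `created_at`,
+      type: `time`
+    }
+  },
+
+  preAggregations: {
+    // The lambda definition comes first, as noted above
+    lambda: {
+      type: `rollupLambda`,
+      unionWithSourceData: true,
+      rollups: [Users.batch]
+    },
+    batch: {
+      measures: [Users.count],
+      dimensions: [Users.name],
+      timeDimension: Users.createdAt,
+      granularity: `day`,
+      buildRangeStart: {
+        sql: `SELECT '2020-01-01'`
+      },
+      buildRangeEnd: {
+        sql: `SELECT '2022-05-30'`
+      }
+    }
+  }
+});
+```
+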
+### Batch and streaming data
+
+In this scenario, batch data comes from one pre-aggregation and real-time data comes from a [streaming pre-aggregation][streaming-pre-agg].
+
+<div style="text-align: center">
+  <img
+    alt="Lambda pre-aggregation batch and streaming diagram"
+    src="https://raw.githubusercontent.com/cube-js/cube.js/master/docs/content/Caching/lambda-batch-streaming.png"
+    style="border: none"
+    width="100%"
+  />
+</div>
+
+
+You can use lambda pre-aggregations to combine data from multiple pre-aggregations, where one pre-aggregation holds batch data and another holds streaming data.
+
+```js
+batchStreamingLambda: {
+  type: `rollupLambda`,
+  rollups: [Users.batch, streaming]
+},
+batch: {
+  type: `rollup`,
+  measures: [Users.count],
+  dimensions: [Users.name],
+  timeDimension: Users.createdAt,
+  granularity: `day`,
+  buildRangeStart: {
+    sql: `SELECT '2020-01-01'`
+  },
+  buildRangeEnd: {
+    sql: `SELECT '2022-05-30'`
+  }
+},
+streaming: {
+  type: `rollup`,
+  measures: [StreamingUsers.count],
+  dimensions: [StreamingUsers.name],
+  timeDimension: StreamingUsers.createdAt,
+  granularity: `day`
+}
+```
+
+[streaming-pre-agg]: /caching/using-pre-aggregations#streaming-pre-aggregations
diff --git a/docs/content/Caching/Using-Pre-Aggregations.mdx b/docs/content/Caching/Using-Pre-Aggregations.mdx
@@ -486,6 +486,22 @@ When using cloud storage, it is important to correctly configure any data
 retention policies to clean up the data in the export bucket as Cube.js does not
 currently manage this. For most use-cases, 1 day is sufficient.
 
+## Streaming pre-aggregations
+
+Streaming pre-aggregations differ from traditional pre-aggregations in the way they are updated. Traditional pre-aggregations follow the “pull” model: Cube **pulls updates** from the data source based on some cadence and/or condition. Streaming pre-aggregations follow the “push” model: Cube **subscribes to the updates** from the data source and always keeps the pre-aggregation up to date.
+
+You don’t need to define a `refreshKey` for streaming pre-aggregations. Whether a pre-aggregation is streaming is determined by its data source.
+
+Currently, Cube supports only one streaming data source: [ksqlDB](/config/databases/ksqldb). All pre-aggregations whose data source is ksqlDB are streaming.
+
+We’re working on supporting streaming pre-aggregations for the following data sources:
+
+- Materialize
+- Flink SQL
+- Spark Streaming
+
+Please [let us know](https://cube.dev/contact) if you are interested in early access to any of these drivers or would like Cube to support any other SQL streaming engine.
+
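+As an illustration of the points above, a streaming pre-aggregation is declared like any other rollup, just without a `refreshKey`. The sketch below assumes that the project's data source is ksqlDB and that a stream named `USERS_STREAM` with `NAME` and `CREATED_AT` columns exists; these names are hypothetical and only serve the example.
+
+```js
+// Sketch only: assumes the project's data source is ksqlDB.
+// USERS_STREAM, NAME and CREATED_AT are hypothetical names.
+cube(`StreamingUsers`, {
+  sql: `SELECT * FROM USERS_STREAM`,
+
+  measures: {
+    count: {
+      type: `count`
+    }
+  },
+
+  dimensions: {
+    name: {
+      sql: `NAME`,
+      type: `string`
+    },
+    createdAt: {
+      sql: `CREATED_AT`,
+      type: `time`
+    }
+  },
+
+  preAggregations: {
+    // No refreshKey here: the data source pushes updates to Cube
+    streaming: {
+      measures: [StreamingUsers.count],
+      dimensions: [StreamingUsers.name],
+      timeDimension: StreamingUsers.createdAt,
+      granularity: `day`
+    }
+  }
+});
+```
+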
 [ref-caching-in-mem-default-refresh-key]: /caching#default-refresh-keys
 [ref-config-db]: /config/databases
 [ref-config-driverfactory]: /config#driver-factory
diff --git a/docs/content/Caching/lambda-batch-source.png b/docs/content/Caching/lambda-batch-source.png
diff --git a/docs/content/Caching/lambda-batch-streaming.png b/docs/content/Caching/lambda-batch-streaming.png
diff --git a/docs/content/Configuration/Connecting-to-the-Database.mdx b/docs/content/Configuration/Connecting-to-the-Database.mdx
@@ -23,16 +23,6 @@ Choose a data store to get started with below.
     imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/redshift.svg"
     title="Amazon Redshift"
   />
-  <GridItem
-    url="databases/clickhouse"
-    imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/clickhouse.svg"
-    title="ClickHouse"
-  />
-  <GridItem
-    url="databases/firebolt"
-    imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/firebolt.svg"
-    title="Firebolt"
-  />
   <GridItem
     url="databases/google-bigquery"
     imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/bigquery.svg"
@@ -43,6 +33,21 @@ Choose a data store to get started with below.
     imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/snowflake.svg"
     title="Snowflake"
   />
+  <GridItem
+    url="databases/databricks/jdbc"
+    imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/databricks.svg"
+    title="Databricks"
+  />
+  <GridItem
+    url="databases/clickhouse"
+    imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/clickhouse.svg"
+    title="ClickHouse"
+  />
+  <GridItem
+    url="databases/firebolt"
+    imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/firebolt.svg"
+    title="Firebolt"
+  />
 </Grid>
 
 ## Query Engines
@@ -53,11 +58,6 @@ Choose a data store to get started with below.
     imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/athena.svg"
     title="Amazon Athena"
   />
-  <GridItem
-    url="databases/databricks/jdbc"
-    imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/databricks.svg"
-    title="Databricks"
-  />
   <GridItem
     url="databases/hive-sparksql"
     imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/hive.svg"
@@ -73,6 +73,11 @@ Choose a data store to get started with below.
 ## Transactional Databases
 
 <Grid imageSize={[56, 56]}>
+  <GridItem
+    url="databases/postgres"
+    imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/postgres.svg"
+    title="Postgres"
+  />
   <GridItem
     url="databases/mssql"
     imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/mssql.svg"
@@ -88,11 +93,6 @@ Choose a data store to get started with below.
     imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/oracle.svg"
     title="Oracle"
   />    
-  <GridItem
-    url="databases/postgres"
-    imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/postgres.svg"
-    title="Postgres"
-  />
   <GridItem
     url="databases/sqlite"
     imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/docs/content/Configuration/Databases/sqlite.svg"
@@ -110,13 +110,13 @@ Choose a data store to get started with below.
   />
 </Grid>
 
-## Streaming & Real-Time Databases
+## Streaming
 
 <Grid imageSize={[56, 56]}>
   <GridItem
-    url="databases/druid"
-    imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/druid.svg"
-    title="Druid"
+    url="databases/ksqldb"
+    imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/ksqldb.svg"
+    title="ksqlDB"
   />
   <GridItem
     url="databases/materialize"
@@ -125,8 +125,7 @@ Choose a data store to get started with below.
   />
 </Grid>
 
-## NoSQL & Document Databases
-
+## NoSQL & Other Data Sources
 <Grid imageSize={[56, 56]}>
   <GridItem
     url="databases/elasticsearch"
@@ -138,6 +137,11 @@ Choose a data store to get started with below.
     imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/mongodb.svg"
     title="MongoDB"
   />
+  <GridItem
+    url="databases/druid"
+    imageUrl="https://raw.githubusercontent.com/cube-js/cube.js/master/packages/cubejs-playground/src/img/db/druid.svg"
+    title="Druid"
+  />
 </Grid>
 
 ## Multiple Databases
diff --git a/docs/content/Configuration/Databases/ksqlDB.mdx b/docs/content/Configuration/Databases/ksqlDB.mdx
@@ -0,0 +1,40 @@
+---
+title: ksqlDB
+permalink: /config/databases/ksqldb
+---
+
+<WarningBox>
+The ksqlDB driver is in preview. Please <a href="https://cube.dev/contact">contact us</a> if you need help running it in production.
+</WarningBox>
+
+## Prerequisites
+
+- Hostname for the ksqlDB server
+- Username and password to connect to the ksqlDB server
+
+If you are using Confluent Cloud, you need to generate an API key and use the **key as the username** and the **secret as the password**.
+
+## Setup
+
+### <--{"id" : "Setup"}--> Manual
+
+Add the following to a `.env` file in your Cube.js project:
+
+```bash
+CUBEJS_DB_TYPE=ksql
+CUBEJS_DB_URL=https://xxxxxx-xxxxx.us-west4.gcp.confluent.cloud:443
+CUBEJS_DB_USER=username
+CUBEJS_DB_PASS=password
+```
+
+## Environment Variables
+
+| Environment Variable | Description                                                                         | Possible Values           | Required |
+| -------------------- | ----------------------------------------------------------------------------------- | ------------------------- | :------: |
+| `CUBEJS_DB_URL`      | The host URL for ksqlDB with port                                                   | A valid database host URL |    ✅    |
+| `CUBEJS_DB_USER`     | The username used to connect to ksqlDB (the API key for Confluent Cloud)            | A username or API key     |    ✅    |
+| `CUBEJS_DB_PASS`     | The password used to connect to ksqlDB (the API secret for Confluent Cloud)         | A password or API secret  |    ✅    |
+
+## Pre-Aggregations Support
+
+ksqlDB supports only [streaming pre-aggregations](/caching/using-pre-aggregations#streaming-pre-aggregations).
diff --git a/packages/cubejs-playground/src/img/db/ksqldb.svg b/packages/cubejs-playground/src/img/db/ksqldb.svg
@@ -0,0 +1,15 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<svg width="35px" height="63px" viewBox="0 0 35 63" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+    <!-- Generator: Sketch 57.1 (83088) - https://sketch.com -->
+    <title>img-rocket</title>
+    <desc>Created with Sketch.</desc>
+    <g id="KSQL" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
+        <g id="KSQL-|-Home" transform="translate(-667.000000, -1479.000000)" fill="#EF5862">
+            <g id="illustration" transform="translate(190.000000, 1479.000000)">
+                <g id="Group-71" transform="translate(217.000000, 0.000000)">
+                    <path d="M284.861628,58.1416702 L284.923491,58.269675 L286.005388,60.8253625 C286.304957,61.5319793 285.975216,62.3464766 285.270474,62.6453054 C284.608769,62.9264688 283.852354,62.6539022 283.514437,62.0373876 L283.452586,61.9095609 L282.369612,59.3527946 C282.071121,58.6461777 282.400862,57.8316804 283.105603,57.5328516 C283.766298,57.2526996 284.523597,57.5243813 284.861628,58.1416702 Z M289.300216,56.4122976 C289.598707,57.1178356 289.270043,57.9334117 288.564224,58.2322405 C287.859483,58.5310693 287.045905,58.202034 286.746336,57.4954171 C286.447845,56.7898791 286.777586,55.974303 287.482328,55.6754742 C288.187069,55.3766454 289.001724,55.7056807 289.300216,56.4122976 Z M275.985668,12.8914962 C280.477047,17.8906755 285.117134,24.3645804 288.472737,32.2916348 L288.472737,32.2916348 L293.22597,43.5209092 L294.008297,45.3678223 C294.649461,46.880306 293.942565,48.6258114 292.432866,49.2677 C290.921013,49.9085098 289.177478,49.2018929 288.536315,47.6894092 L288.536315,47.6894092 L287.995366,46.4110261 C287.696875,45.7044092 286.883297,45.3753739 286.177478,45.6742027 C285.472737,45.9741103 285.142996,46.7886076 285.442565,47.4952245 L285.442565,47.4952245 L286.818642,50.745662 L287.607435,52.6076783 C287.905927,53.3132163 287.576185,54.1277136 286.871444,54.4265424 C286.166703,54.72645 285.352047,54.3963359 285.053556,53.6907978 L285.053556,53.6907978 L284.280927,51.8654609 C284.067565,51.3616592 283.485668,51.1264799 282.982435,51.3400832 C282.479203,51.5526076 282.244289,52.135162 282.457651,52.6389636 L282.457651,52.6389636 L283.23028,54.4653793 C283.528772,55.1709174 283.200108,55.9854147 282.494289,56.2842435 C281.789547,56.5830723 280.97597,56.254037 280.676401,55.5474201 L280.676401,55.5474201 L279.888685,53.6864826 L278.512608,50.4349663 C278.213039,49.7294283 277.399461,49.3993141 276.69472,49.6992217 C275.988901,49.9980505 275.660237,50.8136266 275.958728,51.5191647 L275.958728,51.5191647 L276.499677,52.7975478 C277.140841,54.3100315 
276.435022,56.055537 274.924246,56.6963467 C273.412392,57.3382353 271.668858,56.6305397 271.028772,55.1191348 L271.028772,55.1191348 L270.246444,53.2722217 L265.492134,42.0429473 C262.13653,34.1158929 260.715194,26.2773005 260.250754,19.5682163 L260.250754,19.5682163 Z M278.992134,35.0544527 C278.416703,33.6940804 276.846659,33.0575859 275.486746,33.6347462 C274.126832,34.2119065 273.492134,35.7826457 274.068642,37.1440967 C274.644073,38.5066266 276.214116,39.1409636 277.57403,38.5638033 C278.933944,37.9877217 279.56972,36.4159038 278.992134,35.0544527 Z M276.126832,28.2849554 C275.550323,26.9235043 273.98028,26.2870098 272.620366,26.8652489 C271.26153,27.4424092 270.625754,29.0131484 271.202263,30.3745995 C271.778772,31.7360505 273.347737,32.3714663 274.707651,31.7953848 C276.067565,31.2182245 276.702263,29.6464065 276.126832,28.2849554 Z M273.260453,21.5154582 C272.683944,20.1540071 271.113901,19.5175125 269.755065,20.0946728 C268.394073,20.6718332 267.759375,22.2425723 268.335884,23.6051022 C268.912392,24.9665533 270.481358,25.601969 271.842349,25.0258875 C273.201185,24.4476484 273.836961,22.8758304 273.260453,21.5154582 Z M261.372953,0.297210598 C261.372953,0.297210598 267.403638,3.8945996 274.229768,11.0017845 L274.545366,11.3323003 L260.134806,17.4469633 C259.711315,7.35043342 261.372953,0.297210598 261.372953,0.297210598 Z" id="img-rocket"></path>
+                </g>
+            </g>
+        </g>
+    </g>
+</svg>