Commit a9f5a61

some 2.9 features are back-ported to 2.8.2
1 parent 6688e00 commit a9f5a61

File tree: 7 files changed, +124 −5 lines

docs/datatypes.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ Like many analytics systems, the following common types are supported.
 | | float | -3.1415 | default with 4 bytes. Same as `float32`. You can also use `float64` or `double` for 8 bytes. No `float8` or `float16`. | [to_float](/functions_for_type#to_float) |
 | Boolean Type | bool | true | true or false | |
 | String Type | string | 'Hello' | strings of an arbitrary length. You can also use `varchar`. To create string columns with a fixed size in bytes, use `fixed_string(positiveInt)` | [to_string](/functions_for_type#to_string), [etc.](/functions_for_text) |
-| JSON Type | json | '\{"a":1,"b":["x","y"]\}' | New in Timeplus Enterprise 2.9. The JSON document is stored in a more optimized, columnar-like layout to improve query performance. |
+| JSON Type | json | '\{"a":1,"b":["x","y"]\}' | New in Timeplus Enterprise 2.9 (also available in 2.8.2 or above). The JSON document is stored in a more optimized, columnar-like layout to improve query performance. |
 | Universally Unique Identifier | uuid | 1f71acbf-59fc-427d-a634-1679b48029a9 | a universally unique identifier (UUID) is a 16-byte number used to identify records. For detailed information about the UUID, see [Wikipedia](https://en.wikipedia.org/wiki/Universally_unique_identifier) | [uuid](/functions_for_text#uuid) |
 | IP address | ipv4 | '116.253.40.133' | IPv4 addresses. Stored in 4 bytes as uint32. | [to_ipv4](/functions_for_url#to_ipv4) |
 | | ipv6 | '2a02:aa08:e000:3100::2' | IPv6 addresses. Stored in 16 bytes as uint128. | [to_ipv6](/functions_for_url#to_ipv6) |
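
To try the backported json type, a minimal sketch (assuming Timeplus Enterprise 2.8.2 or above; the stream name is hypothetical):

```sql
-- Create a stream with a json column and round-trip one document.
CREATE STREAM product_events (raw json);

INSERT INTO product_events (raw) VALUES ('{"a":1,"b":["x","y"]}');

-- table() reads stored rows instead of tailing the stream.
SELECT raw FROM table(product_events);
```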

docs/functions_for_json.md

Lines changed: 6 additions & 0 deletions
@@ -63,3 +63,9 @@ This takes one or more parameters and return a json string. You can also turn al
 This function is available since Timeplus Enterprise v2.9.

 This takes one or more parameters and returns a json object. You can also turn all column values in the row into a json object via `json_cast(*)`.
+
+### json_array_length
+Get the length of a JSON array. For example, `json_array_length('[3,4,5]')` will return `3`.
+
+### json_merge_patch
+Merge multiple JSON documents into one. For example, `json_merge_patch('{"a":1,"b":2}', '{"b":3,"c":4}')` will return `{"a":1,"b":3,"c":4}`. If a key exists in both documents, the value from the second document overwrites the first.
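
A quick way to exercise the two new functions (a sketch restating the examples above, assuming a Timeplus Enterprise 2.8.2+ SQL client):

```sql
SELECT json_array_length('[3,4,5]') AS len;  -- 3

SELECT json_merge_patch('{"a":1,"b":2}', '{"b":3,"c":4}') AS merged;
-- '{"a":1,"b":3,"c":4}': "b" is taken from the second document
```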

docs/functions_for_text.md

Lines changed: 4 additions & 0 deletions
@@ -161,6 +161,10 @@ Alias of [generate_uuidv4](#generate_uuidv4).

 Generates a universally unique identifier (UUIDv4), which is a 16-byte number used to identify records.

+### uuid7
+
+Alias of [generate_uuidv7](#generate_uuidv7).
+
 ### generate_uuidv7

 `generate_uuidv7()` generates a universally unique identifier (UUIDv7), which contains the current Unix timestamp in milliseconds (48 bits), followed by the version "7" (4 bits), a counter (42 bits) to distinguish UUIDs within a millisecond (including a 2-bit variant field "2"), and a random field (32 bits). For any given timestamp (unix_ts_ms), the counter starts at a random value and is incremented by 1 for each new UUID until the timestamp changes. If the counter overflows, the timestamp field is incremented by 1 and the counter is reset to a random new start value.
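
A sketch of the new alias (assuming `uuid7()` is interchangeable with `generate_uuidv7()`, per the alias definition above):

```sql
-- Both calls produce time-ordered UUIDv7 values; within the same
-- millisecond the counter makes each new UUID strictly greater.
SELECT generate_uuidv7() AS u1, uuid7() AS u2;
```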

docs/http-external.md

Lines changed: 35 additions & 1 deletion
@@ -1,6 +1,6 @@
 # HTTP External Stream

-Since [Timeplus Enterprise v2.9](/enterprise-v2.9), Timeplus can send data to HTTP endpoints via the HTTP External Stream. You can use this feature to trigger Slack notifications or send streaming data to downstream systems, such as Splunk, Elasticsearch, or any other HTTP-based service.
+Since Timeplus Enterprise [v2.9](/enterprise-v2.9) and v2.8.2, you can send data to HTTP endpoints via the HTTP External Stream. You can use this feature to trigger Slack notifications or send streaming data to downstream systems, such as Splunk, Datadog, Elasticsearch, Databricks, or any other HTTP-based service.

 Currently, it only supports writing data to HTTP endpoints; reading data from HTTP endpoints is not supported yet.

@@ -169,6 +169,40 @@ Then you can insert data via a materialized view or just via `INSERT` command:
 INSERT INTO http_bigquery_t1 VALUES(10,'A'),(11,'B');
 ```

+#### Write to Databricks {#example-write-to-databricks}
+
+Follow [the guide](https://docs.databricks.com/aws/en/dev-tools/auth/pat) to create an access token for your Databricks workspace.
+
+Assume you have created a table in a Databricks SQL warehouse with 2 columns:
+```sql
+CREATE TABLE sales (
+  product STRING,
+  quantity INT
+);
+```
+
+Create the HTTP external stream in Timeplus:
+```sql
+CREATE EXTERNAL STREAM http_databricks_t1 (product string, quantity int)
+SETTINGS
+type = 'http',
+http_header_Authorization='Bearer $TOKEN',
+url = 'https://$HOST.cloud.databricks.com/api/2.0/sql/statements/',
+data_format = 'Template',
+format_template_resultset_format='{"warehouse_id":"$WAREHOUSE_ID","statement": "INSERT INTO sales (product, quantity) VALUES (:product, :quantity)", "parameters": [${data}]}',
+format_template_row_format='{ "name": "product", "value": ${product:JSON}, "type": "STRING" },{ "name": "quantity", "value": ${quantity:JSON}, "type": "INT" }',
+format_template_rows_between_delimiter=''
+```
+
+Replace `TOKEN`, `HOST`, and `WAREHOUSE_ID` to match your Databricks settings. Also change `format_template_resultset_format` and `format_template_row_format` to match the table schema.
+
+Then you can insert data via a materialized view or just via the `INSERT` command:
+```sql
+INSERT INTO http_databricks_t1(product, quantity) VALUES('test',95);
+```
+
+This inserts one row per request. We plan to support batch inserts and a Databricks-specific format for different table schemas in the future.
+
 #### Trigger Slack Notifications {#example-trigger-slack}

 You can follow [the guide](https://api.slack.com/messaging/webhooks) to configure an "incoming webhook" to send notifications to a Slack channel.
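
The Databricks example above mentions inserting via a materialized view but only shows a direct `INSERT`; a minimal sketch of the materialized-view route (the source stream `orders` and the view name are hypothetical):

```sql
-- Continuously forward rows from a local stream into Databricks
-- through the external stream defined above.
CREATE MATERIALIZED VIEW mv_orders_to_databricks
INTO http_databricks_t1 AS
SELECT product, quantity FROM orders;
```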

docs/mongo-external.md

Lines changed: 8 additions & 0 deletions
@@ -79,6 +79,14 @@ MongoDB connection string options as a URL formatted string. e.g. 'authSource=ad
 #### oid_columns
 A comma-separated list of columns that should be treated as oid in the `WHERE` clause. Defaults to `_id`.

+### Query Settings
+
+#### mongodb_throw_on_unsupported_query
+By default this setting is `true`: while querying the MongoDB external table with SQL, if the query contains `GROUP BY`, `HAVING` or other aggregations, Timeplus will throw an exception. Set this to `false` or `0` to disable this behavior; Timeplus will then read the full table data from MongoDB and execute the query in Timeplus. For example:
+```sql
+SELECT name, COUNT(*) AS cnt FROM mongodb_ext_table GROUP BY name HAVING cnt > 5 SETTINGS mongodb_throw_on_unsupported_query = false;
+```
+
 ## DROP EXTERNAL TABLE

 ```sql
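
As an aside on `oid_columns` above, a hedged sketch of how the setting might be used (the connection values are placeholders and the extra `parent_id` column is hypothetical):

```sql
-- Treat both _id and parent_id as MongoDB ObjectId in WHERE clauses.
CREATE EXTERNAL TABLE orders_mongo
SETTINGS type = 'mongodb',
         address = 'localhost:27017',
         database = 'shop',
         collection = 'orders',
         oid_columns = '_id,parent_id';

-- The string literal is converted to an ObjectId before the filter is
-- pushed down to MongoDB.
SELECT * FROM orders_mongo WHERE parent_id = '507f1f77bcf86cd799439011';
```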

docs/mutable-stream.md

Lines changed: 37 additions & 2 deletions
@@ -22,14 +22,19 @@ PRIMARY KEY (col1, col2)
 SETTINGS
 logstore_retention_bytes=..,
 logstore_retention_ms=..,
-shards=..
+shards=..,
+version_column=..,
+coalesced=..,
+ttl_seconds=..
 ```

 Since Timeplus Enterprise 2.7, if you create a mutable stream with `low_cardinality` columns, the system will ignore the `low_cardinality` modifier to improve performance.
 [Learn more](/sql-create-mutable-stream).

 `PARTITION BY`, `ORDER BY` or `SAMPLE BY` clauses are not allowed while creating a mutable stream.

+Since Timeplus Enterprise 2.8.2, you can set `coalesced` (defaults to false). If it's true and the inserted data only contains a subset of the columns in the WAL, the partial columns will be merged with the existing rows. [Learn more](#coalesced). Also since 2.8.2, you can set `ttl_seconds` (defaults to -1). If it's set to a positive value, data older than `ttl_seconds` will be scheduled to be pruned in the next key compaction cycle. [Learn more](#ttl_seconds).
+
 ## INSERT
 You can insert data to the mutable stream with the following SQL:
 ```sql

@@ -184,7 +189,7 @@ Mutable stream can also be used in [JOINs](/joins).
 ### Retention Policy for Historical Storage{#ttl_seconds}
 Like normal streams in Timeplus, mutable streams use both streaming storage and historical storage. New data is added to the streaming storage first, then continuously written to the historical storage with a deduplication/merging process.

-Starting from Timeplus Enterprise 2.9, you can set `ttl_seconds` on mutable streams. If the data is older than this value, it is scheduled to be pruned in the next key compaction cycle. The default value is -1. Any value less than 0 means this feature is disabled.
+Starting from Timeplus Enterprise 2.9 (also backported to 2.8.2), you can set `ttl_seconds` on mutable streams. If the data is older than this value, it is scheduled to be pruned in the next key compaction cycle. The default value is -1. Any value less than 0 means this feature is disabled.

 ```sql
 CREATE MUTABLE STREAM ..

@@ -280,6 +285,36 @@ PRIMARY KEY (device_id, timestamp, batch_id)
 SETTINGS shards=3
 ```

+### Coalesced and Versioned Mutable Stream {#coalesced}
+For a mutable stream with many columns, there are cases where only some of the columns are updated over time. Create a mutable stream with the `coalesced=true` setting to enable partial merges. For example, given a mutable stream:
+```sql
+create mutable stream kv_99061_1 (
+  p string, m1 int, m2 int, m3 int, v uint64,
+  family cf1(m1),
+  family cf2(m2),
+  family cf3(m3),
+  family cf4(_tp_time)
+) primary key p
+settings coalesced = true;
+```
+If we insert one row with `m1=1`:
+```sql
+insert into kv_99061_1 (p, m1, _tp_time) values ('p1', 1, '2025-01-01T00:00:01');
+```
+Query the mutable stream and you will get one row.
+
+Then insert another row with the same primary key and `m2=2`:
+```sql
+insert into kv_99061_1 (p, m2, _tp_time) values ('p1', 2, '2025-01-01T00:00:02');
+```
+Query it again with
+```sql
+select * from table(kv_99061_1);
+```
+You will see one row with m1 and m2 updated and the other columns at their default values.
+
+Compared to the [Versioned Stream](versioned-stream), coalesced mutable streams don't require you to set all column values when you update a primary key. You can also set `version_column` to the name of the column that carries the version number. Say there are updates for the same primary key, with `v` as the `version_column`: the first update is "v=1,p=1,m=1" and the second is "v=2,p=1,m=2". If, for some reason, Timeplus receives the second update first, then when it gets "v=1,p=1,m=1", version 1 is lower than the current version, so this update is rejected and the latest update "v=2,p=1,m=2" is kept. This is especially beneficial in distributed environments with potentially out-of-order events, as sketched below.
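
A hedged sketch of the `version_column` behavior described above (the stream and column names are made up for illustration; `version_column` is assumed to take the column name, per the settings list at the top of this file):

```sql
-- A coalesced mutable stream where `v` carries the version number.
create mutable stream kv_versioned (p string, m int, v uint64)
primary key p
settings coalesced = true, version_column = 'v';

-- The newer update (v=2) happens to arrive first:
insert into kv_versioned (p, m, v) values ('p1', 2, 2);
-- The late, older update (v=1) is rejected, not merged:
insert into kv_versioned (p, m, v) values ('p1', 1, 1);

-- Expect a single row: p='p1', m=2, v=2.
select * from table(kv_versioned);
```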
 ## Performance Tuning {#tuning}
 If you are facing performance challenges with massive data in mutable streams, please consider adding [secondary indexes](#index), [column families](#column_family) and using [multiple shards](#shards).

docs/sql-alter-stream.md

Lines changed: 33 additions & 1 deletion
@@ -12,7 +12,7 @@ to_datetime(created_at) + INTERVAL 48 HOUR
 ```

 ## MODIFY SETTING{#modify_setting}
-You can add or modify the retention policy for streaming storage. e.g.
+You can add or modify the retention policy for streams or mutable streams, e.g.

 ```sql
 ALTER STREAM stream_name MODIFY SETTING

@@ -32,6 +32,11 @@ You can also change the codec for mutable streams. e.g.
 ALTER STREAM test MODIFY SETTING logstore_codec='lz4';
 ```

+Starting from Timeplus Enterprise 2.8.2, you can also modify the TTL for a mutable stream.
+```sql
+ALTER STREAM test MODIFY SETTING ttl_seconds = 10;
+```
+
 ## MODIFY QUERY SETTING

 :::info

@@ -69,6 +74,11 @@ Syntax:
 ALTER STREAM stream_name ADD COLUMN column_name data_type
 ```

+Since Timeplus Enterprise 2.8.2, you can also add multiple columns at once:
+```sql
+ALTER STREAM stream_99005 ADD COLUMN e int, ADD COLUMN f int;
+```
+
 `DELETE COLUMN` is not supported yet. Contact us if you have strong use cases.

 ## RENAME COLUMN

@@ -92,6 +102,28 @@ Since Timeplus Enterprise v2.9.0, you can drop an index from a mutable stream.
 ALTER STREAM mutable_stream DROP INDEX index_name
 ```

+## MATERIALIZE INDEX
+Since Timeplus Enterprise 2.8.2, you can rebuild the secondary index `name` for the specified `partition_name`.
+```sql
+ALTER STREAM mutable_stream MATERIALIZE INDEX [IF EXISTS] name [IN PARTITION partition_name] SETTINGS mutations_sync = 2
+```
+
+For example:
+```sql
+ALTER STREAM minmax_idx MATERIALIZE INDEX idx IN PARTITION 2 SETTINGS mutations_sync = 2
+```
+
+## CLEAR INDEX
+Since Timeplus Enterprise 2.8.2, you can delete the secondary index `name` from disk.
+```sql
+ALTER STREAM mutable_stream CLEAR INDEX [IF EXISTS] name [IN PARTITION partition_name] SETTINGS mutations_sync = 2
+```
+
+For example:
+```sql
+ALTER STREAM minmax_idx CLEAR INDEX idx IN PARTITION 2 SETTINGS mutations_sync = 2
+```
+
 ## DROP PARTITION
 You can delete some data in the stream without dropping the entire stream via `ALTER STREAM .. DROP PARTITION ..`.
