
Commit 4e26f74

sandeshkr419, kolchfa-aws, and natebower authored
[Star-tree] Support for nested aggs & Removing experimental flag. (#10132)
* Update star-tree-index.md nested aggs support
  Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
* remove experimental banner
  Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
* title capital letter error
  Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
* remove experimental flag setting
  Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
* Doc review
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Update _search-plugins/star-tree-index.md
  Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
* Incorporate disabling star-tree in enabling section
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Restructure limitations
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Rewrite enabling section
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* More rewording
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Add cross-links
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Minor rewording
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Apply suggestions from code review
  Signed-off-by: Nathan Bower <nbower@amazon.com>
* Address technical comments
  Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
* doc review
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Update _search-plugins/star-tree-index.md
  Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
* Update _search-plugins/star-tree-index.md
  Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Nathan Bower <nbower@amazon.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
1 parent 031bdde, commit 4e26f74


2 files changed: +120 -52 lines changed


_field-types/supported-field-types/star-tree.md

Lines changed: 3 additions & 3 deletions
@@ -7,9 +7,6 @@ parent: Supported field types

 # Star-tree field type

-This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
-{: .warning}
-
 A star-tree index precomputes aggregations, accelerating the performance of aggregation queries.
 If a star-tree index is configured as part of an index mapping, the star-tree index is created and maintained as data is ingested in real time.

@@ -232,3 +229,6 @@ The `metrics` parameter supports the following properties.

 For more information about supported queries and aggregations, see [Supported queries and aggregations for a star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#supported-queries-and-aggregations).

+## Next steps
+
+- [Star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/)

_search-plugins/star-tree-index.md

Lines changed: 117 additions & 49 deletions
@@ -7,77 +7,143 @@ nav_order: 54

 # Star-tree index

-This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
-{: .warning}
+A _star-tree index_ is a specialized index structure designed to improve aggregation performance by precomputing and storing aggregated values at different levels of granularity. This indexing technique enables faster aggregation execution, especially for multi-field aggregations.

-A star-tree index is a multi-field index that improves the performance of aggregations.
+Once you enable star-tree indexes, OpenSearch automatically builds and uses star-tree indexes to optimize supported aggregations if the filter fields match the defined dimensions and the aggregation fields match the defined metrics in the star-tree mapping configuration. No changes to your query syntax or request parameters are required.

-OpenSearch will automatically use a star-tree index to optimize aggregations if the queried fields are part of dimension fields and the aggregations are on star-tree metric fields. No changes are required in the query syntax or the request parameters.
-
-## When to use a star-tree index
-
-A star-tree index can be used to perform faster aggregations. Consider the following criteria and features when deciding to use a star-tree index:
+Use a star-tree index when you want to speed up aggregations:

 - Star-tree indexes natively support multi-field aggregations.
-- Star-tree indexes are created in real time as part of the indexing process, so the data in a star-tree will always be up to date.
-- A star-tree index consolidates data, increasing index paging efficiency and using less IO for search queries.
-
-## Limitations
+- Star-tree indexes are created in real time as part of the indexing process, so the data in a star-tree is always current.
+- A star-tree index aggregates data to improve paging efficiency and reduce disk I/O during search queries.

-Star-tree indexes have the following limitations:
+## Star-tree index structure

-- A star-tree index should only be enabled on indexes whose data is not updated or deleted because updates and deletions are not accounted for in a star-tree index. To enforce this policy and use star-tree indexes, set the `index.append_only.enabled` setting to `true`.
-- A star-tree index can be used for aggregation queries only if the queried fields are a subset of the star-tree's dimensions and the aggregated fields are a subset of the star-tree's metrics.
-- After a star-tree index is enabled, it cannot be disabled. In order to disable a star-tree index, the data in the index must be reindexed without the star-tree mapping. Furthermore, changing a star-tree configuration will also require a reindex operation.
-- [Multi-values/array values]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/index/#arrays) are not supported.
-- Only [limited queries and aggregations](#supported-queries-and-aggregations) are supported. Support for more features will be added in future versions.
-- The cardinality of the dimensions should not be very high (as with `_id` fields). Higher cardinality leads to increased storage usage and query latency.
+A star-tree index organizes and aggregates data across combinations of dimension fields and precomputes metric values for all the dimension combinations every time a segment is flushed or refreshed during ingestion. This structure enables OpenSearch to process aggregation queries quickly without scanning every document.

-## Star-tree index structure
+The following is an example star-tree configuration:

-The following image illustrates a standard star-tree index structure.
+```json
+"ordered_dimensions": [
+  {
+    "name": "status"
+  },
+  {
+    "name": "port"
+  }
+],
+"metrics": [
+  {
+    "name": "size",
+    "stats": [
+      "sum"
+    ]
+  },
+  {
+    "name": "latency",
+    "stats": [
+      "avg"
+    ]
+  }
+]
+```

-<img src="{{site.url}}{{site.baseurl}}/images/star-tree-index.png" alt="A star-tree index containing two dimensions and two metrics" width="700">
+This configuration defines the following:

-Sorted and aggregated star-tree documents are backed by `doc_values` in an index. The columnar data found in `doc_values` is stored using the following properties:
+* Two dimension fields: `status` and `port`. The `ordered_dimensions` field specifies how data is sorted (first by `status`, then by `port`).
+* Two metric fields: `size` and `latency` with their corresponding aggregations (`sum` and `avg`). For each unique dimension combination, metric values (`Sum(size)` and `Avg(latency)`) are pre-aggregated and stored in the star-tree structure.

-- The values are sorted based on the fields set in the `ordered_dimension` setting. In the preceding image, the dimensions are determined by the `status` setting and then by the `port` for each status.
-- For each unique dimension/value combination, the aggregated values for all the metrics, such as `avg(size)` and `count(requests)`, are precomputed during ingestion.
+OpenSearch creates a star-tree index structure based on this configuration. Each node in the tree corresponds to a value (or wildcard `*`) for a dimension. At query time, OpenSearch traverses the tree based on the dimension values provided in the query.

 ### Leaf nodes

-Each node in a star-tree index points to a range of star-tree documents. Nodes can be further split into child nodes based on the [max_leaf_docs configuration]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/#star-tree-index-configuration-options). The number of documents that a leaf node points to is less than or equal to the value set in `max_leaf_docs`. This ensures that the maximum number of documents that need to traverse nodes to derive an aggregated value is at most the number of `max_leaf_docs`, which provides predictable latency.
+Leaf nodes contain the precomputed metric aggregations for specific combinations of dimensions. These are stored as doc values and referenced by star-tree nodes.
+
+The `max_leaf_docs` setting controls how many documents each leaf node can reference, which helps keep query latency predictable by limiting how many documents are scanned for any given node.

 ### Star nodes

-A star node contains the aggregated data of all the other nodes for a particular dimension, acting as a "catch-all" node. When a star node is found in a dimension, that dimension is skipped during aggregation. This groups together all values of that dimension and allows a query to skip non-competitive nodes when fetching the aggregated value of a particular field.
+A _star node_ (marked as `*` in the following diagram) aggregates all values for a particular dimension. If a query doesn't specify a filter for that dimension, OpenSearch retrieves the precomputed aggregation from the star node instead of iterating over multiple leaf nodes. For example, if a query filters on `port` but not `status`, OpenSearch can use a star node that aggregates data for all status values.

-The star-tree index structure diagram contains the following three examples demonstrating how a query behaves when retrieving aggregations from nodes in the star-tree:
+### How queries use the star-tree

-- **Blue**: In a `terms` query that searches for the average request size aggregation, the `port` equals `8443` and the status equals `200`. Because the query contains values in both the `status` and `port` dimensions, the query traverses status node `200` and returns the aggregations from child node `8443`.
-- **Green**: In a `term` query that searches for the number of aggregation requests, the `status` equals `200`. Because the query only contains a value from the `status` dimension, the query traverses the `200` node's child star node, which contains the aggregated value of all the `port` child nodes.
-- **Red**: In a `term` query that searches for the average request size aggregation, the port equals `5600`. Because the query does not contain a value from the `status` dimension, the query traverses a star node and returns the aggregated result from the `5600` child node.
+The following diagram shows a star-tree index created for this example and three example query paths. In the diagram, notice that each branch corresponds to a dimension (`status` and `port`). Some nodes contain precomputed aggregation values (for example, `Sum(size)`), allowing OpenSearch to skip unnecessary calculations at query time.

-Support for the `Terms` query will be added in a future version. For more information, see [GitHub issue #15257](https://github.com/opensearch-project/OpenSearch/issues/15257).
-{: .note}
+<img src="{{site.url}}{{site.baseurl}}/images/star-tree-index.png" alt="A star-tree index containing two dimensions and two metrics">
+
+The colored arrows show three query examples:
+
+* **Blue arrow**: Multi-term query with metric aggregation
+  The query filters on both `status = 200` and `port = 5600` and calculates the sum of request sizes.
+
+  * OpenSearch follows this path: `Root → 200 → 5600`
+  * It retrieves the metric from Doc ID 1, where `Sum(size) = 988`
+
+* **Green arrow**: Single-term query with metric aggregation
+  The query filters on `status = 200` only and computes the average request latency.
+
+  * OpenSearch follows this path: `Root → 200 → *`
+  * It retrieves the metric from Doc ID 5, where `Avg(latency) = 70`
+
+* **Red arrow**: Single-term query with metric aggregation
+  The query filters on `port = 8443` only and calculates the sum of request sizes.
+
+  * OpenSearch follows this path: `Root → * → 8443`
+  * It retrieves the metric from Doc ID 7, where `Sum(size) = 1111`
+
+These examples show how OpenSearch selects the shortest path in the star-tree and uses pre-aggregated values to process queries efficiently.
+
+## Limitations
+
+Note the following limitations of star-tree indexes:
+
+- Star-tree indexes do not support updates or deletions. To use a star-tree index, data should be append-only. See [Enabling a star-tree index](#enabling-a-star-tree-index).
+- A star-tree index only works for aggregation queries that filter on dimension fields and aggregate metric fields defined in the index's star-tree configuration.
+- Any changes to a star-tree configuration require reindexing.
+- [Array values]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/index/#arrays) are not supported.
+- Only [specific queries and aggregations](#supported-queries-and-aggregations) are supported.
+- Avoid using high-cardinality fields like `_id` as dimensions because they can significantly increase storage use and query latency.

 ## Enabling a star-tree index

-To use a star-tree index, modify the following settings:
+Star-tree indexing behavior is controlled by the following cluster-level and index-level settings. Index-level settings take precedence over cluster settings.
+
+| Setting | Scope | Default | Purpose |
+| ------------------------------------------- | ------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------ |
+| `indices.composite_index.star_tree.enabled` | Cluster | `true` | Enables or disables star-tree search optimization across the cluster. |
+| `index.composite_index` | Index | None | Enables star-tree indexing for a specific index. Must be set when creating the index. |
+| `index.append_only.enabled` | Index | None | Required for star-tree indexes. Prevents updates and deletions. Must be `true`. |
+| `index.search.star_tree_index.enabled` | Index | `true` | Enables or disables use of the star-tree index for search queries on the index. |
+
+Setting `indices.composite_index.star_tree.enabled` to `false` prevents OpenSearch from using star-tree optimization during searches, but the star-tree index structures are still created. To completely remove star-tree structures, you must reindex your data without the star-tree mapping.
+{: .note}

-- Set the feature flag `opensearch.experimental.feature.composite_index.star_tree.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
-- Set the `indices.composite_index.star_tree.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [Configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings).
-- Set the `index.composite_index` index setting to `true` during index creation.
-- Set the `index.append_only.enabled` index setting to `true` during index creation.
-- Ensure that the `doc_values` parameter is enabled for the `dimensions` and `metrics` fields used in your star-tree mapping.

+To create an index that uses a star-tree index, send the following request:
+
+```json
+PUT /logs
+{
+  "settings": {
+    "index.composite_index": true,
+    "index.append_only.enabled": true
+  }
+}
+```
+{% include copy-curl.html %}
+
+Ensure that the `doc_values` parameter is enabled for the dimension and metric fields used in your star-tree mapping. This is enabled by default for most field types. For more information, see [Doc values]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/doc-values/).
+
+### Disabling star-tree usage
+
+By default, both the `indices.composite_index.star_tree.enabled` cluster setting and the `index.search.star_tree_index.enabled` index setting are set to `true`. To disable search using star-tree indexes, set both of these settings to `false`. Note that index settings take precedence over cluster settings.

 ## Example mapping

-In the following example, index mappings define the star-tree configuration. The star-tree index precomputes aggregations in the `logs` index. The aggregations are calculated on the `size` and `latency` fields for all the combinations of values indexed in the `port` and `status` fields:
+The following example shows how to create a star-tree index that precomputes aggregations in the `logs` index. The `sum` and `avg` aggregations are calculated on the `size` and `latency` fields, respectively, for all combinations of values in the dimension fields. The dimensions are ordered by `status`, then `port`, and finally `method`, which determines how the data is organized in the tree structure:

 ```json
-PUT logs
+PUT /logs
 {
   "settings": {
     "index.number_of_shards": 1,
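The limitations in this hunk state that a star-tree index serves only aggregation queries that filter on dimension fields and aggregate configured metric fields. The supported-queries section later in this file adds that queries without aggregations cannot use the optimization. The following Python sketch encodes those rules as a predicate. It is hypothetical and is not OpenSearch's query planner. The `can_use_star_tree` helper, its flattened `filter_fields` and `aggregations` inputs, and the `CONFIG` dict (which mirrors the example configuration above) are illustrative assumptions. The sketch also assumes that each aggregation's statistic must be listed under that metric's `stats`, and it does not model the Boolean clause restrictions.

```python
# Hypothetical star-tree config mirroring the example "ordered_dimensions"/"metrics" JSON.
CONFIG = {
    "ordered_dimensions": [{"name": "status"}, {"name": "port"}],
    "metrics": [
        {"name": "size", "stats": ["sum"]},
        {"name": "latency", "stats": ["avg"]},
    ],
}

# Query types that the star-tree documentation lists as supported.
SUPPORTED_QUERIES = {"term", "terms", "match_all", "range", "bool"}


def can_use_star_tree(config, query_type, filter_fields, aggregations):
    """Return True if a request looks eligible for star-tree optimization.

    filter_fields: fields referenced by the query (empty for match_all).
    aggregations: (stat, field) pairs, for example [("sum", "size")].
    Boolean clause-level restrictions are not modeled.
    """
    if not aggregations:  # queries without aggregations cannot use the star-tree
        return False
    if query_type not in SUPPORTED_QUERIES:
        return False
    dimensions = {d["name"] for d in config["ordered_dimensions"]}
    if not set(filter_fields) <= dimensions:  # every filter field must be a dimension
        return False
    stats = {m["name"]: set(m["stats"]) for m in config["metrics"]}
    # Every aggregation must target a configured metric with that statistic.
    return all(stat in stats.get(fld, set()) for stat, fld in aggregations)


print(can_use_star_tree(CONFIG, "term", ["status"], [("sum", "size")]))   # True
print(can_use_star_tree(CONFIG, "term", ["status"], []))                  # False: no aggregation
print(can_use_star_tree(CONFIG, "range", ["bytes"], [("sum", "size")]))   # False: not a dimension
```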
@@ -148,24 +214,22 @@ PUT logs
 ```
 {% include copy.html %}

-For detailed information about star-tree index mappings and parameters, see [Star-tree field type]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/).
+For more information about star-tree index mappings and parameters, see [Star-tree field type]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/).

 ## Supported queries and aggregations

-Star-tree indexes can be used to optimize queries and aggregations.
+Star-tree indexes optimize aggregations. Every query must include at least one supported aggregation in order to use the star-tree optimization.

 ### Supported queries

-The following queries are supported as of OpenSearch 2.19:
+Queries without aggregations cannot use star-tree optimization. The query's fields must be present in the `ordered_dimensions` section of the star-tree configuration. The following queries are supported:

 - [Term query]({{site.url}}{{site.baseurl}}/query-dsl/term/term/)
 - [Terms query]({{site.url}}{{site.baseurl}}/query-dsl/term/terms/)
 - [Match all docs query]({{site.url}}{{site.baseurl}}/query-dsl/match-all/)
 - [Range query]({{site.url}}{{site.baseurl}}/query-dsl/term/range/)
 - [Boolean query]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/)

-To use a query with a star-tree index, the query's fields must be present in the `ordered_dimensions` section of the star-tree configuration. Queries must also be paired with a supported aggregation. Queries without aggregations cannot be used with a star-tree index. Currently, queries on `date` fields are not supported and will be added in later versions.
-
 #### Boolean query restrictions

 Boolean queries in star-tree indexes follow specific rules for each clause type:
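The star-tree index structure section earlier in this file describes ordered dimensions, leaf nodes bounded by `max_leaf_docs`, star (`*`) nodes, and term-filtered queries that walk the tree. The following minimal Python sketch models that idea in memory. It is not OpenSearch's implementation. The sample records, the `Agg` and `Node` classes, and the `aggregate`, `build`, and `query` functions are hypothetical. The sketch handles only exact term filters, creates a star child every time a node is split, and derives the average latency from a stored sum and count.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical ordered dimensions, mirroring "ordered_dimensions": status, then port.
DIMS = ("status", "port")
STAR = "*"


@dataclass
class Agg:
    """Pre-aggregated metrics for one dimension combination (one star-tree document)."""
    size_sum: float = 0.0
    latency_sum: float = 0.0
    count: int = 0

    def add(self, other: "Agg") -> None:
        self.size_sum += other.size_sum
        self.latency_sum += other.latency_sum
        self.count += other.count

    @property
    def latency_avg(self) -> float:
        # Average derived from the stored sum and count.
        return self.latency_sum / self.count if self.count else 0.0


@dataclass
class Node:
    level: int  # index into DIMS
    children: Dict[object, "Node"] = field(default_factory=dict)
    star: Optional["Node"] = None  # "*" child: all values of this dimension
    docs: Optional[Dict[Tuple, Agg]] = None  # set only on leaf nodes


def aggregate(records) -> Dict[Tuple, Agg]:
    """Collapse raw documents into one Agg per unique dimension tuple."""
    out: Dict[Tuple, Agg] = defaultdict(Agg)
    for r in records:
        out[tuple(r[d] for d in DIMS)].add(Agg(r["size"], r["latency"], 1))
    return dict(out)


def build(docs: Dict[Tuple, Agg], level: int = 0, max_leaf_docs: int = 1) -> Node:
    node = Node(level)
    # Stop splitting once a node references at most max_leaf_docs documents.
    if level == len(DIMS) or len(docs) <= max_leaf_docs:
        node.docs = docs
        return node
    groups: Dict[object, Dict[Tuple, Agg]] = defaultdict(dict)
    star_docs: Dict[Tuple, Agg] = defaultdict(Agg)
    for key, agg in docs.items():
        groups[key[level]][key] = agg
        # Star child: replace this dimension's value with "*" and merge.
        star_docs[key[:level] + (STAR,) + key[level + 1:]].add(agg)
    node.children = {v: build(g, level + 1, max_leaf_docs) for v, g in groups.items()}
    node.star = build(dict(star_docs), level + 1, max_leaf_docs)
    return node


def query(node: Node, filters: Dict[str, object]) -> Agg:
    """Answer a term-filtered aggregation by walking the tree."""
    if node.docs is not None:
        # Leaf: scan its few pre-aggregated documents.
        result = Agg()
        for key, agg in node.docs.items():
            if all(key[i] == filters[d] for i, d in enumerate(DIMS) if d in filters):
                result.add(agg)
        return result
    dim = DIMS[node.level]
    if dim in filters:
        child = node.children.get(filters[dim])
        return query(child, filters) if child is not None else Agg()
    # No filter on this dimension: use the star node instead of visiting every value.
    return query(node.star, filters)


records = [  # hypothetical sample documents
    {"status": 200, "port": 443, "size": 100, "latency": 10},
    {"status": 200, "port": 443, "size": 50, "latency": 30},
    {"status": 200, "port": 8080, "size": 70, "latency": 20},
    {"status": 404, "port": 443, "size": 5, "latency": 40},
    {"status": 500, "port": 8080, "size": 30, "latency": 60},
]
tree = build(aggregate(records), max_leaf_docs=1)
print(query(tree, {"status": 200, "port": 443}).size_sum)  # Root -> 200 -> 443: 150.0
print(query(tree, {"status": 200}).latency_avg)            # Root -> 200 -> *: 20.0
print(query(tree, {"port": 8080}).size_sum)                # Root -> * -> 8080: 100.0
```

The three calls follow the same kinds of paths as the blue, green, and red arrows in the diagram. The numbers come from the hypothetical records, not from the diagram. Raising `max_leaf_docs` in this sketch produces fewer nodes and larger leaves that are scanned at query time.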
@@ -240,7 +304,7 @@ The following aggregations are supported by star-tree indexes.

 #### Metric aggregations

-The following metric aggregations are supported as of OpenSearch 2.18:
+The following metric aggregations are supported:

 - [Sum]({{site.url}}{{site.baseurl}}/aggregations/metric/sum/)
 - [Minimum]({{site.url}}{{site.baseurl}}/aggregations/metric/minimum/)
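Metric aggregations such as Sum and Minimum, listed in this hunk, can be answered from precomputed values because their partial results combine across star-tree documents without access to the raw values. The following Python sketch illustrates that property. The `Partial` class and the latency values are hypothetical. The sketch assumes that an average is derived from a stored sum and count rather than by averaging per-document averages, and it shows why that distinction matters.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Partial:
    """Hypothetical precomputed values for one metric field in one star-tree document."""
    total: float
    count: int
    minimum: float
    maximum: float

    @staticmethod
    def of(values):
        return Partial(sum(values), len(values), min(values), max(values))

    def merge(self, other: "Partial") -> "Partial":
        # sum, count, min, and max combine without needing the raw values.
        return Partial(
            self.total + other.total,
            self.count + other.count,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def avg(self) -> float:
        # Derive the average only after merging sums and counts.
        return self.total / self.count


doc_a = Partial.of([10, 30])  # hypothetical latencies for one dimension combination
doc_b = Partial.of([60])      # hypothetical latencies for another combination

merged = reduce(Partial.merge, [doc_a, doc_b])
print(merged.total, merged.minimum, merged.maximum)  # 100 10 60
print(round(merged.avg, 2))                          # 33.33
print((doc_a.avg + doc_b.avg) / 2)                   # 40.0, averaging averages is wrong
```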
@@ -428,6 +492,10 @@ POST /sales/_search
 ```
 {% include copy-curl.html %}

-## Using queries without a star-tree index
+#### Nested aggregations
+
+You can combine multiple supported bucket aggregations (such as `terms` and `range`) in a nested structure, and the star-tree index will optimize these nested aggregations. For more information about nested aggregations, see [Nested aggregations]({{site.url}}{{site.baseurl}}/aggregations/#nested-aggregations).
+
+## Next steps

-Set the `indices.composite_index.star_tree.enabled` setting to `false` to run queries without using a star-tree index.
+- [Star-tree field type]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/)
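The nested aggregations text in this hunk states that supported bucket aggregations such as `terms` and `range` can be combined. The following Python sketch shows why pre-aggregated documents are sufficient for such a nesting. It computes a `terms` bucket on `status` with a nested `range` bucket on `port`, summing `size` in each bucket, and visits each star-tree document once instead of each raw document. The `STAR_TREE_DOCS` data, the port ranges, and the `terms_then_range` helper are hypothetical, and this is not OpenSearch's aggregation code. As in OpenSearch range aggregations, `from` is treated as inclusive and `to` as exclusive.

```python
from collections import defaultdict

# Hypothetical star-tree documents: (status, port) -> (sum of size, document count).
STAR_TREE_DOCS = {
    (200, 443): (150.0, 2),
    (200, 8080): (70.0, 1),
    (404, 443): (5.0, 1),
    (500, 8080): (30.0, 1),
}

# Hypothetical range buckets on port: (label, from inclusive, to exclusive).
PORT_RANGES = [("low", 0, 1024), ("high", 1024, 65536)]


def terms_then_range(docs, ranges):
    """Nested terms(status) -> range(port) aggregation with sum(size) per bucket."""
    result = defaultdict(dict)
    for (status, port), (size_sum, doc_count) in docs.items():
        for label, lo, hi in ranges:
            if lo <= port < hi:
                bucket = result[status].setdefault(label, {"doc_count": 0, "sum_size": 0.0})
                bucket["doc_count"] += doc_count
                bucket["sum_size"] += size_sum
    return dict(result)


buckets = terms_then_range(STAR_TREE_DOCS, PORT_RANGES)
print(buckets[200])  # {'low': {'doc_count': 2, 'sum_size': 150.0}, 'high': {'doc_count': 1, 'sum_size': 70.0}}
```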
