Commit e6b25a8

github-actions[bot], kolchfa-aws, and natebower committed
adding rewrite parameter docs (#9954)
* adding rewrite parameter docs
  Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
* addressing the PR comments
  Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
* addressing comments
  Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
* Apply suggestions from code review
  Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
  Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>
* addressing the comments
  Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
* addressing the comments
  Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
* Update _query-dsl/rewrite-parameter.md
  Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
* updating links
  Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
* Apply suggestions from code review
  Signed-off-by: Nathan Bower <nbower@amazon.com>

---------

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Nathan Bower <nbower@amazon.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
(cherry picked from commit 767d672)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent 423ce8a commit e6b25a8

_query-dsl/rewrite-parameter.md

Lines changed: 253 additions & 0 deletions
@@ -0,0 +1,253 @@
---
layout: default
title: Rewrite
nav_order: 80
---

# Rewrite

Multi-term queries like `wildcard`, `prefix`, `regexp`, `fuzzy`, and `range` expand internally into sets of terms. The `rewrite` parameter allows you to control how these term expansions are executed and scored.

When a multi-term query expands into many terms (for example, a `wildcard` query for `error*` matching hundreds of terms), the matching terms are internally converted into `term` queries. This expansion can have the following drawbacks:

* It can exceed the `indices.query.bool.max_clause_count` limit (default is `1024`).
* It affects how scores are calculated for matching documents.
* It impacts memory usage and latency, depending on the rewrite method used.

The following table compares the scoring behavior and relative performance of each rewrite mode.

| Mode                        | Scores                                 | Performance | Notes                                         |
| --------------------------- | -------------------------------------- | ----------- | --------------------------------------------- |
| `constant_score`            | Same score for all matches             | Best        | Default mode, ideal for filters               |
| `scoring_boolean`           | TF/IDF-based                           | Moderate    | Full relevance scoring                        |
| `constant_score_boolean`    | Same score but with Boolean structure  | Moderate    | Use with `must_not` or `minimum_should_match` |
| `top_terms_N`               | TF/IDF on top N terms                  | Efficient   | Truncates expansion                           |
| `top_terms_boost_N`         | Static boosts                          | Fast        | Less accurate                                 |
| `top_terms_blended_freqs_N` | Blended score                          | Balanced    | Best scoring/efficiency trade-off             |

## Available rewrite methods

The following table summarizes the available rewrite methods.

| Rewrite method | Description |
| --- | --- |
| [`constant_score`](#constant-score) | (Default) All expanded terms are evaluated together as a single unit, assigning the same score to every match. Matching documents are not scored individually, making it very efficient for filtering use cases. |
| [`scoring_boolean`](#scoring-boolean) | Breaks the query into a Boolean `should` clause with one term query per match. Each result is scored individually based on relevance. |
| [`constant_score_boolean`](#constant-score-boolean) | Similar to `scoring_boolean`, but all documents receive a fixed score regardless of term frequency. Maintains Boolean structure without TF/IDF weighting. |
| [`top_terms_N`](#top-terms-n) | Restricts scoring and execution to the N most frequent terms. Reduces resource usage and prevents clause overload. |
| [`top_terms_boost_N`](#top-terms-boost-n) | Like `top_terms_N` but uses static boosting instead of full scoring. Offers performance improvements with simplified relevance. |
| [`top_terms_blended_freqs_N`](#top-terms-blended-frequencies-n) | Chooses the top N matching terms and averages their document frequencies for scoring. Produces balanced scores without full term explosion. |

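
As a rough illustration of how the `rewrite` value slots into a request body, the following Python sketch builds the JSON for a multi-term query and checks the `rewrite` string against the method names in the preceding table. The helper names `is_valid_rewrite` and `multi_term_query` are hypothetical, and the rule that `N` must be a positive integer is an assumption made for this sketch, not documented server behavior.

```python
import json
import re
from typing import Optional

# Rewrite values listed in the table above. Assumption: N is a positive integer.
_FIXED = {"constant_score", "scoring_boolean", "constant_score_boolean"}
_TOP_N = re.compile(r"top_terms_(?:boost_|blended_freqs_)?[1-9]\d*")


def is_valid_rewrite(value: str) -> bool:
    """Return True if the value looks like one of the documented rewrite methods."""
    return value in _FIXED or _TOP_N.fullmatch(value) is not None


def multi_term_query(query_type: str, field: str, value: str,
                     rewrite: Optional[str] = None) -> dict:
    """Build a search request body, adding `rewrite` only when it is provided."""
    if rewrite is not None and not is_valid_rewrite(rewrite):
        raise ValueError(f"unrecognized rewrite method: {rewrite!r}")
    params = {"value": value}
    if rewrite is not None:
        params["rewrite"] = rewrite
    return {"query": {query_type: {field: params}}}


body = multi_term_query("wildcard", "message", "warning*", "top_terms_2")
print(json.dumps(body, indent=2))
```

Omitting the `rewrite` argument leaves the parameter out of the body, so the default `constant_score` method applies.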
## Boolean-based rewrite limits

All Boolean-based rewrites, such as `scoring_boolean`, `constant_score_boolean`, and `top_terms_*`, are subject to the following [dynamic cluster-level index setting]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index-settings/#dynamic-cluster-level-index-settings):

```json
indices.query.bool.max_clause_count
```

This setting controls the maximum number of Boolean `should` clauses allowed in a query (default: `1024`). If your query expands to more terms than this limit, it is rejected with a `too_many_clauses` error.

For example, a wildcard pattern such as `error*` might expand to hundreds or thousands of matching terms, which could include `error`, `errors`, `error_log`, `error404`, and others. Each of these terms becomes a separate `term` query. If the number of terms exceeds the `indices.query.bool.max_clause_count` limit, the following query fails:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "error*",
        "rewrite": "scoring_boolean"
      }
    }
  }
}
```
{% include copy-curl.html %}

The query is expanded internally as follows:

```json
{
  "bool": {
    "should": [
      { "term": { "message": "error" } },
      { "term": { "message": "errors" } },
      { "term": { "message": "error_log" } },
      { "term": { "message": "error404" } },
      ...
    ]
  }
}
```

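
The expansion and the clause limit described in this section can be modeled with a small Python sketch. This is a toy model, not the engine's implementation: the term dictionary is made up, `fnmatchcase` stands in for wildcard matching (it also accepts `[...]` character sets, which the `wildcard` query does not), and `TooManyClauses` is a hypothetical stand-in for the `too_many_clauses` error.

```python
from fnmatch import fnmatchcase

# Default value of indices.query.bool.max_clause_count, as stated on this page.
MAX_CLAUSE_COUNT = 1024


class TooManyClauses(Exception):
    """Hypothetical stand-in for the too_many_clauses error."""


def expand_to_bool(field, pattern, indexed_terms, max_clauses=MAX_CLAUSE_COUNT):
    """Expand a wildcard pattern into one `term` clause per matching indexed term."""
    matches = sorted(t for t in indexed_terms if fnmatchcase(t, pattern))
    if len(matches) > max_clauses:
        raise TooManyClauses(
            f"{len(matches)} clauses exceed the limit of {max_clauses}"
        )
    return {"bool": {"should": [{"term": {field: t}} for t in matches]}}


# Made-up term dictionary for the `message` field.
terms = {"error", "errors", "error_log", "error404", "warning"}
print(expand_to_bool("message", "error*", terms))
```

Passing a term dictionary with more than `1024` matching terms to `expand_to_bool` raises the exception instead of returning a query, which mirrors the rejection described above.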
## Constant score

The default `constant_score` rewrite method wraps all expanded terms into a single query and skips the scoring phase entirely. This approach has the following characteristics:

* Executes all term matches as a single [bit array](https://en.wikipedia.org/wiki/Bit_array) query.
* Ignores scoring altogether; every matching document gets a `_score` of `1.0`.
* Is the fastest option, which makes it ideal for filtering.

The following example runs a `wildcard` query using the default `constant_score` rewrite method to efficiently filter documents matching the pattern `warning*` in the `message` field:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*"
      }
    }
  }
}
```
{% include copy-curl.html %}

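
To make the bit array idea concrete, the following Python sketch merges the postings of every matching term into one integer bitmask and gives each matching document a score of `1.0`. It is a simplified model under stated assumptions: the `postings` data is invented, `fnmatchcase` approximates wildcard matching, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase

# Invented postings: term -> IDs of the documents that contain it.
postings = {
    "warning": {0, 3},
    "warnings": {1},
    "warning_level": {3, 4},
    "error": {2},
}


def constant_score_search(pattern, postings):
    """Union the postings of all matching terms into a bitmask, then score hits 1.0."""
    bits = 0
    for term, doc_ids in postings.items():
        if fnmatchcase(term, pattern):
            for doc_id in doc_ids:
                bits |= 1 << doc_id  # set the bit for this document
    return {
        doc_id: 1.0
        for doc_id in range(bits.bit_length())
        if (bits >> doc_id) & 1
    }


print(constant_score_search("warning*", postings))
```

Document `3` contains two matching terms but still receives the same score as the others because the bitmask records only whether a document matched.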
## Scoring Boolean

The `scoring_boolean` rewrite method breaks the expanded terms into separate `term` queries combined under a Boolean `should` clause. This approach works as follows:

* Expands the wildcard into individual `term` queries inside a Boolean `should` clause.
* Scores each document based on how many of the expanded terms it matches and how frequently those terms occur.

The following example uses a `scoring_boolean` rewrite configuration:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "scoring_boolean"
      }
    }
  }
}
```
{% include copy-curl.html %}

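
The following Python sketch shows, in a deliberately simplified form, why documents receive different scores under `scoring_boolean`. It sums a basic term frequency times inverse document frequency weight over every expanded term. The formula and the sample documents are assumptions chosen for illustration; the actual score depends on the similarity configured for the field.

```python
import math
from fnmatch import fnmatchcase

# Invented documents, already tokenized.
docs = {
    1: ["warning", "warning", "disk"],
    2: ["warnings", "cpu"],
    3: ["warning", "warnings", "warning_level"],
    4: ["error"],
}


def scoring_boolean_search(pattern, docs):
    """Score each document as the sum of tf * idf over all expanded terms it contains."""
    vocabulary = {t for tokens in docs.values() for t in tokens}
    expanded = [t for t in vocabulary if fnmatchcase(t, pattern)]
    scores = {}
    for term in expanded:
        df = sum(1 for tokens in docs.values() if term in tokens)
        idf = math.log(1 + len(docs) / df)  # simplified IDF, an assumption
        for doc_id, tokens in docs.items():
            tf = tokens.count(term)
            if tf:
                scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    return scores


ranked = sorted(scoring_boolean_search("warning*", docs).items(),
                key=lambda item: item[1], reverse=True)
print(ranked)
```

In this toy data set, document `3` ranks first because it matches three different expanded terms, and document `1` outranks document `2` because it contains `warning` twice.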
## Constant score Boolean

The `constant_score_boolean` rewrite method uses the same Boolean structure as `scoring_boolean` but disables scoring, making it useful when clause logic is needed without relevance ranking. This method has the following characteristics:

* Similar structure to `scoring_boolean`, but documents are not ranked.
* All matching documents receive the same score.
* Retains Boolean clause flexibility, such as using `must_not`, without ranking.

The following example places a `wildcard` query that uses the `constant_score_boolean` rewrite method inside a `must_not` Boolean clause:

```json
POST /logs/_search
{
  "query": {
    "bool": {
      "must_not": {
        "wildcard": {
          "message": {
            "value": "error*",
            "rewrite": "constant_score_boolean"
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

This query is internally expanded as follows:

```json
{
  "bool": {
    "must_not": {
      "bool": {
        "should": [
          { "term": { "message": "error" } },
          { "term": { "message": "errors" } },
          { "term": { "message": "error_log" } },
          ...
        ]
      }
    }
  }
}
```

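
The effect of the expanded `must_not` clause can be sketched in Python as a set operation: a document is returned only if it contains none of the expanded terms. The documents and the function name below are invented for illustration, and `fnmatchcase` again approximates wildcard matching.

```python
from fnmatch import fnmatchcase

# Invented documents, already tokenized.
docs = {
    1: ["error", "disk"],
    2: ["warning"],
    3: ["errors"],
    4: ["info", "error_log"],
}


def must_not_wildcard(pattern, docs):
    """Return IDs of documents that contain none of the terms the pattern expands to."""
    vocabulary = {t for tokens in docs.values() for t in tokens}
    should_terms = {t for t in vocabulary if fnmatchcase(t, pattern)}
    return sorted(
        doc_id for doc_id, tokens in docs.items()
        if should_terms.isdisjoint(tokens)
    )


print(must_not_wildcard("error*", docs))
```

No scores are computed at any point, which matches the unranked behavior described for this method.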
## Top terms N

The `top_terms_N` method is one of several rewrite options designed to balance scoring accuracy and performance when expanding multi-term queries. It works as follows:

* Only the N most frequently matching terms are selected and scored.
* It is useful when you expect a large term expansion and want to limit the load.
* All other matching terms are ignored to preserve performance.

The following query uses the `top_terms_2` rewrite method to score only the two most frequent terms that match the `warning*` pattern in the `message` field:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "top_terms_2"
      }
    }
  }
}
```
{% include copy-curl.html %}

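
The truncation step can be sketched in Python as follows. The sketch follows this page's description and assumes that "most frequent" means highest document frequency; the engine's actual selection criteria may differ. The frequencies and function names are invented for illustration.

```python
from fnmatch import fnmatchcase

# Invented document frequencies for terms in the `message` field.
doc_freq = {
    "warning": 120,
    "warnings": 45,
    "warning_level": 7,
    "warned": 3,
    "error": 300,
}


def top_terms(pattern, doc_freq, n):
    """Keep the n matching terms with the highest document frequency."""
    matching = [t for t in doc_freq if fnmatchcase(t, pattern)]
    # Sort by descending frequency; break ties alphabetically for a stable result.
    return sorted(matching, key=lambda t: (-doc_freq[t], t))[:n]


def top_terms_bool(field, pattern, doc_freq, n):
    """Build the truncated Boolean query from the selected terms."""
    selected = top_terms(pattern, doc_freq, n)
    return {"bool": {"should": [{"term": {field: t}} for t in selected]}}


print(top_terms_bool("message", "warning*", doc_freq, 2))
```

With `n` set to `2`, the resulting query has two clauses no matter how many terms match the pattern, so it stays well below the clause limit.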
## Top terms boost N

The `top_terms_boost_N` rewrite method selects the top N matching terms and applies static `boost` values instead of computing full relevance scores. It works as follows:

* Limits expansion to the top N terms, like `top_terms_N`.
* Assigns a preset boost to each term rather than computing TF/IDF.
* Provides faster execution with predictable relevance weights.

The following example uses the `top_terms_boost_2` rewrite method:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "top_terms_boost_2"
      }
    }
  }
}
```
{% include copy-curl.html %}

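
The following Python sketch illustrates static boosting under two assumptions that this page does not state: every selected term carries the same boost, and a document's score is the sum of the boosts of the selected terms it contains. The documents and names are invented for illustration.

```python
# Invented documents, already tokenized.
docs = {
    1: ["warning", "warning", "warning"],
    2: ["warning", "warnings"],
    3: ["error"],
}


def boost_only_scores(selected_terms, docs, boost=1.0):
    """Score each document by the number of selected terms it contains times the boost."""
    scores = {}
    for doc_id, tokens in docs.items():
        matched = sum(1 for term in selected_terms if term in tokens)
        if matched:
            scores[doc_id] = matched * boost
    return scores


# Assume ["warning", "warnings"] are the two terms kept by top_terms_boost_2.
print(boost_only_scores(["warning", "warnings"], docs))
```

Document `1` repeats `warning` three times but scores lower than document `2` in this model because term frequency is never consulted.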
## Top terms blended frequencies N

The `top_terms_blended_freqs_N` rewrite method selects the top N matching terms and blends their document frequencies to produce more balanced relevance scores. This approach has the following characteristics:

* Picks the top N matching terms and applies a blended frequency to all of them.
* Smooths scoring across terms that differ in frequency.
* Offers a good trade-off when you want performance with realistic scoring.

The following example uses the `top_terms_blended_freqs_2` rewrite method:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "top_terms_blended_freqs_2"
      }
    }
  }
}
```
{% include copy-curl.html %}
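
To show what blending changes, the following Python sketch replaces each selected term's document frequency with the average across the selected terms, as the summary table on this page describes, and derives one shared inverse document frequency from it. The IDF formula, the sample frequencies, and the function names are assumptions for illustration; the engine's blending may be computed differently.

```python
import math


def per_term_idf(doc_freqs, total_docs):
    """Simplified IDF computed separately for each term (an assumed formula)."""
    return {t: math.log(1 + total_docs / df) for t, df in doc_freqs.items()}


def blended_idf(doc_freqs, total_docs):
    """Average the document frequencies, then give every term the same IDF."""
    blended_df = sum(doc_freqs.values()) / len(doc_freqs)
    idf = math.log(1 + total_docs / blended_df)
    return {t: idf for t in doc_freqs}


# Invented frequencies for the two terms kept by top_terms_blended_freqs_2.
selected = {"warning": 120, "warnings": 45}
print(per_term_idf(selected, total_docs=1000))
print(blended_idf(selected, total_docs=1000))
```

In this model, the rarer term `warnings` no longer outweighs `warning` simply because it appears in fewer documents, which is the smoothing effect described above.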
