adding rewrite parameter docs #9954
File: `_query-dsl/rewrite-parameter.md`

@@ -0,0 +1,247 @@
---
layout: default
title: Rewrite
nav_order: 80
---

# Rewrite

Multi-term queries such as `wildcard`, `prefix`, `regexp`, `fuzzy`, and `range` expand internally into sets of terms. The `rewrite` parameter lets you control how these term expansions are executed and scored.

When a multi-term query expands into many terms (for example, a `prefix` query for `error` that matches hundreds of terms), the terms are converted internally into individual term queries. This process can do the following:

* Exceed the `indices.query.bool.max_clause_count` limit (default `1024`).
* Affect how scores are calculated for matching documents.
* Impact memory usage and latency, depending on the rewrite method used.

## Available rewrite methods

| Rewrite method | Description |
| --- | --- |
| [`constant_score`](#constant_score-default) | (Default) All expanded terms are evaluated together as a single unit, and every match receives the same score. Matching documents are not scored individually, which makes this method very efficient for filtering use cases. |
| [`scoring_boolean`](#scoring_boolean) | Breaks the query into a Boolean `should` clause with one term query per match. Each result is scored individually based on relevance. |
| [`constant_score_boolean`](#constant_score_boolean) | Similar to `scoring_boolean`, but all documents receive a fixed score regardless of term frequency. Maintains Boolean structure without TF/IDF weighting. |
| [`top_terms_N`](#top_terms_N) | Restricts scoring and execution to the N highest-scoring terms. Reduces resource usage and prevents clause overload. |
| [`top_terms_boost_N`](#top_terms_boost_N) | Like `top_terms_N`, but uses static boosting instead of full scoring. Offers performance improvements with simplified relevance. |
| [`top_terms_blended_freqs_N`](#top_terms_blended_freqs_N) | Chooses the top N matching terms and blends their document frequencies for scoring. Produces balanced scores without full term explosion. |

## Boolean-based rewrite limits

All Boolean-based rewrites, such as `scoring_boolean`, `constant_score_boolean`, and `top_terms_*`, are subject to the following configuration setting:

Review comment: Is this a dynamic setting that can be updated using the API, or is it static? Please link to the static/dynamic settings section in the install and configure topic so that users know how to update the setting.

```json
indices.query.bool.max_clause_count
```

This setting controls the maximum number of clauses allowed in a Boolean query (default: `1024`). If your query expands beyond this limit, it is rejected with a `too_many_clauses` error.

For example, a wildcard such as `error*` might expand to hundreds or thousands of matching terms, including `error`, `errors`, `error_log`, and `error404`. Each of these terms becomes a separate term query. If the number of terms exceeds the `indices.query.bool.max_clause_count` limit, the query fails. The following example uses the `scoring_boolean` rewrite method:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "error*",
        "rewrite": "scoring_boolean"
      }
    }
  }
}
```
{% include copy-curl.html %}

The query is expanded internally as follows:

```json
{
  "bool": {
    "should": [
      { "term": { "message": "error" } },
      { "term": { "message": "errors" } },
      { "term": { "message": "error_log" } },
      { "term": { "message": "error404" } },
      ...
    ]
  }
}
```
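
The expansion and the clause limit described in this section can be illustrated with a minimal Python sketch. It does not call OpenSearch: `expand_wildcard`, `TooManyClauses`, and the sample term list are hypothetical names used only for illustration, and the shell-style matching from `fnmatch` merely stands in for the engine's own term enumeration.

```python
from fnmatch import fnmatchcase


class TooManyClauses(Exception):
    """Hypothetical stand-in for the too_many_clauses error."""


def expand_wildcard(field, pattern, indexed_terms, max_clause_count=1024):
    """Expand a wildcard pattern into a Boolean `should` query of term queries."""
    matches = sorted(t for t in indexed_terms if fnmatchcase(t, pattern))
    if len(matches) > max_clause_count:
        raise TooManyClauses(
            f"{len(matches)} clauses exceed the limit of {max_clause_count}"
        )
    return {"bool": {"should": [{"term": {field: t}} for t in matches]}}


# Assumed sample of indexed terms; a real index may hold thousands.
terms = ["error", "errors", "error_log", "error404", "warning"]
query = expand_wildcard("message", "error*", terms)
```

With the sample terms, the sketch produces four `term` clauses. Passing a smaller `max_clause_count` (for example, `3`) raises the hypothetical `TooManyClauses` exception, which mirrors how an oversized expansion is rejected.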

## constant_score (default)

The `constant_score` rewrite method wraps all expanded terms into a single query and skips the scoring phase entirely. This approach offers the following characteristics:

* Executes all term matches as a single [bit array](https://en.wikipedia.org/wiki/Bit_array) query.
* Ignores scoring altogether; every document gets a `_score` of `1.0`.
* Is the fastest option and is ideal when filtering is the goal.

The following example uses the default `constant_score` rewrite method, so the `rewrite` parameter is omitted:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*"
      }
    }
  }
}
```
{% include copy-curl.html %}
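
As a rough mental model of constant scoring, the following Python sketch unions the documents of every expanded term and gives each hit the same score. The `constant_score_match` function and the `postings` dictionary are hypothetical, and a Python `set` only approximates the bit array that the engine would use.

```python
def constant_score_match(postings, expanded_terms, boost=1.0):
    """Union the postings of all expanded terms; every hit gets the same score."""
    matched = set()
    for term in expanded_terms:
        matched |= postings.get(term, set())
    return {doc_id: boost for doc_id in sorted(matched)}


# Assumed toy postings: term -> set of document IDs containing it.
postings = {
    "warning": {1, 2},
    "warnings": {2, 3},
    "error": {4},
}
hits = constant_score_match(postings, ["warning", "warnings"])
```

Document `2` contains both expanded terms but still receives the same score as documents `1` and `3`, because no per-term scoring takes place.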

## scoring_boolean

The `scoring_boolean` rewrite method breaks the expanded terms into separate `term` queries combined under a Boolean `should` clause. This approach works as follows:

* Expands the wildcard into individual `term` queries inside a Boolean `should` clause.
* Scores each document based on how many terms it matches and on their term frequency.

The following example uses the `scoring_boolean` rewrite method:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "scoring_boolean"
      }
    }
  }
}
```
{% include copy-curl.html %}
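
To show why a document that matches more expanded terms, or matches them more often, ranks higher, here is a toy Python sketch. The `scoring_boolean` function, the `index` layout, and the simplified TF-IDF weighting are all assumptions made for illustration; the similarity function that the engine actually applies differs in its details.

```python
import math


def scoring_boolean(index, expanded_terms):
    """Score each document as the sum of toy TF-IDF weights over matched terms."""
    n_docs = len(index)
    scores = {}
    for term in expanded_terms:
        # Document frequency: number of documents that contain the term.
        df = sum(1 for counts in index.values() if term in counts)
        if df == 0:
            continue
        idf = math.log(1 + n_docs / df)
        for doc_id, counts in index.items():
            if term in counts:
                scores[doc_id] = scores.get(doc_id, 0.0) + counts[term] * idf
    return scores


# Assumed toy index: document ID -> {term: term frequency}.
index = {
    1: {"warning": 1},
    2: {"warning": 3, "warnings": 1},
    3: {"error": 2},
}
scores = scoring_boolean(index, ["warning", "warnings"])
```

In this sketch, document `2` outranks document `1` because it matches both expanded terms and contains `warning` more often, while document `3` matches nothing and receives no score.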

## constant_score_boolean

The `constant_score_boolean` rewrite method uses the same Boolean structure as `scoring_boolean` but disables scoring, making it useful when clause logic is needed without relevance ranking. This method offers the following characteristics:

* Uses a structure similar to `scoring_boolean`, but documents are not ranked.
* Assigns the same score to all matching documents.
* Retains Boolean clause flexibility, such as using `must_not`, without ranking.

The following example query uses the `constant_score_boolean` rewrite method inside a `must_not` clause:

```json
POST /logs/_search
{
  "query": {
    "bool": {
      "must_not": {
        "wildcard": {
          "message": {
            "value": "error*",
            "rewrite": "constant_score_boolean"
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

This query is internally expanded as follows:

```json
{
  "bool": {
    "must_not": {
      "bool": {
        "should": [
          { "term": { "message": "error" } },
          { "term": { "message": "errors" } },
          { "term": { "message": "error_log" } },
          ...
        ]
      }
    }
  }
}
```
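
The effect of wrapping the expanded `should` clause in `must_not` can be sketched in Python as a simple exclusion filter. The `must_not_terms` function and the tokenized `docs` sample are hypothetical and only model the matching logic, not the engine's execution.

```python
def must_not_terms(docs, expanded_terms):
    """Keep only documents that contain none of the expanded terms."""
    excluded = set(expanded_terms)
    return [doc_id for doc_id, tokens in docs.items() if excluded.isdisjoint(tokens)]


# Assumed toy documents: document ID -> list of indexed tokens.
docs = {
    1: ["disk", "error"],
    2: ["login", "ok"],
    3: ["errors", "found"],
}
kept = must_not_terms(docs, ["error", "errors", "error_log"])
```

Only document `2` survives, because documents `1` and `3` each contain one of the expanded terms. No relevance score is involved in the decision.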

## top_terms_N

The `top_terms_N` rewrite method restricts scoring and execution to the N highest-scoring expanded terms. This method offers the following characteristics:

* Selects and scores only the N highest-scoring matching terms.
* Is useful when you expect a large expansion and want to limit load.
* Ignores other valid terms to preserve performance.

The following example uses the `top_terms_2` rewrite method:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "top_terms_2"
      }
    }
  }
}
```
{% include copy-curl.html %}
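
The truncation step can be pictured with a short Python sketch that keeps only the N best-ranked expanded terms before building the Boolean query. The `top_terms` function and the per-term ranking values in `term_scores` are hypothetical; how the engine ranks candidate terms is not modeled here.

```python
import heapq


def top_terms(term_scores, n):
    """Keep the n best-ranked expanded terms; the rest are dropped from the query."""
    return heapq.nlargest(n, term_scores, key=term_scores.get)


# Assumed ranking values for each expanded term.
term_scores = {"warning": 3.2, "warnings": 2.1, "warning_log": 0.7}
kept = top_terms(term_scores, 2)
query = {"bool": {"should": [{"term": {"message": t}} for t in kept]}}
```

With `n` set to `2`, the `warning_log` term is left out of the query entirely, so documents that contain only that term are not returned.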

## top_terms_boost_N

The `top_terms_boost_N` rewrite method selects the top N matching terms and applies static `boost` values instead of computing full relevance scores. It works as follows:

* Limits expansion to the top N terms, like `top_terms_N`.
* Assigns a preset boost to each term rather than computing TF/IDF.
* Provides faster execution with predictable relevance weights.

The following example query uses the `top_terms_boost_2` rewrite method:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "top_terms_boost_2"
      }
    }
  }
}
```
{% include copy-curl.html %}
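
A minimal Python sketch of boost-only scoring follows. It assumes that each kept term contributes its static boost to every document it matches and that contributions from several terms add up; `top_terms_boost`, `postings`, and the boost values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def top_terms_boost(postings, term_boosts, n):
    """Score documents by summing the static boosts of the top n terms they match."""
    kept = sorted(term_boosts, key=term_boosts.get, reverse=True)[:n]
    scores = {}
    for term in kept:
        for doc_id in postings.get(term, ()):
            scores[doc_id] = scores.get(doc_id, 0.0) + term_boosts[term]
    return scores


# Assumed toy postings and static boosts.
postings = {"warning": {1, 2}, "warnings": {2}, "warning_log": {3}}
term_boosts = {"warning": 1.0, "warnings": 0.5, "warning_log": 0.25}
scores = top_terms_boost(postings, term_boosts, 2)
```

No term or document frequencies are consulted: document `1` scores `1.0`, document `2` scores `1.5`, and document `3` is not returned because its only matching term falls outside the top two.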

## top_terms_blended_freqs_N

The `top_terms_blended_freqs_N` rewrite method selects the top N matching terms and blends their document frequencies to produce more balanced relevance scores. This approach offers the following characteristics:

* Picks the top N matching terms and applies a blended frequency to all of them.
* Smooths scoring across terms that differ in frequency.
* Offers a good trade-off when you want performance with realistic scoring.

The following example query uses the `top_terms_blended_freqs_2` rewrite method:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "top_terms_blended_freqs_2"
      }
    }
  }
}
```
{% include copy-curl.html %}
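
The idea of blending can be sketched in Python by giving every selected term the same inverse document frequency. The blending rule used here (taking the largest document frequency among the selected terms) is an assumption chosen for illustration, and `blended_idf` and the sample frequencies are hypothetical; the engine's exact formula may differ.

```python
import math


def blended_idf(doc_freqs, n_docs):
    """Give every selected term the same IDF, derived from one blended frequency."""
    blended_df = max(doc_freqs.values())  # assumption: blend by taking the maximum
    idf = math.log(1 + n_docs / blended_df)
    return {term: idf for term in doc_freqs}


# Assumed document frequencies for the selected terms in a 100-document index.
doc_freqs = {"warning": 40, "warnings": 5}
weights = blended_idf(doc_freqs, n_docs=100)
```

Without blending, the rare term `warnings` would carry a much larger weight than `warning`. With a shared frequency, both terms contribute equally, and differences in score come from how often each term occurs in a document.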

## Summary

The `rewrite` parameter gives you control over how multi-term queries are executed and scored internally. The following table compares the available rewrite methods.

| Mode | Scores | Performance | Notes |
| --- | --- | --- | --- |
| `constant_score` | Same score for all matches | Best | Default mode, ideal for filters |
| `scoring_boolean` | TF/IDF-based | Moderate | Full relevance scoring |
| `constant_score_boolean` | Same score, but with Boolean structure | Moderate | Use with `must_not` or `minimum_should_match` |
| `top_terms_N` | TF/IDF on top N terms | Efficient | Truncates expansion |
| `top_terms_boost_N` | Static boosts | Fast | Less accurate |
| `top_terms_blended_freqs_N` | Blended score | Balanced | Best scoring/efficiency trade-off |
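
When request bodies are generated programmatically, it can help to validate the `rewrite` value before sending the query. The following Python sketch builds a `wildcard` request body like the ones shown on this page. The `wildcard_query` helper and its validation pattern are hypothetical, and the pattern assumes that `N` must be a positive integer.

```python
import re

# Accepts the rewrite methods listed in the table; N is assumed to be >= 1.
_REWRITE = re.compile(
    r"^(constant_score|scoring_boolean|constant_score_boolean"
    r"|top_terms_(boost_|blended_freqs_)?[1-9]\d*)$"
)


def wildcard_query(field, value, rewrite=None):
    """Build a wildcard search body, optionally with a validated rewrite method."""
    body = {"value": value}
    if rewrite is not None:
        if not _REWRITE.match(rewrite):
            raise ValueError(f"unknown rewrite method: {rewrite!r}")
        body["rewrite"] = rewrite
    return {"query": {"wildcard": {field: body}}}


request = wildcard_query("message", "warning*", rewrite="top_terms_2")
```

Omitting `rewrite` leaves the parameter out of the body so that the default method applies, while a placeholder such as `top_terms_N` is rejected before any request is sent.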