adding rewrite parameter docs #9954
Merged
natebower
merged 9 commits into
opensearch-project:main
from
AntonEliatra:adding-rewrite-parameter-docs
Jul 9, 2025
a99b93a
adding rewrite parameter docs
AntonEliatra d2cb7f1
addressing the PR comments
AntonEliatra 266c9f9
addressing comments
AntonEliatra 3ada38c
Apply suggestions from code review
AntonEliatra 2cb2162
addressing the comments
AntonEliatra c980798
addressing the comments
AntonEliatra 71bfa3e
Update _query-dsl/rewrite-parameter.md
kolchfa-aws 60878b3
updating links
AntonEliatra b9fb6e7
Apply suggestions from code review
natebower
@@ -0,0 +1,253 @@

---
layout: default
title: Rewrite
nav_order: 80
---

# Rewrite

Multi-term queries like `wildcard`, `prefix`, `regexp`, `fuzzy`, and `range` expand internally into sets of terms. The `rewrite` parameter allows you to control how these term expansions are executed and scored.

When a multi-term query expands into many terms (for example, a `prefix` query for `error` matching hundreds of terms), the terms are internally converted into `term` queries. This process can have the following drawbacks:

* It can exceed the `indices.query.bool.max_clause_count` limit (default is `1024`).
* It can affect how scores are calculated for matching documents.
* It can impact memory and latency, depending on the rewrite method used.

The `rewrite` parameter gives you control over how multi-term queries behave internally. The following table compares the available modes.

| Mode                        | Scores                                 | Performance | Notes                                         |
| --------------------------- | -------------------------------------- | ----------- | --------------------------------------------- |
| `constant_score`            | Same score for all matches             | Best        | Default mode, ideal for filters               |
| `scoring_boolean`           | TF/IDF-based                           | Moderate    | Full relevance scoring                        |
| `constant_score_boolean`    | Same score but with Boolean structure  | Moderate    | Use with `must_not` or `minimum_should_match` |
| `top_terms_N`               | TF/IDF on top N terms                  | Efficient   | Truncates expansion                           |
| `top_terms_boost_N`         | Static boosts                          | Fast        | Less accurate                                 |
| `top_terms_blended_freqs_N` | Blended score                          | Balanced    | Best scoring/efficiency trade-off             |

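The `rewrite` parameter is specified in the same position for the other multi-term queries. As a minimal sketch, assuming the `logs` index and `message` field used in the examples on this page, the following `prefix` query names the rewrite method explicitly:

```json
POST /logs/_search
{
  "query": {
    "prefix": {
      "message": {
        "value": "error",
        "rewrite": "constant_score"
      }
    }
  }
}
```
{% include copy-curl.html %}
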
## Available rewrite methods

The following table summarizes the available rewrite methods.

| Rewrite method | Description |
| :--- | :--- |
| [`constant_score`](#constant-score) | (Default) All expanded terms are evaluated together as a single unit, assigning the same score to every match. Matching documents are not scored individually, making it very efficient for filtering use cases. |
| [`scoring_boolean`](#scoring-boolean) | Breaks the query into a Boolean `should` clause with one term query per match. Each result is scored individually based on relevance. |
| [`constant_score_boolean`](#constant-score-boolean) | Similar to `scoring_boolean`, but all documents receive a fixed score regardless of term frequency. Maintains Boolean structure without TF/IDF weighting. |
| [`top_terms_N`](#top-terms-n) | Restricts scoring and execution to the N most frequent terms. Reduces resource usage and prevents clause overload. |
| [`top_terms_boost_N`](#top-terms-boost-n) | Like `top_terms_N` but uses static boosting instead of full scoring. Offers performance improvements with simplified relevance. |
| [`top_terms_blended_freqs_N`](#top-terms-blended-frequencies-n) | Chooses the top N matching terms and averages their document frequencies for scoring. Produces balanced scores without full term explosion. |

## Boolean-based rewrite limits

All Boolean-based rewrites, such as `scoring_boolean`, `constant_score_boolean`, and `top_terms_*`, are subject to the following [dynamic cluster-level index setting]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index-settings/#dynamic-cluster-level-index-settings):

```json
indices.query.bool.max_clause_count
```

This setting controls the maximum number of allowed Boolean `should` clauses (default: `1024`). If your query expands to a number of terms greater than this limit, it is rejected with a `too_many_clauses` error.

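If a query legitimately needs more clauses, one option is to raise the limit. The following sketch assumes that the setting can be updated dynamically through the Cluster Settings API in your version, as the linked settings page indicates. The value `2048` is only illustrative, and a higher limit permits queries that use more memory and CPU:

```json
PUT /_cluster/settings
{
  "persistent": {
    "indices.query.bool.max_clause_count": 2048
  }
}
```
{% include copy-curl.html %}

Narrowing the pattern or using the default `constant_score` method typically avoids the error without changing the limit.
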
For example, a wildcard, such as `error*`, might expand to hundreds or thousands of matching terms, which could include `error`, `errors`, `error_log`, `error404`, and others. Each of these terms turns into a separate `term` query. If the number of terms exceeds the `indices.query.bool.max_clause_count` limit, the query fails:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "error*",
        "rewrite": "scoring_boolean"
      }
    }
  }
}
```
{% include copy-curl.html %}

The query is expanded internally as follows:

```json
{
  "bool": {
    "should": [
      { "term": { "message": "error" } },
      { "term": { "message": "errors" } },
      { "term": { "message": "error_log" } },
      { "term": { "message": "error404" } },
      ...
    ]
  }
}
```

## Constant score

The default `constant_score` rewrite method wraps all expanded terms into a single query and skips the scoring phase entirely. This approach has the following characteristics:

* Executes all term matches as a single [bit array](https://en.wikipedia.org/wiki/Bit_array) query.
* Ignores scoring altogether; every document gets a `_score` of `1.0`.
* Fastest option; ideal for filtering.

The following example runs a `wildcard` query using the default `constant_score` rewrite method to efficiently filter documents matching the pattern `warning*` in the `message` field:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*"
      }
    }
  }
}
```
{% include copy-curl.html %}

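Because scoring is skipped, this method pairs naturally with filter context. The following sketch, offered as an illustration rather than a required pattern, places the same `wildcard` query in the `filter` clause of a `bool` query and names the default rewrite method explicitly:

```json
POST /logs/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "wildcard": {
            "message": {
              "value": "warning*",
              "rewrite": "constant_score"
            }
          }
        }
      ]
    }
  }
}
```
{% include copy-curl.html %}
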
## Scoring Boolean

The `scoring_boolean` rewrite method breaks the expanded terms into separate `term` queries combined under a Boolean `should` clause. This approach works as follows:

* Expands the wildcard into individual `term` queries inside a Boolean `should` clause.
* Each document's score reflects how many terms it matches and the terms' frequency.

The following example uses a `scoring_boolean` rewrite configuration:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "scoring_boolean"
      }
    }
  }
}
```
{% include copy-curl.html %}

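To inspect how the expanded terms contribute to each document's score, you can add the standard `explain` search parameter to the request. This is a sketch for inspection only; the explanation output depends on your data and version, so none is shown here:

```json
POST /logs/_search
{
  "explain": true,
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "scoring_boolean"
      }
    }
  }
}
```
{% include copy-curl.html %}
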
## Constant score Boolean

The `constant_score_boolean` rewrite method uses the same Boolean structure as `scoring_boolean` but disables scoring, making it useful when clause logic is needed without relevance ranking. This method has the following characteristics:

* Similar structure to `scoring_boolean`, but documents are not ranked.
* All matching documents receive the same score.
* Retains Boolean clause flexibility, such as using `must_not`, without ranking.

The following example query uses a `must_not` Boolean clause:

```json
POST /logs/_search
{
  "query": {
    "bool": {
      "must_not": {
        "wildcard": {
          "message": {
            "value": "error*",
            "rewrite": "constant_score_boolean"
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

This query is internally expanded as follows:

```json
{
  "bool": {
    "must_not": {
      "bool": {
        "should": [
          { "term": { "message": "error" } },
          { "term": { "message": "errors" } },
          { "term": { "message": "error_log" } },
          ...
        ]
      }
    }
  }
}
```

## Top terms N

The `top_terms_N` method is one of several rewrite options designed to balance scoring accuracy and performance when expanding multi-term queries. It works as follows:

* Only the N most frequently matching terms are selected and scored.
* Useful when you expect a large term expansion and want to limit the load.
* Other valid terms are ignored to preserve performance.

The following query uses the `top_terms_2` rewrite method to score only the two most frequent terms that match the `warning*` pattern in the `message` field:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "top_terms_2"
      }
    }
  }
}
```
{% include copy-curl.html %}

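The same method can be applied to other multi-term queries. As a sketch, the following `fuzzy` query limits scoring to the top three expanded terms. The `fuzziness` value and the choice of `3` are illustrative assumptions, not recommendations:

```json
POST /logs/_search
{
  "query": {
    "fuzzy": {
      "message": {
        "value": "warning",
        "fuzziness": 1,
        "rewrite": "top_terms_3"
      }
    }
  }
}
```
{% include copy-curl.html %}
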
## Top terms boost N

The `top_terms_boost_N` rewrite method selects the top N matching terms and applies static `boost` values instead of computing full relevance scores. It works as follows:

* Limits expansion to the top N terms, like `top_terms_N`.
* Assigns a preset boost to each term rather than computing TF/IDF.
* Provides faster execution with predictable relevance weights.

The following example uses the `top_terms_boost_2` rewrite method:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "top_terms_boost_2"
      }
    }
  }
}
```
{% include copy-curl.html %}

## Top terms blended frequencies N

The `top_terms_blended_freqs_N` rewrite method selects the top N matching terms and blends their document frequencies to produce more balanced relevance scores. This approach has the following characteristics:

* Picks the top N matching terms and applies a blended frequency to all of them.
* Blending makes scoring smoother across terms that differ in frequency.
* Good trade-off when you want performance with realistic scoring.

The following example uses the `top_terms_blended_freqs_2` rewrite method:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "top_terms_blended_freqs_2"
      }
    }
  }
}
```
{% include copy-curl.html %}