adding rewrite parameter docs #9954
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,174 @@ | ||
--- | ||
layout: default | ||
title: Rewrite | ||
nav_order: 80 | ||
--- | ||
|
||
# Rewrite | ||
|
||
Multi-term queries like `wildcard`, `prefix`, `regexp`, `fuzzy`, and `range` expand internally into sets of terms. The `rewrite` parameter allows you to control how these term expansions are executed and scored. | ||
|
||
When a multi-term query expands into many terms (for example, `prefix: "error"` matching hundreds of terms), each matching term is converted into a term query internally. This process can: | ||
|
||
* Exceed the `indices.query.bool.max_clause_count` limit (default `1024`) | ||
* Affect how scores are calculated for matching documents | ||
* Impact memory and latency depending on the rewrite method used | ||
|
||
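The expansion described above can be pictured with a small Python sketch. It is a toy model, not the engine's implementation: `indexed_terms` is a hypothetical term dictionary for a `message` field, and the helper names `expand_prefix` and `to_bool_should` are made up for illustration.

```python
from typing import Dict, List


def expand_prefix(prefix: str, term_dictionary: List[str]) -> List[str]:
    """Return every indexed term that starts with the given prefix."""
    return sorted(t for t in term_dictionary if t.startswith(prefix))


def to_bool_should(field: str, terms: List[str]) -> Dict:
    """Build a Boolean query body with one `term` clause per expanded term."""
    return {"bool": {"should": [{"term": {field: t}} for t in terms]}}


# Hypothetical terms indexed in the `message` field.
indexed_terms = ["error", "errors", "errored", "warning", "info"]

expanded = expand_prefix("error", indexed_terms)
rewritten = to_bool_should("message", expanded)

# One clause per matching term: the clause count grows with the expansion.
print(len(rewritten["bool"]["should"]))  # 3
```

With hundreds of matching terms, the same rewrite would produce hundreds of clauses, which is where the clause limit, scoring cost, and memory use come into play.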
## Available rewrite methods | ||
|
||
| Rewrite method | Description | | ||
| :--- | :--- | | ||
| [`constant_score`](#constant_score-default) | (Default) All expanded terms are evaluated together as a single unit, assigning the same score to every match. Efficient for filtering use cases. | | ||
| [`scoring_boolean`](#scoring_boolean) | Breaks the query into a Boolean `should` clause with one term query per match. Each result is scored individually based on relevance. | | ||
| [`constant_score_boolean`](#constant_score_boolean) | Similar to `scoring_boolean`, but all documents receive a fixed score regardless of term frequency. Maintains Boolean structure without TF/IDF weighting. | | ||
| [`top_terms_N`](#top_terms_N) | Restricts scoring and execution to the N most frequent terms. Reduces resource usage and prevents clause overload. | | ||
| [`top_terms_boost_N`](#top_terms_boost_N) | Like `top_terms_N`, but uses static boosting instead of full scoring. Offers performance improvements with simplified relevance. | | ||
| [`top_terms_blended_freqs_N`](#top_terms_blended_freqs_N) | Chooses the top N matching terms and averages their document frequencies for scoring. Produces balanced scores without full term explosion. | | ||
|
||
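As a rough client-side aid, the method names in the table can be checked before a request is sent. The following Python sketch is an assumption-laden helper, not part of any client library: `is_valid_rewrite` and `build_query` are hypothetical names, the helper only covers queries that take a `value` key (as the `wildcard` examples on this page do), and treating N as a positive integer is an assumption.

```python
import re
from typing import Any, Dict, Optional

# Method names taken from the table above.
FIXED_METHODS = {"constant_score", "scoring_boolean", "constant_score_boolean"}

# top_terms_N, top_terms_boost_N, top_terms_blended_freqs_N.
# Assumption: N must be a positive integer.
TOP_TERMS_PATTERN = re.compile(r"^top_terms_(boost_|blended_freqs_)?[1-9]\d*$")


def is_valid_rewrite(value: str) -> bool:
    """Return True if `value` looks like one of the documented rewrite methods."""
    return value in FIXED_METHODS or bool(TOP_TERMS_PATTERN.match(value))


def build_query(query_type: str, field: str, value: str,
                rewrite: Optional[str] = None) -> Dict[str, Any]:
    """Build a search request body, adding `rewrite` only when it is given."""
    if rewrite is not None and not is_valid_rewrite(rewrite):
        raise ValueError(f"unknown rewrite method: {rewrite!r}")
    params: Dict[str, Any] = {"value": value}
    if rewrite is not None:
        params["rewrite"] = rewrite
    return {"query": {query_type: {field: params}}}


body = build_query("wildcard", "message", "warning*", rewrite="top_terms_2")
print(body)
```

The resulting dictionary has the same shape as the request bodies shown in the sections that follow.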
## Boolean-based rewrite limits | ||
|
||
All Boolean-based rewrites, such as `scoring_boolean`, `constant_score_boolean`, and `top_terms_*`, are subject to the following setting: | ||
|
||
```json | ||
indices.query.bool.max_clause_count | ||
``` | ||
|
||
This setting controls the maximum number of clauses allowed in a Boolean query (default: `1024`). If a query expands beyond this limit, it is rejected with a `too_many_clauses` error. | ||
|
||
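The limit check can be sketched in a few lines of Python. This is only an illustration of the rule stated above: `TooManyClausesError` and `check_clause_limit` are hypothetical names standing in for the server-side `too_many_clauses` error, and the list of expanded terms is invented.

```python
from typing import List

DEFAULT_MAX_CLAUSE_COUNT = 1024  # default stated in the text above


class TooManyClausesError(Exception):
    """Hypothetical stand-in for the `too_many_clauses` error."""


def check_clause_limit(expanded_terms: List[str],
                       max_clause_count: int = DEFAULT_MAX_CLAUSE_COUNT) -> int:
    """Return the clause count, or raise if the expansion exceeds the limit."""
    clause_count = len(expanded_terms)
    if clause_count > max_clause_count:
        raise TooManyClausesError(
            f"expansion produced {clause_count} clauses, limit is {max_clause_count}"
        )
    return clause_count


# An invented expansion of 1,500 terms exceeds the default limit.
terms = [f"warning{i}" for i in range(1500)]
try:
    check_clause_limit(terms)
except TooManyClausesError as exc:
    print(exc)
```

An expansion of exactly 1,024 terms passes the check in this sketch; the query is rejected only once the count goes beyond the limit.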
## constant_score (default) | ||
|
||
* Executes all term matches as a single bitset query. | ||
* Skips relevance scoring; every matching document receives a `_score` of `1.0`. | ||
* Fastest option; ideal when filtering is the goal. | ||
|
||
```json | ||
POST /logs/_search | ||
{ | ||
"query": { | ||
"wildcard": { | ||
"message": { | ||
"value": "warning*" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
## scoring_boolean | ||
|
||
* Expands the wildcard into individual `term` queries inside a Boolean `should` clause. | ||
* Each document’s score reflects how many terms it matches and their term frequency. | ||
* Can trigger `too_many_clauses` if many terms match. | ||
|
||
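To make the contrast with constant scoring concrete, here is a deliberately simplified Python sketch. It is not the engine's relevance formula: summing raw term counts is a toy stand-in for real scoring, and the documents, term counts, and function names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical per-document term counts for the `message` field.
docs: Dict[str, Dict[str, int]] = {
    "doc1": {"warning": 1},
    "doc2": {"warning": 2, "warnings": 1},
}

# Terms that `warning*` is assumed to expand to in this toy index.
expanded_terms: List[str] = ["warning", "warnings"]


def constant_score(term_counts: Dict[str, int], terms: List[str]) -> float:
    """Every document that matches at least one term gets the same score."""
    return 1.0 if any(t in term_counts for t in terms) else 0.0


def toy_scoring_boolean(term_counts: Dict[str, int], terms: List[str]) -> float:
    """Toy stand-in: sum the counts of each matched term (one clause per term)."""
    return float(sum(term_counts.get(t, 0) for t in terms))


constant = {name: constant_score(tc, expanded_terms) for name, tc in docs.items()}
scored = {name: toy_scoring_boolean(tc, expanded_terms) for name, tc in docs.items()}

print(constant)  # {'doc1': 1.0, 'doc2': 1.0}
print(scored)    # {'doc1': 1.0, 'doc2': 3.0}
```

In this toy model, `doc2` ranks higher under the Boolean rewrite because it matches more of the expanded terms, and matches them more often, while both documents tie under constant scoring.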
```json | ||
POST /logs/_search | ||
{ | ||
"query": { | ||
"wildcard": { | ||
"message": { | ||
"value": "warning*", | ||
"rewrite": "scoring_boolean" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
## constant_score_boolean | ||
|
||
* Uses a structure similar to `scoring_boolean`, but documents are not ranked. | ||
* All matching documents receive the same score. | ||
* Retains Boolean clause flexibility (for example, use with `must_not`) without ranking. | ||
|
||
|
||
```json | ||
POST /logs/_search | ||
{ | ||
"query": { | ||
"wildcard": { | ||
"message": { | ||
"value": "warning*", | ||
"rewrite": "constant_score_boolean" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
## top_terms_N | ||
|
||
* Only the N most frequent matching terms are selected and scored. | ||
* Useful when you expect large expansion and want to limit load. | ||
* Other valid terms are ignored to preserve performance. | ||
|
||
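The truncation step can be sketched in Python as follows. The sketch takes the description above at face value and ranks candidate terms by document frequency; how the engine actually orders candidates may differ. The frequencies and the `select_top_terms` helper are hypothetical.

```python
from typing import Dict, List


def select_top_terms(doc_freqs: Dict[str, int], n: int) -> List[str]:
    """Keep the n terms with the highest document frequency.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(doc_freqs, key=lambda term: (-doc_freqs[term], term))
    return ranked[:n]


# Hypothetical document frequencies for terms matching `warning*`.
matching_terms = {"warning": 120, "warnings": 45, "warning_level": 7}

kept = select_top_terms(matching_terms, 2)
print(kept)  # ['warning', 'warnings']
```

Here `warning_level` is dropped from the rewritten query, so documents that contain only that term would not match when `top_terms_2` is used.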
```json | ||
POST /logs/_search | ||
{ | ||
"query": { | ||
"wildcard": { | ||
"message": { | ||
"value": "warning*", | ||
"rewrite": "top_terms_2" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
## top_terms_boost_N | ||
|
||
* Limits expansion to the top N terms, like `top_terms_N`. | ||
* Assigns a preset boost per term rather than computing TF/IDF. | ||
* Provides faster execution with predictable relevance weights. | ||
|
||
```json | ||
POST /logs/_search | ||
{ | ||
"query": { | ||
"wildcard": { | ||
"message": { | ||
"value": "warning*", | ||
"rewrite": "top_terms_boost_2" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
## top_terms_blended_freqs_N | ||
|
||
* Picks the top N matching terms and applies a blended frequency to all. | ||
* Blending makes scoring smoother across terms that differ in frequency. | ||
* Offers a good trade-off when you want performance with realistic scoring. | ||
|
||
```json | ||
POST /logs/_search | ||
{ | ||
"query": { | ||
"wildcard": { | ||
"message": { | ||
"value": "warning*", | ||
"rewrite": "top_terms_blended_freqs_2" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
## Summary | ||
|
||
The `rewrite` parameter gives you control over how multi-term queries are expanded, executed, and scored. | ||
|
||
| Mode | Scores | Performance | Notes | | ||
| --------------------------- | -------------------------------------- | ----------- | --------------------------------------------- | | ||
| `constant_score` | Same score for all matches | Best | Default mode, ideal for filters | | ||
| `scoring_boolean` | TF/IDF-based | Moderate | Full relevance scoring | | ||
| `constant_score_boolean` | Same score, but with Boolean structure | Moderate | Use with `must_not` or `minimum_should_match` | | ||
| `top_terms_N` | TF/IDF on top N terms | Efficient | Truncates expansion | | ||
| `top_terms_boost_N` | Static boosts | Fast | Less accurate | | ||
| `top_terms_blended_freqs_N` | Blended score | Balanced | Best scoring/efficiency tradeoff | | ||