adding rewrite parameter docs #9954

Open
wants to merge 3 commits into
base: main
247 changes: 247 additions & 0 deletions _query-dsl/rewrite-parameter.md
@@ -0,0 +1,247 @@
---
layout: default
title: Rewrite
nav_order: 80
---

# Rewrite

Multi-term queries like `wildcard`, `prefix`, `regexp`, `fuzzy`, and `range` expand internally into sets of terms. The `rewrite` parameter allows you to control how these term expansions are executed and scored.

When a multi-term query expands into many terms (for example, `prefix: "error*"` matching hundreds of terms), the terms are converted into individual term queries internally. This process can:

* Exceed the `indices.query.bool.max_clause_count` limit (default `1024`)
* Affect how scores are calculated for matching documents
* Impact memory and latency depending on the rewrite method used

## Available rewrite methods

| Rewrite method | Description |
| :--- | :--- |
| [`constant_score`](#constant_score-default) | (Default) Evaluates all expanded terms together as a single unit and assigns the same score to every match. Matching documents are not scored individually, which makes this method efficient for filtering use cases. |
| [`scoring_boolean`](#scoring_boolean) | Breaks the query into a Boolean `should` clause with one term query per match. Each result is scored individually based on relevance. |
| [`constant_score_boolean`](#constant_score_boolean) | Similar to `scoring_boolean`, but all documents receive a fixed score regardless of term frequency. Maintains Boolean structure without TF/IDF weighting. |
| [`top_terms_N`](#top_terms_N) | Restricts scoring and execution to the N most frequent terms. Reduces resource usage and prevents clause overload. |
| [`top_terms_boost_N`](#top_terms_boost_N) | Like `top_terms_N`, but uses static boosting instead of full scoring. Offers performance improvements with simplified relevance. |
| [`top_terms_blended_freqs_N`](#top_terms_blended_freqs_N) | Chooses the top N matching terms and averages their document frequencies for scoring. Produces balanced scores without full term explosion. |

## Boolean-based rewrite limits

All Boolean-based rewrites, such as `scoring_boolean`, `constant_score_boolean`, and `top_terms_*`, are subject to the following configuration:

```json
indices.query.bool.max_clause_count
```

This setting controls the maximum number of allowed Boolean `should` clauses (default: `1024`). If your query expands beyond this limit, it will be rejected with a `too_many_clauses` error.

For example, a wildcard such as `error*` might expand to hundreds or thousands of matching terms, including `error`, `errors`, `error_log`, and `error404`. Each of these terms becomes a separate term query. If the number of terms exceeds the `indices.query.bool.max_clause_count` limit, the query fails. See the following example:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "error*",
        "rewrite": "scoring_boolean"
      }
    }
  }
}
```
{% include copy-curl.html %}

The query is expanded internally as follows:

```json
{
  "bool": {
    "should": [
      { "term": { "message": "error" } },
      { "term": { "message": "errors" } },
      { "term": { "message": "error_log" } },
      { "term": { "message": "error404" } },
      ...
    ]
  }
}
```
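
The expansion and the clause limit described above can be modeled with a short Python sketch. This is a toy model, not how OpenSearch implements rewriting: the `expand_wildcard` helper, the in-memory `terms` list, and the use of `fnmatch` for pattern matching are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Mirrors the documented default of indices.query.bool.max_clause_count.
MAX_CLAUSE_COUNT = 1024


def expand_wildcard(field, pattern, indexed_terms, max_clauses=MAX_CLAUSE_COUNT):
    """Toy model: expand a wildcard pattern into a bool/should of term queries.

    `indexed_terms` stands in for the field's term dictionary (hypothetical).
    """
    matches = sorted(t for t in indexed_terms if fnmatchcase(t, pattern))
    if len(matches) > max_clauses:
        # Models the rejection that happens when the expansion is too large.
        raise ValueError(
            f"too_many_clauses: {len(matches)} terms exceed the limit of {max_clauses}"
        )
    return {"bool": {"should": [{"term": {field: t}} for t in matches]}}


# Hypothetical term dictionary for the `message` field.
terms = ["error", "errors", "error_log", "error404", "warning"]
expanded = expand_wildcard("message", "error*", terms)
print(len(expanded["bool"]["should"]))  # one clause per matching term
```

Calling the helper with a `max_clauses` value smaller than the number of matching terms raises the error instead of returning a query, which is the situation the limit is meant to catch.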

## constant_score (default)

The `constant_score` rewrite method wraps all expanded terms into a single query and skips the scoring phase entirely. This approach offers the following characteristics:

* Executes all term matches as a single [bit array](https://en.wikipedia.org/wiki/Bit_array) query.
* Ignores scoring altogether; every document gets `_score = 1.0`.
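* Fastest option; ideal when filtering is the goal.

The following example query uses the default `constant_score` rewrite method by omitting the `rewrite` parameter: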

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*"
      }
    }
  }
}
```
{% include copy-curl.html %}

## scoring_boolean

The `scoring_boolean` rewrite method breaks the expanded terms into separate `term` queries combined under a Boolean `should` clause. This approach works as follows:

* Expands the wildcard into individual `term` queries inside a Boolean `should` clause.
* Each document’s score reflects how many terms it matches and their term frequency.

The following example query uses the `scoring_boolean` rewrite method:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "scoring_boolean"
      }
    }
  }
}
```
{% include copy-curl.html %}

## constant_score_boolean

The `constant_score_boolean` rewrite method uses the same Boolean structure as `scoring_boolean` but disables scoring, making it useful when clause logic is needed without relevance ranking. This method offers the following characteristics:

* Uses a structure similar to `scoring_boolean`, but documents are not ranked.
* All matching documents receive the same score.
* Retains Boolean clause flexibility, such as using `must_not`, without ranking.

See the following example query using `must_not`:

```json
POST /logs/_search
{
  "query": {
    "bool": {
      "must_not": {
        "wildcard": {
          "message": {
            "value": "error*",
            "rewrite": "constant_score_boolean"
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

This query is internally expanded as follows:

```json
{
  "bool": {
    "must_not": {
      "bool": {
        "should": [
          { "term": { "message": "error" } },
          { "term": { "message": "errors" } },
          { "term": { "message": "error_log" } },
          ...
        ]
      }
    }
  }
}
```

## top_terms_N

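The `top_terms_N` rewrite method restricts scoring and execution to the N most frequent matching terms. This approach offers the following characteristics:

* Only the N most frequent matching terms are selected and scored.
* Useful when you expect large expansion and want to limit load.
* Other valid terms are ignored to preserve performance.

See the following example query using the `top_terms_2` rewrite method: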

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "top_terms_2"
      }
    }
  }
}
```
{% include copy-curl.html %}
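
The truncation described in this section can be sketched in Python. This is a toy model that follows this page's description of ranking terms by frequency; it is not OpenSearch's internal implementation. The `top_terms` helper, the alphabetical tie-breaking, and the document frequencies are made up for illustration.

```python
def top_terms(term_doc_freqs, n):
    """Keep the n terms with the highest document frequency.

    Ties are broken alphabetically so the result is deterministic
    (an assumption of this sketch).
    """
    ranked = sorted(term_doc_freqs.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]


# Hypothetical document frequencies for terms matching "warning*".
doc_freqs = {"warning": 120, "warnings": 45, "warning_log": 7, "warning_level": 45}

# Model of `top_terms_2`: only two of the four matching terms become clauses.
selected = top_terms(doc_freqs, 2)
query = {"bool": {"should": [{"term": {"message": t}} for t in selected]}}
print(selected)
```

Because the number of clauses is capped at N regardless of how many terms match, the expanded query in this model cannot grow with the size of the term dictionary.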

## top_terms_boost_N

The `top_terms_boost_N` rewrite method selects the top N matching terms and applies static `boost` values instead of computing full relevance scores. It works as follows:

* Limits expansion to the top N terms like `top_terms_N`.
* Rather than computing TF/IDF, it assigns a pre-set boost per term.
* Provides faster execution with predictable relevance weights.

See the following example query using the `top_terms_boost_2` rewrite method:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "top_terms_boost_2"
      }
    }
  }
}
```
{% include copy-curl.html %}

## top_terms_blended_freqs_N

The `top_terms_blended_freqs_N` rewrite method selects the top N matching terms and blends their document frequencies to produce more balanced relevance scores. This approach offers the following characteristics:

* Picks the top N matching terms and applies a blended frequency to all.
* Blending makes scoring smoother across terms that differ in frequency.
* Good trade-off when you want performance with realistic scoring.

See the following example query using the `top_terms_blended_freqs_2` rewrite method:

```json
POST /logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "warning*",
        "rewrite": "top_terms_blended_freqs_2"
      }
    }
  }
}
```
{% include copy-curl.html %}

## Summary

The `rewrite` parameter gives you control over how multi-term queries behave under the hood.

| Mode | Scores | Performance | Notes |
| --------------------------- | -------------------------------------- | ----------- | --------------------------------------------- |
| `constant_score` | Same score for all matches | Best | Default mode, ideal for filters |
| `scoring_boolean` | TF/IDF-based | Moderate | Full relevance scoring |
| `constant_score_boolean` | Same score, but with Boolean structure | Moderate | Use with `must_not` or `minimum_should_match` |
| `top_terms_N` | TF/IDF on top N terms | Efficient | Truncates expansion |
| `top_terms_boost_N` | Static boosts | Fast | Less accurate |
| `top_terms_blended_freqs_N` | Blended score | Balanced | Best scoring/efficiency trade-off |
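
As a client-side illustration, the following Python sketch builds the wildcard request bodies used on this page and checks that a `rewrite` value has one of the shapes listed in the table. The `is_valid_rewrite` and `wildcard_query` helpers are hypothetical and are not part of any OpenSearch client. The check only verifies the shape of the value, so the cluster may still reject a value that passes it.

```python
import re

# Rewrite values without a numeric suffix, as listed on this page.
_FIXED_METHODS = {"constant_score", "scoring_boolean", "constant_score_boolean"}

# top_terms_N, top_terms_boost_N, and top_terms_blended_freqs_N, where N is an integer.
_TOP_TERMS_PATTERN = re.compile(r"^top_terms_(boost_|blended_freqs_)?\d+$")


def is_valid_rewrite(value):
    """Return True if `value` has the shape of a documented rewrite method."""
    return value in _FIXED_METHODS or bool(_TOP_TERMS_PATTERN.match(value))


def wildcard_query(field, pattern, rewrite=None):
    """Build a wildcard search body; omitting `rewrite` uses the default method."""
    body = {"value": pattern}
    if rewrite is not None:
        if not is_valid_rewrite(rewrite):
            raise ValueError(f"unrecognized rewrite method: {rewrite!r}")
        body["rewrite"] = rewrite
    return {"query": {"wildcard": {field: body}}}


default_body = wildcard_query("message", "warning*")
top_terms_body = wildcard_query("message", "warning*", rewrite="top_terms_2")
print(top_terms_body)
```

The resulting dictionaries can be serialized to JSON and sent as the body of a `POST /logs/_search` request.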
