From a99b93a59e9c29a3972e9bc7f985fe19d3d36f59 Mon Sep 17 00:00:00 2001
From: Anton Rubin <anton.rubin@eliatra.com>
Date: Wed, 21 May 2025 16:07:42 +0100
Subject: [PATCH 1/3] adding rewrite parameter docs

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
---
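As an illustration only (not part of the commit): the request bodies used throughout the new page all share one shape, which can be sketched in Python. This is a minimal sketch, not an OpenSearch client API. The `wildcard_query` helper is a hypothetical name, and treating N as a positive integer is an assumption; the accepted method names are taken from the "Available rewrite methods" table in this patch.

```python
import json
import re

# Rewrite values listed in the "Available rewrite methods" table.
FIXED_METHODS = {"constant_score", "scoring_boolean", "constant_score_boolean"}
# top_terms_N, top_terms_boost_N, top_terms_blended_freqs_N
# (assumption: N is a positive integer).
TOP_TERMS = re.compile(r"^top_terms_(boost_|blended_freqs_)?[1-9]\d*$")


def wildcard_query(field, value, rewrite=None):
    """Build a wildcard search body, optionally setting `rewrite` (hypothetical helper)."""
    known = rewrite in FIXED_METHODS or (rewrite is not None and TOP_TERMS.match(rewrite))
    if rewrite is not None and not known:
        raise ValueError(f"unknown rewrite method: {rewrite!r}")
    params = {"value": value}
    if rewrite is not None:
        # Omitting the parameter leaves the default method in effect.
        params["rewrite"] = rewrite
    return {"query": {"wildcard": {field: params}}}


body = wildcard_query("message", "warning*", rewrite="top_terms_2")
print(json.dumps(body, indent=2))
```

The printed body can be sent as the payload of `POST /logs/_search`, matching the examples in the page.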
 _query-dsl/rewrite-parameter.md | 174 ++++++++++++++++++++++++++++++++
 1 file changed, 174 insertions(+)
 create mode 100644 _query-dsl/rewrite-parameter.md

diff --git a/_query-dsl/rewrite-parameter.md b/_query-dsl/rewrite-parameter.md
new file mode 100644
index 00000000000..abc7d50b78c
--- /dev/null
+++ b/_query-dsl/rewrite-parameter.md
@@ -0,0 +1,174 @@
+---
+layout: default
+title: Rewrite
+nav_order: 80
+---
+
+# Rewrite
+
+Multi-term queries such as `wildcard`, `prefix`, `regexp`, `fuzzy`, and `range` expand internally into sets of terms. The `rewrite` parameter lets you control how these term expansions are executed and scored.
+
+When a multi-term query expands into many terms (for example, a `prefix` query for `error` that matches hundreds of terms), the terms are converted into individual term queries internally. This process can:
+
+* Exceed the `indices.query.bool.max_clause_count` limit (default `1024`)
+* Affect how scores are calculated for matching documents
+* Impact memory and latency depending on the rewrite method used
+
+## Available rewrite methods
+
+| Rewrite method | Description |
+| [`constant_score`](#constant_score-default) | (Default) All expanded terms are evaluated together as a single unit, assigning the same score to every match. Efficient for filtering use cases. |
+| [`scoring_boolean`](#scoring_boolean) | Breaks the query into a Boolean `should` clause with one term query per match. Each result is scored individually based on relevance. |
+| [`constant_score_boolean`](#constant_score_boolean) | Similar to `scoring_boolean`, but all documents receive a fixed score regardless of term frequency. Maintains Boolean structure without TF/IDF weighting. |
+| [`top_terms_N`](#top_terms_N) | Restricts scoring and execution to the N most frequent terms. Reduces resource usage and prevents clause overload. |
+| [`top_terms_boost_N`](#top_terms_boost_N) | Like `top_terms_N`, but uses static boosting instead of full scoring. Offers performance improvements with simplified relevance. |
+| [`top_terms_blended_freqs_N`](#top_terms_blended_freqs_N) | Chooses the top N matching terms and averages their document frequencies for scoring. Produces balanced scores without full term explosion. |
+
+## Boolean-based rewrite limits
+
+All Boolean-based rewrites, such as `scoring_boolean`, `constant_score_boolean`, and `top_terms_*`, are subject to:
+
+```json
+indices.query.bool.max_clause_count
+```
+
+This setting controls the maximum number of allowed Boolean `should` clauses (default: 1024). If your query expands beyond this limit, it will be rejected with a `too_many_clauses` error.
+
+## constant_score (default)
+
+* Executes all term matches as a single bitset query.
+* Ignores scoring altogether; every document gets `_score = 1.0`.
+* Fastest option; ideal when filtering is the goal.
+
+```json
+POST /logs/_search
+{
+  "query": {
+    "wildcard": {
+      "message": {
+        "value": "warning*"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## scoring_boolean
+
+* Expands the wildcard into individual `term` queries inside a Boolean `should` clause.
+* Each document’s score reflects how many terms it matches and their term frequency.
+* Can trigger `too_many_clauses` if many terms match.
+
+```json
+POST /logs/_search
+{
+  "query": {
+    "wildcard": {
+      "message": {
+        "value": "warning*",
+        "rewrite": "scoring_boolean"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## constant_score_boolean
+
+* Similar structure to `scoring_boolean`, but documents are not ranked.
+* All matching docs receive the same score.
+* This retains Boolean clause flexibility (e.g., use with `must_not`) without ranking.
+
+```json
+POST /logs/_search
+{
+  "query": {
+    "wildcard": {
+      "message": {
+        "value": "warning*",
+        "rewrite": "constant_score_boolean"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## top_terms_N
+
+* Only the N most frequent matching terms are selected and scored.
+* Useful when you expect large expansion and want to limit load.
+* Other valid terms are ignored to preserve performance.
+
+```json
+POST /logs/_search
+{
+  "query": {
+    "wildcard": {
+      "message": {
+        "value": "warning*",
+        "rewrite": "top_terms_2"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## top_terms_boost_N
+
+* Limits expansion to the top N terms like `top_terms_N`.
+* Rather than computing TF/IDF, it assigns a pre-set boost per term.
+* Provides faster execution with predictable relevance weights.
+
+```json
+POST /logs/_search
+{
+  "query": {
+    "wildcard": {
+      "message": {
+        "value": "warning*",
+        "rewrite": "top_terms_boost_2"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## top_terms_blended_freqs_N
+
+* Picks the top N matching terms and applies a blended frequency to all.
+* Blending makes scoring smoother across terms that differ in frequency.
+* Good tradeoff when you want performance with realistic scoring.
+
+```json
+POST /logs/_search
+{
+  "query": {
+    "wildcard": {
+      "message": {
+        "value": "warning*",
+        "rewrite": "top_terms_blended_freqs_2"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Summary
+
+The `rewrite` parameter gives you control over how multi-term queries are executed and scored internally.
+
+| Mode                        | Scores                                 | Performance | Notes                                         |
+| --------------------------- | -------------------------------------- | ----------- | --------------------------------------------- |
+| `constant_score`            | Same score for all matches             | Best        | Default mode, ideal for filters               |
+| `scoring_boolean`           | TF/IDF-based                           | Moderate    | Full relevance scoring                        |
+| `constant_score_boolean`    | Same score, but with Boolean structure | Moderate    | Use with `must_not` or `minimum_should_match` |
+| `top_terms_N`               | TF/IDF on top N terms                  | Efficient   | Truncates expansion                           |
+| `top_terms_boost_N`         | Static boosts                          | Fast        | Less accurate                                 |
+| `top_terms_blended_freqs_N` | Blended score                          | Balanced    | Best scoring/efficiency tradeoff              |
+

From d2cb7f1d9f273f525fae16cfb451d700bdb991db Mon Sep 17 00:00:00 2001
From: Anton Rubin <anton.rubin@eliatra.com>
Date: Tue, 8 Jul 2025 10:36:45 +0100
Subject: [PATCH 2/3] addressing the PR comments

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
---
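As an illustration only (not part of the commit): the Boolean expansion that this patch documents under "Boolean-based rewrite limits" can be modeled with a minimal Python sketch. It only mirrors the behavior described in the page. The `expand_wildcard` helper, the in-memory term list, and the `TooManyClauses` exception are hypothetical names, and supporting only a single trailing `*` is a simplifying assumption, not how OpenSearch evaluates wildcards.

```python
# Illustrative model only: the real expansion happens inside the search engine.
MAX_CLAUSE_COUNT = 1024  # documented default of indices.query.bool.max_clause_count


class TooManyClauses(Exception):
    """Hypothetical stand-in for the `too_many_clauses` error."""


def expand_wildcard(field, pattern, indexed_terms, max_clauses=MAX_CLAUSE_COUNT):
    """Expand a trailing-`*` wildcard into a bool/should of term queries."""
    if not pattern.endswith("*") or "*" in pattern[:-1] or "?" in pattern:
        raise ValueError("this sketch only supports a single trailing '*'")
    prefix = pattern[:-1]
    # Every indexed term that starts with the prefix becomes one clause.
    matches = sorted(t for t in indexed_terms if t.startswith(prefix))
    if len(matches) > max_clauses:
        raise TooManyClauses(
            f"{len(matches)} clauses exceed the limit of {max_clauses}"
        )
    return {"bool": {"should": [{"term": {field: t}} for t in matches]}}


terms = ["error", "errors", "error_log", "error404", "warning"]
query = expand_wildcard("message", "error*", terms)
```

Calling `expand_wildcard("message", "error*", terms, max_clauses=3)` raises `TooManyClauses`, which mirrors the rejection described for queries that expand past the limit.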
 _query-dsl/rewrite-parameter.md | 87 ++++++++++++++++++++++++++++-----
 1 file changed, 76 insertions(+), 11 deletions(-)

diff --git a/_query-dsl/rewrite-parameter.md b/_query-dsl/rewrite-parameter.md
index abc7d50b78c..e3d63995c51 100644
--- a/_query-dsl/rewrite-parameter.md
+++ b/_query-dsl/rewrite-parameter.md
@@ -17,7 +17,7 @@ When a multi-term query expands into many terms (for example `prefix: "error*"`
 ## Available rewrite methods
 
 | Rewrite method | Description |
-| [`constant_score`](#constant_score-default) | (Default) All expanded terms are evaluated together as a single unit, assigning the same score to every match. Efficient for filtering use cases. |
+| [`constant_score`](#constant_score-default) | (Default) All expanded terms are evaluated together as a single unit, and every match receives the same score because matching documents are not scored individually. This makes it very efficient for filtering use cases. |
 | [`scoring_boolean`](#scoring_boolean) | Breaks the query into a Boolean `should` clause with one term query per match. Each result is scored individually based on relevance. |
 | [`constant_score_boolean`](#constant_score_boolean) | Similar to `scoring_boolean`, but all documents receive a fixed score regardless of term frequency. Maintains Boolean structure without TF/IDF weighting. |
 | [`top_terms_N`](#top_terms_N) | Restricts scoring and execution to the N most frequent terms. Reduces resource usage and prevents clause overload. |
@@ -32,11 +32,46 @@ All Boolean-based rewrites, such as `scoring_boolean`, `constant_score_boolean`,
 indices.query.bool.max_clause_count
 ```
 
-This setting controls the maximum number of allowed Boolean `should` clauses (default: 1024). If your query expands beyond this limit, it will be rejected with a `too_many_clauses` error.
+This setting controls the maximum number of allowed Boolean `should` clauses (default: `1024`). If your query expands beyond this limit, it will be rejected with a `too_many_clauses` error. 
+
+For example, a wildcard like "error*" might expand to hundreds or thousands of matching terms, such as: "error", "errors", "error_log", "error404", and others. Each of these terms turns into a separate term query. If the number of terms exceeds the `indices.query.bool.max_clause_count` limit, the query fails with an error. See following example:
+
+```json
+POST /logs/_search
+{
+  "query": {
+    "wildcard": {
+      "message": {
+        "value": "error*",
+        "rewrite": "scoring_boolean"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+The query is expanded internally as follows:
+
+```json
+{
+  "bool": {
+    "should": [
+      { "term": { "message": "error" } },
+      { "term": { "message": "errors" } },
+      { "term": { "message": "error_log" } },
+      { "term": { "message": "error404" } },
+      ...
+    ]
+  }
+}
+```
 
 ## constant_score (default)
 
-* Executes all term matches as a single bitset query.
+The `constant_score` rewrite method wraps all expanded terms into a single query and skips the scoring phase entirely. This approach offers the following characteristics:
+
+* Executes all term matches as a single [bit array](https://en.wikipedia.org/wiki/Bit_array) query.
 * Ignores scoring altogether; every document gets `_score = 1.0`.
 * Fastest option; ideal when filtering is the goal.
 
@@ -56,9 +91,12 @@ POST /logs/_search
 
 ## scoring_boolean
 
+The `scoring_boolean` rewrite method breaks the expanded terms into separate `term` queries combined under a Boolean `should` clause. This approach works as follows:
+
 * Expands the wildcard into individual `term` queries inside a Boolean `should` clause.
 * Each document’s score reflects how many terms it matches and their term frequency.
-* Can trigger `too_many_clauses` if many terms match.
+
+The following example uses the `scoring_boolean` rewrite method:
 
 ```json
 POST /logs/_search
@@ -77,18 +115,26 @@ POST /logs/_search
 
 ## constant_score_boolean
 
+The `constant_score_boolean` rewrite method uses the same Boolean structure as `scoring_boolean` but disables scoring, making it useful when clause logic is needed without relevance ranking. This method offers the following characteristics:
+
 * Similar structure to `scoring_boolean`, but documents are not ranked.
 * All matching docs receive the same score.
-* This retains Boolean clause flexibility (e.g., use with `must_not`) without ranking.
+* This retains Boolean clause flexibility, such as using `must_not`, without ranking.
+
+The following example query uses `must_not`:
 
 ```json
 POST /logs/_search
 {
   "query": {
-    "wildcard": {
-      "message": {
-        "value": "warning*",
-        "rewrite": "constant_score_boolean"
+    "bool": {
+      "must_not": {
+        "wildcard": {
+          "message": {
+            "value": "error*",
+            "rewrite": "constant_score_boolean"
+          }
+        }
       }
     }
   }
@@ -96,6 +142,25 @@ POST /logs/_search
 ```
 {% include copy-curl.html %}
 
+This query is internally expanded as follows:
+
+```json
+{
+  "bool": {
+    "must_not": {
+      "bool": {
+        "should": [
+          { "term": { "message": "error" } },
+          { "term": { "message": "errors" } },
+          { "term": { "message": "error_log" } },
+          ...
+        ]
+      }
+    }
+  }
+}
+```
+
 ## top_terms_N
 
 * Only the N most frequent matching terms are selected and scored.
@@ -142,7 +207,7 @@ POST /logs/_search
 
 * Picks the top N matching terms and applies a blended frequency to all.
 * Blending makes scoring smoother across terms that differ in frequency.
-* Good tradeoff when you want performance with realistic scoring.
+* Good trade-off when you want performance with realistic scoring.
 
 ```json
 POST /logs/_search
@@ -170,5 +235,5 @@ The `rewrite` parameter gives you control over how multi-term queries behave und
 | `constant_score_boolean`    | Same score, but with Boolean structure | Moderate    | Use with `must_not` or `minimum_should_match` |
 | `top_terms_N`               | TF/IDF on top N terms                  | Efficient   | Truncates expansion                           |
 | `top_terms_boost_N`         | Static boosts                          | Fast        | Less accurate                                 |
-| `top_terms_blended_freqs_N` | Blended score                          | Balanced    | Best scoring/efficiency tradeoff              |
+| `top_terms_blended_freqs_N` | Blended score                          | Balanced    | Best scoring/efficiency trade-off             |
 

From 266c9f9c433ff75dea3ce9cdb02ed1e745efe98c Mon Sep 17 00:00:00 2001
From: Anton Rubin <anton.rubin@eliatra.com>
Date: Tue, 8 Jul 2025 11:27:01 +0100
Subject: [PATCH 3/3] addressing comments

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
---
 _query-dsl/rewrite-parameter.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/_query-dsl/rewrite-parameter.md b/_query-dsl/rewrite-parameter.md
index e3d63995c51..ec4165ff658 100644
--- a/_query-dsl/rewrite-parameter.md
+++ b/_query-dsl/rewrite-parameter.md
@@ -26,7 +26,7 @@ When a multi-term query expands into many terms (for example `prefix: "error*"`
 
 ## Boolean-based rewrite limits
 
-All Boolean-based rewrites, such as `scoring_boolean`, `constant_score_boolean`, and `top_terms_*`, are subject to:
+All Boolean-based rewrites, such as `scoring_boolean`, `constant_score_boolean`, and `top_terms_*`, are subject to the following configuration:
 
 ```json
 indices.query.bool.max_clause_count
@@ -34,7 +34,7 @@ indices.query.bool.max_clause_count
 
 This setting controls the maximum number of allowed Boolean `should` clauses (default: `1024`). If your query expands beyond this limit, it will be rejected with a `too_many_clauses` error. 
 
-For example, a wildcard like "error*" might expand to hundreds or thousands of matching terms, such as: "error", "errors", "error_log", "error404", and others. Each of these terms turns into a separate term query. If the number of terms exceeds the `indices.query.bool.max_clause_count` limit, the query fails with an error. See following example:
+For example, a wildcard, such as "error*", might expand to hundreds or thousands of matching terms, which could include "error", "errors", "error_log", "error404", and others. Each of these terms turns into a separate term query. If the number of terms exceeds the `indices.query.bool.max_clause_count` limit, the query fails. See the following example:
 
 ```json
 POST /logs/_search
@@ -184,10 +184,14 @@ POST /logs/_search
 
 ## top_terms_boost_N
 
+The `top_terms_boost_N` rewrite method selects the top N matching terms and applies static `boost` values instead of computing full relevance scores. It works as follows:
+
 * Limits expansion to the top N terms like `top_terms_N`.
 * Rather than computing TF/IDF, it assigns a pre-set boost per term.
 * Provides faster execution with predictable relevance weights.
 
+The following example query uses the `top_terms_boost_2` rewrite method:
+
 ```json
 POST /logs/_search
 {
@@ -205,10 +209,14 @@ POST /logs/_search
 
 ## top_terms_blended_freqs_N
 
+The `top_terms_blended_freqs_N` rewrite method selects the top N matching terms and blends their document frequencies to produce more balanced relevance scores. This approach offers the following characteristics:
+
 * Picks the top N matching terms and applies a blended frequency to all.
 * Blending makes scoring smoother across terms that differ in frequency.
 * Good trade-off when you want performance with realistic scoring.
 
+The following example query uses the `top_terms_blended_freqs_2` rewrite method:
+
 ```json
 POST /logs/_search
 {