| 1 | +--- |
| 2 | +layout: default |
| 3 | +title: Rewrite |
| 4 | +nav_order: 80 |
| 5 | +--- |
| 6 | + |
| 7 | +# Rewrite |
| 8 | + |
| 9 | +Multi-term queries like `wildcard`, `prefix`, `regexp`, `fuzzy`, and `range` expand internally into sets of terms. The `rewrite` parameter allows you to control how these term expansions are executed and scored. |
| 10 | + |
| 11 | +When a multi-term query expands into many terms (for example, a `wildcard` query for `error*` that matches hundreds of terms), the query is internally converted into a set of `term` queries. This conversion can have the following drawbacks: |
| 12 | + |
| 13 | +* The number of generated clauses can exceed the `indices.query.bool.max_clause_count` limit (default is `1024`). |
| 14 | +* The way scores are calculated for matching documents can change. |
| 15 | +* Memory usage and latency can increase, depending on the rewrite method used. |
| 16 | + |
| 17 | +The following table compares the scoring behavior and relative performance of each rewrite mode. |
| 18 | + |
| 19 | +| Mode | Scores | Performance | Notes | |
| 20 | +| --------------------------- | -------------------------------------- | ----------- | --------------------------------------------- | |
| 21 | +| `constant_score` | Same score for all matches | Best | Default mode, ideal for filters | |
| 22 | +| `scoring_boolean` | TF/IDF-based | Moderate | Full relevance scoring | |
| 23 | +| `constant_score_boolean` | Same score but with Boolean structure | Moderate | Use with `must_not` or `minimum_should_match` | |
| 24 | +| `top_terms_N` | TF/IDF on top N terms | Efficient | Truncates expansion | |
| 25 | +| `top_terms_boost_N` | Static boosts | Fast | Less accurate | |
| 26 | +| `top_terms_blended_freqs_N` | Blended score | Balanced | Best scoring/efficiency trade-off | |
| 27 | + |
| 28 | + |
| 29 | +## Available rewrite methods |
| 30 | + |
| 31 | +The following table summarizes the available rewrite methods. |
| 32 | + |
| 33 | +| Rewrite method | Description | |
|  | +| --- | --- | |
| 34 | +| [`constant_score`](#constant-score) | (Default) All expanded terms are evaluated together as a single unit, assigning the same score to every match. Matching documents are not scored individually, making it very efficient for filtering use cases. | |
| 35 | +| [`scoring_boolean`](#scoring-boolean) | Breaks the query into a Boolean `should` clause with one term query per match. Each result is scored individually based on relevance. | |
| 36 | +| [`constant_score_boolean`](#constant-score-boolean) | Similar to `scoring_boolean`, but all documents receive a fixed score regardless of term frequency. Maintains Boolean structure without TF/IDF weighting. | |
| 37 | +| [`top_terms_N`](#top-terms-n) | Restricts scoring and execution to the N most frequent terms. Reduces resource usage and prevents clause overload. | |
| 38 | +| [`top_terms_boost_N`](#top-terms-boost-n) | Like `top_terms_N` but uses static boosting instead of full scoring. Offers performance improvements with simplified relevance. | |
| 39 | +| [`top_terms_blended_freqs_N`](#top-terms-blended-frequencies-n) | Chooses the top N matching terms and averages their document frequencies for scoring. Produces balanced scores without full term explosion. | |
| 40 | + |
| 41 | +## Boolean-based rewrite limits |
| 42 | + |
| 43 | +All Boolean-based rewrites, such as `scoring_boolean`, `constant_score_boolean`, and `top_terms_*`, are subject to the following [dynamic cluster-level index settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index-settings/#dynamic-cluster-level-index-settings): |
| 44 | + |
| 45 | +```json |
| 46 | +indices.query.bool.max_clause_count |
| 47 | +``` |
| 48 | + |
| 49 | +This setting controls the maximum number of clauses allowed in a Boolean query (default: `1024`). If a multi-term query expands into more terms than this limit, the query is rejected with a `too_many_clauses` error. |
| 50 | + |
| 51 | +For example, a wildcard pattern such as `error*` might expand into hundreds or thousands of matching terms, including `error`, `errors`, `error_log`, and `error404`. Each of these terms becomes a separate `term` query. If the number of terms exceeds the `indices.query.bool.max_clause_count` limit, the following query fails: |
| 52 | + |
| 53 | +```json |
| 54 | +POST /logs/_search |
| 55 | +{ |
| 56 | + "query": { |
| 57 | + "wildcard": { |
| 58 | + "message": { |
| 59 | + "value": "error*", |
| 60 | + "rewrite": "scoring_boolean" |
| 61 | + } |
| 62 | + } |
| 63 | + } |
| 64 | +} |
| 65 | +``` |
| 66 | +{% include copy-curl.html %} |
| 67 | + |
| 68 | +The query is expanded internally as follows: |
| 69 | + |
| 70 | +```json |
| 71 | +{ |
| 72 | + "bool": { |
| 73 | + "should": [ |
| 74 | + { "term": { "message": "error" } }, |
| 75 | + { "term": { "message": "errors" } }, |
| 76 | + { "term": { "message": "error_log" } }, |
| 77 | + { "term": { "message": "error404" } }, |
| 78 | + ... |
| 79 | + ] |
| 80 | + } |
| 81 | +} |
| 82 | +``` |
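
The expansion and the clause limit described above can be modeled outside the cluster. The following Python sketch is a toy illustration, not the engine's implementation: the `expand_wildcard` function and the `terms` list are hypothetical, and `fnmatchcase` only approximates wildcard matching (it also treats `[` and `]` as special characters).

```python
from fnmatch import fnmatchcase

MAX_CLAUSE_COUNT = 1024  # default of indices.query.bool.max_clause_count


def expand_wildcard(field, pattern, indexed_terms, max_clauses=MAX_CLAUSE_COUNT):
    """Toy model of a Boolean rewrite: one `term` clause per matching term."""
    matches = sorted(t for t in indexed_terms if fnmatchcase(t, pattern))
    if len(matches) > max_clauses:
        # Stand-in for the too_many_clauses rejection described above.
        raise ValueError(f"too_many_clauses: {len(matches)} > {max_clauses}")
    return {"bool": {"should": [{"term": {field: t}} for t in matches]}}


# Hypothetical term dictionary for the `message` field.
terms = ["error", "errors", "error_log", "error404", "warning", "info"]
query = expand_wildcard("message", "error*", terms)
print(len(query["bool"]["should"]))  # 4 clauses, one per matching term
```

Calling `expand_wildcard` with a `max_clauses` value smaller than the number of matching terms raises the error instead of returning a query body.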
| 83 | + |
| 84 | +## Constant score |
| 85 | + |
| 86 | +The default `constant_score` rewrite method wraps all expanded terms into a single query and skips the scoring phase entirely. This approach has the following characteristics: |
| 87 | + |
| 88 | +* Executes all term matches as a single [bit array](https://en.wikipedia.org/wiki/Bit_array) query. |
| 89 | +* Ignores scoring altogether; every matching document receives a `_score` of `1.0`. |
| 90 | +* Is the fastest option, which makes it ideal for filtering. |
| 91 | + |
| 92 | +The following example runs a `wildcard` query using the default `constant_score` rewrite method to efficiently filter documents matching the pattern `warning*` in the `message` field: |
| 93 | + |
| 94 | +```json |
| 95 | +POST /logs/_search |
| 96 | +{ |
| 97 | + "query": { |
| 98 | + "wildcard": { |
| 99 | + "message": { |
| 100 | + "value": "warning*" |
| 101 | + } |
| 102 | + } |
| 103 | + } |
| 104 | +} |
| 105 | +``` |
| 106 | +{% include copy-curl.html %} |
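
As a rough mental model of the bit array behavior listed above, the following Python sketch merges the documents of every expanded term into one array of flags and assigns each match the same score. It is a simplified illustration under stated assumptions: the `constant_score_match` function and the `postings` data are hypothetical and do not reflect the engine's internal data structures.

```python
def constant_score_match(postings, matching_terms, num_docs):
    """Toy model: OR the postings of every expanded term into one bit array."""
    bits = [False] * num_docs
    for term in matching_terms:
        for doc_id in postings.get(term, []):
            bits[doc_id] = True
    # Every matching document gets the same score; no per-term scoring happens.
    return {doc_id: 1.0 for doc_id, hit in enumerate(bits) if hit}


# Hypothetical postings: term -> IDs of the documents that contain it.
postings = {"warning": [0, 2], "warnings": [2, 3], "error": [1]}
hits = constant_score_match(postings, ["warning", "warnings"], num_docs=4)
print(hits)  # {0: 1.0, 2: 1.0, 3: 1.0}
```

Document `2` contains both expanded terms but still receives the same score as documents `0` and `3`.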
| 107 | + |
| 108 | +## Scoring Boolean |
| 109 | + |
| 110 | +The `scoring_boolean` rewrite method breaks the expanded terms into separate `term` queries combined under a Boolean `should` clause. This approach works as follows: |
| 111 | + |
| 112 | +* Each matching term becomes an individual `term` query inside a Boolean `should` clause. |
| 113 | +* Each document's score reflects how many of the expanded terms it matches and how frequently those terms occur. |
| 114 | + |
| 115 | +The following example uses a `scoring_boolean` rewrite configuration: |
| 116 | + |
| 117 | +```json |
| 118 | +POST /logs/_search |
| 119 | +{ |
| 120 | + "query": { |
| 121 | + "wildcard": { |
| 122 | + "message": { |
| 123 | + "value": "warning*", |
| 124 | + "rewrite": "scoring_boolean" |
| 125 | + } |
| 126 | + } |
| 127 | + } |
| 128 | +} |
| 129 | +``` |
| 130 | +{% include copy-curl.html %} |
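
To illustrate why documents matching more expanded terms rank higher with this method, the following Python sketch sums a per-term contribution for each document. It is a deliberately simplified stand-in: the `scoring_boolean_scores` function, the `postings` data, and the TF/IDF-like formula are assumptions for illustration only, and the similarity function the engine actually applies is more involved.

```python
import math


def scoring_boolean_scores(postings, matching_terms, num_docs):
    """Toy model: each matching term contributes its own score to a document."""
    scores = {}
    for term in matching_terms:
        docs = postings.get(term, {})
        if not docs:
            continue
        # Simplified IDF stand-in: terms found in fewer documents weigh more.
        idf = math.log(1 + num_docs / len(docs))
        for doc_id, term_freq in docs.items():
            scores[doc_id] = scores.get(doc_id, 0.0) + term_freq * idf
    return scores


# Hypothetical postings: term -> {document ID: term frequency}.
postings = {"warning": {0: 1, 2: 2}, "warnings": {2: 1}}
scores = scoring_boolean_scores(postings, ["warning", "warnings"], num_docs=4)
best = max(scores, key=scores.get)
print(best)  # 2, the only document that matches both expanded terms
```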
| 131 | + |
| 132 | +## Constant score Boolean |
| 133 | + |
| 134 | +The `constant_score_boolean` rewrite method uses the same Boolean structure as `scoring_boolean` but disables scoring, making it useful when clause logic is needed without relevance ranking. This method has the following characteristics: |
| 135 | + |
| 136 | +* Similar structure to `scoring_boolean`, but documents are not ranked. |
| 137 | +* All matching documents receive the same score. |
| 138 | +* Retains Boolean clause flexibility, such as using `must_not`, without ranking. |
| 139 | + |
| 140 | +The following example query uses a `must_not` Boolean clause: |
| 141 | + |
| 142 | +```json |
| 143 | +POST /logs/_search |
| 144 | +{ |
| 145 | + "query": { |
| 146 | + "bool": { |
| 147 | + "must_not": { |
| 148 | + "wildcard": { |
| 149 | + "message": { |
| 150 | + "value": "error*", |
| 151 | + "rewrite": "constant_score_boolean" |
| 152 | + } |
| 153 | + } |
| 154 | + } |
| 155 | + } |
| 156 | + } |
| 157 | +} |
| 158 | +``` |
| 159 | +{% include copy-curl.html %} |
| 160 | + |
| 161 | +This query is internally expanded as follows: |
| 162 | + |
| 163 | +```json |
| 164 | +{ |
| 165 | + "bool": { |
| 166 | + "must_not": { |
| 167 | + "bool": { |
| 168 | + "should": [ |
| 169 | + { "term": { "message": "error" } }, |
| 170 | + { "term": { "message": "errors" } }, |
| 171 | + { "term": { "message": "error_log" } }, |
| 172 | + ... |
| 173 | + ] |
| 174 | + } |
| 175 | + } |
| 176 | + } |
| 177 | +} |
| 178 | +``` |
| 179 | + |
| 180 | +## Top terms N |
| 181 | + |
| 182 | +The `top_terms_N` method is one of several rewrite options designed to balance scoring accuracy and performance when expanding multi-term queries. It works as follows: |
| 183 | + |
| 184 | +* Selects and scores only the N most frequent matching terms. |
| 185 | +* Ignores all other matching terms to preserve performance. |
| 186 | +* Is useful when you expect a large term expansion and want to limit the load. |
| 187 | + |
| 188 | +The following query uses the `top_terms_2` rewrite method to score only the two most frequent terms that match the `warning*` pattern in the `message` field: |
| 189 | + |
| 190 | +```json |
| 191 | +POST /logs/_search |
| 192 | +{ |
| 193 | + "query": { |
| 194 | + "wildcard": { |
| 195 | + "message": { |
| 196 | + "value": "warning*", |
| 197 | + "rewrite": "top_terms_2" |
| 198 | + } |
| 199 | + } |
| 200 | + } |
| 201 | +} |
| 202 | +``` |
| 203 | +{% include copy-curl.html %} |
| 204 | + |
| 205 | +## Top terms boost N |
| 206 | + |
| 207 | +The `top_terms_boost_N` rewrite method selects the top N matching terms and applies static `boost` values instead of computing full relevance scores. It works as follows: |
| 208 | + |
| 209 | +* Limits expansion to the top N terms, like `top_terms_N`. |
| 210 | +* Rather than computing TF/IDF, it assigns a preset boost to each term. |
| 211 | +* Provides faster execution with predictable relevance weights. |
| 212 | + |
| 213 | +The following example uses a `top_terms_boost_2` rewrite parameter: |
| 214 | + |
| 215 | +```json |
| 216 | +POST /logs/_search |
| 217 | +{ |
| 218 | + "query": { |
| 219 | + "wildcard": { |
| 220 | + "message": { |
| 221 | + "value": "warning*", |
| 222 | + "rewrite": "top_terms_boost_2" |
| 223 | + } |
| 224 | + } |
| 225 | + } |
| 226 | +} |
| 227 | +``` |
| 228 | +{% include copy-curl.html %} |
| 229 | + |
| 230 | +## Top terms blended frequencies N |
| 231 | + |
| 232 | +The `top_terms_blended_freqs_N` rewrite method selects the top N matching terms and blends their document frequencies to produce more balanced relevance scores. This approach has the following characteristics: |
| 233 | + |
| 234 | +* Picks the top N matching terms and applies a blended frequency to all. |
| 235 | +* Blending makes scoring smoother across terms that differ in frequency. |
| 236 | +* Good trade-off when you want performance with realistic scoring. |
| 237 | + |
| 238 | +The following example uses a `top_terms_blended_freqs_2` rewrite parameter: |
| 239 | + |
| 240 | +```json |
| 241 | +POST /logs/_search |
| 242 | +{ |
| 243 | + "query": { |
| 244 | + "wildcard": { |
| 245 | + "message": { |
| 246 | + "value": "warning*", |
| 247 | + "rewrite": "top_terms_blended_freqs_2" |
| 248 | + } |
| 249 | + } |
| 250 | + } |
| 251 | +} |
| 252 | +``` |
| 253 | +{% include copy-curl.html %} |
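
In each of the three `top_terms` methods, `N` is replaced by an integer in the parameter value, as in `top_terms_2`. The following Python sketch shows one way a client might build the `rewrite` value and the request body programmatically. The helper names `top_terms_rewrite` and `wildcard_query` are hypothetical and are not part of any client library.

```python
import json

# Rewrite method name templates taken from the sections above.
VARIANTS = {
    "": "top_terms_{n}",
    "boost": "top_terms_boost_{n}",
    "blended_freqs": "top_terms_blended_freqs_{n}",
}


def top_terms_rewrite(n, variant=""):
    """Return the rewrite value for the given N, for example `top_terms_2`."""
    if not isinstance(n, int) or n < 1:
        raise ValueError("N must be a positive integer")
    return VARIANTS[variant].format(n=n)


def wildcard_query(field, value, rewrite):
    """Build a search request body for a wildcard query with a rewrite method."""
    return {"query": {"wildcard": {field: {"value": value, "rewrite": rewrite}}}}


body = wildcard_query("message", "warning*", top_terms_rewrite(2, "blended_freqs"))
print(json.dumps(body, indent=2))
```

The printed body matches the preceding request and can be sent to `POST /logs/_search` with any HTTP client.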