
Commit 077d747

Clarify how to perform exact match searches in Quickwit (#5797)
* Add test using range to emulate keyword query
* Add docs
* Fix prefix test case
1 parent 432d73f commit 077d747

4 files changed: +106 -2 lines changed


docs/configuration/index-config.md

Lines changed: 1 addition & 1 deletion
@@ -141,7 +141,7 @@ fast:
 
 | Tokenizer | Description |
 | ------------- | ------------- |
-| `raw` | Does not process nor tokenize text. Filters out tokens larger than 255 bytes. |
+| `raw` | Does not process nor tokenize text. Filters out tokens larger than 255 bytes. This is similar to the `keyword` type in Elasticsearch. |
 | `raw_lowercase` | Does not tokenize text, but lowercase it. Filters out tokens larger than 255 bytes. |
 | `default` | Chops the text on according to whitespace and punctuation, removes tokens that are too long, and converts to lowercase. Filters out tokens larger than 255 bytes. |
 | `en_stem` | Like `default`, but also applies stemming on the resulting tokens. Filters out tokens larger than 255 bytes. |

docs/reference/es_compatible_api.md

Lines changed: 6 additions & 0 deletions
@@ -672,6 +672,12 @@ Moreover, while Quickwit does not support `best_fields` or `cross_fields`, it wi
 
 [Elasticsearch reference documentation](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/query-dsl-term-query.html)
 
+:::note
+
+When working on text, it is recommended to only use `term` queries on fields configured with `tokenizer: raw`. This is the Quickwit equivalent of the Elasticsearch `keyword` type.
+
+:::
+
 #### Example
 
 ```json
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+## using an index (with the raw tokenizer)
+endpoint: nested/search
+method: POST
+json:
+  query: "text_raw:indexed-with-raw-tokenizer-dashes"
+expected:
+  num_hits: 1
+---
+endpoint: nested/search
+method: POST
+json:
+  query: "text_raw:indexed_with_raw_tokenizer_dashes"
+expected:
+  num_hits: 0
+---
+endpoint: nested/search
+method: POST
+json:
+  query: "text_raw:indexed-with-raw"
+expected:
+  num_hits: 0
+---
+endpoint: nested/search
+method: POST
+json:
+  query: 'text_raw:"indexed with raw tokenizer dashes"'
+expected:
+  num_hits: 1
+---
+endpoint: nested/search
+method: POST
+json:
+  query: 'text_raw:"indexed with raw"'
+expected:
+  num_hits: 0
+---
+## using a fast field (use a range query to force using the fast field)
+endpoint: nested/search
+method: POST
+json:
+  query: "text_fast:fast-text-value-dashes"
+status_code: 400
+expected:
+  message: "invalid query: query is incompatible with schema. field text_fast is not full-text searchable)"
+---
+endpoint: nested/search
+method: POST
+json:
+  query: "text_fast:[fast-text-value-dashes TO fast-text-value-dashes]"
+expected:
+  num_hits: 1
+---
+endpoint: nested/search
+method: POST
+json:
+  query: "text_fast:[fast_text_value_dashes TO fast_text_value_dashes]"
+expected:
+  num_hits: 0
+---
+endpoint: nested/search
+method: POST
+json:
+  query: "text_fast:[fast-text-value TO fast-text-value]"
+expected:
+  num_hits: 0
+---
+# unfortunately, the query parser does not support escaping whitespaces
+# use the Elasticsearch API instead
+endpoint: nested/search
+method: POST
+json:
+  query: 'text_fast:["fast text value whitespaces" TO "fast text value whitespacesd"]'
+status_code: 400
+---
+endpoint: nested/search
+method: POST
+json:
+  query: "text_fast:[fast text value whitespaces TO fast text value whitespaces]"
+status_code: 400
+---
+endpoint: nested/search
+method: POST
+json:
+  query: "text_fast:[fast\ text\ value\ whitespaces TO fast\ text\ value\ whitespaces]"
+status_code: 400
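
The fast-field cases above emulate a keyword match by using the same value as both bounds of a range. As a minimal sketch of that pattern, the hypothetical Python helper below (`exact_match_range_query` is not part of Quickwit) builds such a clause and refuses values containing whitespace, since the tests above show the query parser rejecting them. It assumes the value contains no other characters that are special to the query language.

```python
def exact_match_range_query(field: str, value: str) -> str:
    """Build a range clause whose lower and upper bounds are both `value`.

    Hypothetical helper: it only formats a query string and does not
    contact a Quickwit server.
    """
    # The tests above show that whitespace inside range bounds yields a 400,
    # whether quoted, unquoted, or backslash-escaped.
    if any(ch.isspace() for ch in value):
        raise ValueError("whitespace is not supported in range bounds")
    return f"{field}:[{value} TO {value}]"


print(exact_match_range_query("text_fast", "fast-text-value-dashes"))
# prints: text_fast:[fast-text-value-dashes TO fast-text-value-dashes]
```

Rejecting whitespace on the client side keeps the failure local and explicit instead of relying on the server's 400 response.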

quickwit/rest-api-tests/scenarii/qw_search_api/_setup.quickwit.yaml

Lines changed: 14 additions & 1 deletion
@@ -72,7 +72,16 @@ json:
 - name: object_fast_field
   type: u64
   fast: true
-
+- name: text_fast
+  type: text
+  fast: true
+  indexed: false
+- name: text_raw
+  type: text
+  fast: false
+  indexed: true
+  tokenizer: raw
+
 ---
 method: POST
 endpoint: nested/ingest
@@ -85,3 +94,7 @@ ndjson:
 - {"object_multi": {"object_text_field": "multi hello"}}
 - {"object_multi": {"object_fast_field": 1}}
 - {"object_multi": {"object_fast_field": 2}}
+- {"text_raw": "indexed-with-raw-tokenizer-dashes"}
+- {"text_raw": "indexed with raw tokenizer dashes"}
+- {"text_fast": "fast text value whitespaces"}
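
For values containing whitespace, the test comments point to the Elasticsearch-compatible API instead of the query parser. The sketch below builds request bodies in the standard Elasticsearch query DSL shape: a `term` query for a field indexed with `tokenizer: raw`, and a `range` query with identical bounds for a fast-only field. The helper names are hypothetical, and whether a given Quickwit version accepts the `range` form on a text fast field is an assumption this commit does not test.

```python
import json


def term_query(field: str, value: str) -> dict:
    """Exact match on a field indexed with `tokenizer: raw` (hypothetical helper)."""
    return {"query": {"term": {field: {"value": value}}}}


def exact_range_query(field: str, value: str) -> dict:
    """Emulate a keyword match on a fast field with equal range bounds.

    Assumption: the server accepts `gte`/`lte` string bounds on this field.
    """
    return {"query": {"range": {field: {"gte": value, "lte": value}}}}


# Bodies are plain JSON, so whitespace in the value needs no escaping.
print(json.dumps(term_query("text_raw", "indexed with raw tokenizer dashes")))
print(json.dumps(exact_range_query("text_fast", "fast text value whitespaces")))
```

Either body would be sent as the JSON payload of a search request to the index's ES-compatible search endpoint.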
