Commit 1278ccc

DEV: add query performance tuning page (#1052)
* DEV: add query performance tuning page
* Apply review comments.

Co-authored-by: andy-stark-redis <164213578+andy-stark-redis@users.noreply.github.com>

* Apply review comments.
* Apply more review comments.

---------

Co-authored-by: andy-stark-redis <164213578+andy-stark-redis@users.noreply.github.com>
1 parent 8dc9aaf commit 1278ccc

2 files changed

+161
-0
lines changed

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
---
2+
categories:
3+
- docs
4+
- develop
5+
- stack
6+
- oss
7+
description: Redis Query Engine best practices
8+
linkTitle: Best practices
9+
title: Best practices
10+
weight: 8
11+
---
Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
1+
---
2+
Title: Best practices for Redis Query Engine performance
3+
alwaysopen: false
4+
categories:
5+
- docs
6+
- develop
7+
- stack
8+
- oss
9+
- kubernetes
10+
- clients
11+
linkTitle: RQE performance
12+
weight: 1
13+
---
14+
15+
{{< note >}}
16+
If you're using Redis Software or Redis Cloud, see the [best practices for scalable Redis Query Engine]({{< relref "/operate/oss_and_stack/stack-with-enterprise/search/scalable-query-best-practices" >}}) page.
17+
{{< /note >}}
18+
19+
## Checklist
20+
Below are some basic steps to ensure good performance of the Redis Query Engine (RQE).
21+
22+
* Create a Redis data model with your query patterns in mind.
23+
* Ensure the Redis architecture has been sized for the expected load using the [sizing calculator](https://redis.io/redisearch-sizing-calculator/).
24+
* Provision Redis nodes with sufficient resources (RAM, CPU, network) to support the expected maximum load.
25+
* Review [`FT.INFO`]({{< baseurl >}}/commands/ft.info) and [`FT.PROFILE`]({{< baseurl >}}/commands/ft.profile) outputs for anomalies and/or errors.
26+
* Conduct load testing in a test environment with real-world queries and a load generated by either [memtier_benchmark](https://github.com/redislabs/memtier_benchmark) or a custom load application.
27+
28+
## Indexing considerations
29+
30+
### General
31+
- Favor [`TAG`]({{< relref "/develop/interact/search-and-query/basic-constructs/field-and-type-options#tag-fields" >}}) over [`NUMERIC`]({{< relref "/develop/interact/search-and-query/basic-constructs/field-and-type-options#numeric-fields" >}}) for use cases that only require matching.
32+
- Favor [`TAG`]({{< relref "/develop/interact/search-and-query/basic-constructs/field-and-type-options#tag-fields" >}}) over [`TEXT`]({{< relref "/develop/interact/search-and-query/basic-constructs/field-and-type-options#text-fields" >}}) for use cases that don’t require full-text capabilities (pure match).
33+
34+
### Non-threaded search
35+
- Put only those fields used in your queries in the index.
36+
- Only make fields [`SORTABLE`]({{< relref "/develop/interact/search-and-query/advanced-concepts/sorting" >}}) if they are used in [`SORTBY`]({{< relref "/develop/interact/search-and-query/advanced-concepts/sorting#specifying-sortby" >}})
37+
queries.
38+
- Use [`DIALECT 4`]({{< relref "/develop/interact/search-and-query/advanced-concepts/dialects#dialect-4" >}}).
39+
40+
### Threaded (query performance factor or QPF) search
41+
- Put both query fields and any projected fields (`RETURN` or `LOAD`) in the index.
42+
- Set all fields to `SORTABLE`.
43+
- Set `TAG` fields to [`UNF`]({{< relref "/develop/interact/search-and-query/advanced-concepts/sorting#normalization-unf-option" >}}).
44+
- Optional: Set `TEXT` fields to `NOSTEM` if the use case will support it.
45+
- Use [`DIALECT 4`]({{< relref "/develop/interact/search-and-query/advanced-concepts/dialects#dialect-4" >}}).
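
As a rough illustration of the threaded-search checklist above, the following Python sketch assembles the argument list for an `FT.CREATE` command. The helper name `build_create_args` and the `(path, alias, type)` tuple convention are hypothetical and exist only for this example; the resulting list could be passed to any client that accepts raw command arguments. It always adds `NOSTEM` to `TEXT` fields, which the checklist marks as optional.

```python
from typing import List, Tuple

# (JSON path, alias, field type). This tuple layout is an illustrative
# convention for this sketch, not part of any Redis client API.
Field = Tuple[str, str, str]


def build_create_args(index: str, prefix: str, fields: List[Field]) -> List[str]:
    """Return FT.CREATE arguments that follow the threaded-search checklist."""
    args = ["FT.CREATE", index, "ON", "JSON", "PREFIX", "1", prefix, "SCHEMA"]
    for path, alias, ftype in fields:
        args += [path, "AS", alias, ftype]
        if ftype == "TEXT":
            args.append("NOSTEM")  # optional: only if the use case supports it
        args.append("SORTABLE")    # checklist: set all fields to SORTABLE
        if ftype == "TAG":
            args.append("UNF")     # checklist: set TAG fields to UNF
    return args


example_fields = [("$.firstName", "name", "TEXT"), ("$.id", "id", "TAG")]
print(" ".join(build_create_args("jsonidx:profiles", "profiles:", example_fields)))
```

Generating the schema from one list of fields makes it harder to forget `SORTABLE` or `UNF` on a field that is added later.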
46+
47+
## Query optimization
48+
49+
- Avoid returning large result sets. Page through results with a cursor (`WITHCURSOR` in `FT.AGGREGATE`) or cap them with `LIMIT`.
50+
- Avoid wildcard searches.
51+
- Avoid projecting all fields (e.g., `LOAD *`). Project only those fields that are part of the index schema.
52+
- If queries are long-running, enable threading (query performance factor) to reduce contention for the main Redis thread.
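
One way to apply the points above consistently is to build queries through a small helper that refuses wildcard projections and always adds a `LIMIT`. The sketch below is a minimal Python example under that assumption; `build_aggregate_args` is a hypothetical name, and the caller chooses the dialect explicitly.

```python
from typing import List, Sequence


def build_aggregate_args(
    index: str,
    query: str,
    load: Sequence[str],
    dialect: int,
    offset: int = 0,
    limit: int = 10,
) -> List[str]:
    """Return FT.AGGREGATE arguments with an explicit projection and LIMIT."""
    if not load:
        raise ValueError("name at least one field to load")
    if "*" in load:
        raise ValueError("LOAD * is not allowed; list the indexed fields instead")
    return [
        "FT.AGGREGATE", index, query,
        "LOAD", str(len(load)), *load,       # LOAD takes a count, then field names
        "LIMIT", str(offset), str(limit),    # bound the size of the result set
        "DIALECT", str(dialect),
    ]


print(build_aggregate_args("jsonidx:profiles", "@t:[1299 1299]", ["id", "t"], dialect=4))
```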
53+
54+
## Validate performance (`FT.PROFILE`)
55+
56+
You can analyze [`FT.PROFILE`]({{< baseurl >}}/commands/ft.profile) output to gain insights about query execution.
57+
The following informational items are available for analysis:
58+
59+
- Total execution time
60+
- Execution time per shard
61+
- Coordination time (for multi-sharded environments)
62+
- Breakdown of the query into fundamental components, such as `UNION` and `INTERSECT`
63+
- Warnings, such as `TIMEOUT`
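
To profile a query you already run, you can wrap it in `FT.PROFILE`, which takes the index name, the query type (`SEARCH` or `AGGREGATE`), and the original arguments after the `QUERY` keyword. The Python sketch below does that rewrite on a list of command arguments; `to_profile_args` is a hypothetical helper name, and the example does not attempt to parse the reply.

```python
from typing import List, Sequence

_KINDS = {"FT.SEARCH": "SEARCH", "FT.AGGREGATE": "AGGREGATE"}


def to_profile_args(command: Sequence[str], limited: bool = False) -> List[str]:
    """Rewrite FT.SEARCH/FT.AGGREGATE arguments into an FT.PROFILE call."""
    name, index, *rest = command  # raises ValueError if the command is too short
    kind = _KINDS.get(name.upper())
    if kind is None or not rest:
        raise ValueError("expected FT.SEARCH or FT.AGGREGATE with a query")
    args = ["FT.PROFILE", index, kind]
    if limited:
        args.append("LIMITED")  # asks for a reduced level of detail in the output
    return args + ["QUERY", *rest]


query = ["FT.AGGREGATE", "jsonidx:profiles", "@t:[1299 1299]", "LIMIT", "0", "10"]
print(" ".join(to_profile_args(query)))
```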
64+
65+
## Anti-patterns
66+
67+
When designing and querying indexes in RQE, certain practices can hinder performance, scalability, and maintainability. Below are some common anti-patterns to avoid:
68+
69+
- **Large documents**: storing excessively large documents in Redis makes data retrieval slower and increases memory usage. Break data into smaller, focused records whenever possible.
70+
- **Deeply-nested fields**: retrieving or indexing deeply-nested JSON fields is computationally expensive. Use a flatter schema for better performance.
71+
- **Large result sets**: fetching unnecessarily large result sets puts a strain on memory and network resources. Limit results to only what is needed.
72+
- **Wildcarding**: using wildcard patterns indiscriminately in queries can lead to large and inefficient scans, especially if the index size is significant.
73+
- **Large projections**: including excessive fields in query results increases memory overhead and slows down query execution. Limit projections to essential fields.
74+
75+
The following examples depict an anti-pattern index schema and query, followed by corrected versions designed for scalability with RQE.
76+
77+
### Anti-pattern index schema
78+
79+
The following schema introduces challenges for scalability and performance:
80+
81+
```sh
FT.CREATE jsonidx:profiles ON JSON PREFIX 1 profiles:
      SCHEMA $.tags.* as t NUMERIC SORTABLE
             $.firstName as name TEXT
             $.location as loc GEO
```
87+
88+
Issues:
89+
90+
- Minimal schema definition: the schema is sparse and lacks fields such as `lastName`, `id`, and `ver` that might be frequently queried. Queries must then fetch these fields separately, which reduces efficiency.
91+
- Missing `SORTABLE` flag for text fields: sorting operations on unsortable fields require full-text processing, which is slow.
92+
- Wildcard indexing: `$.tags.*` creates a broad index that can lead to excessive memory usage and reduced query performance.
93+
94+
### Anti-pattern query
95+
96+
The following query is inefficient and not optimized for vertical scaling:
97+
98+
```sh
FT.AGGREGATE jsonidx:profiles '@t:[1299 1299]' LOAD * LIMIT 0 10
```
101+
Issues:
102+
103+
- Wildcard projection (`LOAD *`): retrieving all fields in the result set is inefficient and increases memory usage, especially if the documents are large.
104+
- Unnecessary fields: fields that aren't required for the current operation are still fetched, slowing down execution.
105+
- Lack of advanced query syntax: without specifying a query dialect or leveraging features like tagging, the query may perform unnecessary computations.
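
Checks like the ones listed above can be automated in a code review or test step. The following Python sketch scans a query's argument list for a wildcard projection, a missing `DIALECT`, and an unbounded result set. `find_query_issues` is a hypothetical helper, and its token matching is deliberately naive: it assumes the arguments are already split and that the query string itself is not a bare keyword.

```python
from typing import List, Sequence


def find_query_issues(args: Sequence[str]) -> List[str]:
    """Return a list of anti-patterns found in FT.SEARCH/FT.AGGREGATE arguments."""
    upper = [a.upper() for a in args]
    issues = []
    # LOAD immediately followed by * projects every field of the document.
    if any(tok == "LOAD" and nxt == "*" for tok, nxt in zip(upper, upper[1:])):
        issues.append("wildcard projection (LOAD *)")
    if "DIALECT" not in upper:
        issues.append("no explicit DIALECT")
    if "LIMIT" not in upper and "WITHCURSOR" not in upper:
        issues.append("no LIMIT or cursor")
    return issues


anti_pattern = ["FT.AGGREGATE", "jsonidx:profiles", "@t:[1299 1299]",
                "LOAD", "*", "LIMIT", "0", "10"]
print(find_query_issues(anti_pattern))
```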
106+
107+
### Improved index schema
108+
109+
Here’s an optimized schema that adheres to best practices for vertical scaling:
110+
111+
```sh
FT.CREATE jsonidx:profiles ON JSON PREFIX 1 profiles:
      SCHEMA $.tags.* as t NUMERIC SORTABLE
             $.firstName as name TEXT NOSTEM SORTABLE
             $.lastName as lastname TEXT NOSTEM SORTABLE
             $.location as loc GEO SORTABLE
             $.id as id TAG SORTABLE UNF
             $.ver as ver TAG SORTABLE UNF
```
120+
121+
Improvements:
122+
123+
- `NOSTEM` for text fields: prevents stemming on fields like `firstName` and `lastName` to allow for exact matches (e.g., "Smith" stays "Smith").
124+
- Expanded schema: adds commonly queried fields like `lastName`, `id`, and `ver`, making queries more efficient by reducing the need for post-query data retrieval.
125+
- `TAG` fields: `id` and `ver` are defined as `TAG` fields to support fast filtering with exact matches.
126+
- `SORTABLE` for all relevant fields: ensures that sorting operations are efficient without requiring full-text scanning.
127+
128+
You might be wondering why `$.tags.* as t NUMERIC SORTABLE` is acceptable in the improved schema when it was flagged as an issue in the anti-pattern schema.
129+
The inclusion of `$.tags.*` is acceptable when:
130+
131+
- It has a clear purpose: it is actively used in queries, such as filtering on numeric ranges or matching specific values.
132+
- Other fields in the schema complement it: these fields reduce over-reliance on `$.tags.*` for all query operations, distributing the load more evenly.
133+
- Projections and limits are managed carefully: queries that use `$.tags.*` should avoid loading unnecessary fields or returning excessively large result sets.
134+
135+
### Improved query
136+
137+
The following query is better suited for vertical scaling:
138+
139+
```sh
FT.AGGREGATE jsonidx:profiles '@t:[1299 1299]'
      LOAD 6 id t name lastname loc ver
      LIMIT 0 10
      DIALECT 3
```
145+
146+
Improvements:
147+
148+
- Targeted projection: the `LOAD` clause specifies only essential fields (`id, t, name, lastname, loc, ver`), reducing memory and network overhead.
149+
- Limited results: the `LIMIT` clause ensures the query retrieves only the first 10 results, avoiding large result sets.
150+
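- [`DIALECT 3`]({{< relref "/develop/interact/search-and-query/advanced-concepts/dialects#dialect-3" >}}): sets the query dialect explicitly rather than relying on the server default, so the query behaves the same across deployments. The indexing considerations earlier on this page recommend `DIALECT 4`, which is the newer dialect.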
