DEV: add query performance tuning page (#1052)

dwdougherty · andy-stark-redis · web-flow · commit 1278ccc05295 · 2025-01-15T06:41:32.000-08:00
* DEV: add query performance tuning page

* Apply review comments.

Co-authored-by: andy-stark-redis <164213578+andy-stark-redis@users.noreply.github.com>

* Apply review comments.

* Apply more review comments.

---------

Co-authored-by: andy-stark-redis <164213578+andy-stark-redis@users.noreply.github.com>
diff --git a/content/develop/interact/search-and-query/best-practices/_index.md b/content/develop/interact/search-and-query/best-practices/_index.md
@@ -0,0 +1,11 @@
+---
+categories:
+- docs
+- develop
+- stack
+- oss
+description: Redis Query Engine best practices
+linkTitle: Best practices
+title: Best practices
+weight: 8
+---
diff --git a/content/develop/interact/search-and-query/best-practices/scalable-query-best-practices.md b/content/develop/interact/search-and-query/best-practices/scalable-query-best-practices.md
@@ -0,0 +1,150 @@
+---
+Title: Best practices for Redis Query Engine performance
+alwaysopen: false
+categories:
+- docs
+- develop
+- stack
+- oss
+- kubernetes
+- clients
+linkTitle: RQE performance
+weight: 1
+---
+
+{{< note >}}
+If you're using Redis Software or Redis Cloud, see the [best practices for scalable Redis Query Engine]({{< relref "/operate/oss_and_stack/stack-with-enterprise/search/scalable-query-best-practices" >}}) page.
+{{< /note >}}
+
+## Checklist
+Below are some basic steps to ensure good performance of the Redis Query Engine (RQE).
+
+* Create a Redis data model with your query patterns in mind.
+* Ensure the Redis architecture has been sized for the expected load using the [sizing calculator](https://redis.io/redisearch-sizing-calculator/).
+* Provision Redis nodes with sufficient resources (RAM, CPU, network) to support the expected maximum load.
+* Review [`FT.INFO`]({{< baseurl >}}/commands/ft.info) and [`FT.PROFILE`]({{< baseurl >}}/commands/ft.profile) outputs for anomalies and/or errors.
+* Conduct load testing in a test environment with real-world queries and a load generated by either [memtier_benchmark](https://github.com/redislabs/memtier_benchmark) or a custom load application.
+
+## Indexing considerations
+
+### General
+- Favor [`TAG`]({{< relref "/develop/interact/search-and-query/basic-constructs/field-and-type-options#tag-fields" >}}) over [`NUMERIC`]({{< relref "/develop/interact/search-and-query/basic-constructs/field-and-type-options#numeric-fields" >}}) for use cases that only require matching.
+- Favor [`TAG`]({{< relref "/develop/interact/search-and-query/basic-constructs/field-and-type-options#tag-fields" >}}) over [`TEXT`]({{< relref "/develop/interact/search-and-query/basic-constructs/field-and-type-options#text-fields" >}}) for use cases that don’t require full-text capabilities (pure match).
+
+### Non-threaded search
+- Put only those fields used in your queries in the index.
+- Only make fields [`SORTABLE`]({{< relref "/develop/interact/search-and-query/advanced-concepts/sorting" >}}) if they are used in [`SORTBY`]({{< relref "/develop/interact/search-and-query/advanced-concepts/sorting#specifying-sortby" >}})
+queries.
+- Use [`DIALECT 4`]({{< relref "/develop/interact/search-and-query/advanced-concepts/dialects#dialect-4" >}}).
+
+### Threaded (query performance factor or QPF) search
+- Put both query fields and any projected fields (`RETURN` or `LOAD`) in the index.
+- Set all fields to `SORTABLE`.
+- Set `TAG` fields to [`UNF`]({{< relref "/develop/interact/search-and-query/advanced-concepts/sorting#normalization-unf-option" >}}).
+- Optional: Set `TEXT` fields to `NOSTEM` if the use case will support it.
+- Use [`DIALECT 4`]({{< relref "/develop/interact/search-and-query/advanced-concepts/dialects#dialect-4" >}}).
+
+## Query optimization
+
+- Avoid returning large result sets. Page through results with a cursor (`WITHCURSOR` in `FT.AGGREGATE`) or cap them with `LIMIT`.
+- Avoid wildcard searches.
+- Avoid projecting all fields (e.g., `LOAD *`). Project only those fields that are part of the index schema.
+- If queries are long-running, enable threading (query performance factor) to reduce contention for the main Redis thread.
+
+## Validate performance (`FT.PROFILE`)
+
+You can analyze [`FT.PROFILE`]({{< baseurl >}}/commands/ft.profile) output to gain insights about query execution.
+The following informational items are available for analysis:
+
+- Total execution time
+- Execution time per shard
+- Coordination time (for multi-sharded environments)
+- Breakdown of the query into fundamental components, such as `UNION` and `INTERSECT`
+- Warnings, such as `TIMEOUT`
+
+## Anti-patterns
+
+When designing and querying indexes in RQE, certain practices can hinder performance, scalability, and maintainability. Below are some common anti-patterns to avoid:
+
+- **Large documents**: storing excessively large documents in Redis makes data retrieval slower and increases memory usage. Break data into smaller, focused records whenever possible.
+- **Deeply-nested fields**: retrieving or indexing deeply-nested JSON fields is computationally expensive. Use a flatter schema for better performance.
+- **Large result sets**: fetching unnecessarily large result sets puts a strain on memory and network resources. Limit results to only what is needed.
+- **Wildcarding**: using wildcard patterns indiscriminately in queries can lead to large and inefficient scans, especially if the index size is significant.
+- **Large projections**: including excessive fields in query results increases memory overhead and slows down query execution. Limit projections to essential fields.
+
+The following examples depict an anti-pattern index schema and query, followed by corrected versions designed for scalability with RQE.
+
+### Anti-pattern index schema
+
+The following schema introduces challenges for scalability and performance:
+
+```sh
+FT.CREATE jsonidx:profiles ON JSON PREFIX 1 profiles: 
+          SCHEMA $.tags.* as t NUMERIC SORTABLE 
+                 $.firstName as name TEXT 
+                 $.location as loc GEO
+```
+
+Issues:
+
+- Minimal schema definition: the schema is sparse and lacks fields like `lastName`, `id`, and `ver` that might be frequently queried. These fields must then be fetched in separate operations, which reduces efficiency.
+- Missing `SORTABLE` flag for text fields: sorting operations on unsortable fields require full-text processing, which is slow.
+- Wildcard indexing: `$.tags.*` creates a broad index that can lead to excessive memory usage and reduced query performance.
+
+### Anti-pattern query
+
+The following query is inefficient and not optimized for vertical scaling:
+
+```sh
+FT.AGGREGATE jsonidx:profiles '@t:[1299 1299]' LOAD * LIMIT 0 10
+```
+Issues:
+
+- Wildcard projection (`LOAD *`): retrieving all fields in the result set is inefficient and increases memory usage, especially if the documents are large.
+- Unnecessary fields: fields that aren't required for the current operation are still fetched, slowing down execution.
+- Lack of advanced query syntax: without specifying a query dialect or leveraging features like tagging, the query may perform unnecessary computations.
+
+### Improved index schema
+
+Here’s an optimized schema that adheres to best practices for vertical scaling:
+
+```sh
+FT.CREATE jsonidx:profiles ON JSON PREFIX 1 profiles: 
+          SCHEMA $.tags.* as t NUMERIC SORTABLE 
+                 $.firstName as name TEXT NOSTEM SORTABLE 
+                 $.lastName as lastname TEXT NOSTEM SORTABLE 
+                 $.location as loc GEO SORTABLE 
+                 $.id as id TAG SORTABLE UNF 
+                 $.ver as ver TAG SORTABLE UNF
+```
+
+Improvements:
+
+- `NOSTEM` for text fields: prevents stemming on fields like `firstName` and `lastName` to allow for exact matches (e.g., "Smith" stays "Smith").
+- Expanded schema: adds commonly queried fields like `lastName`, `id`, and `ver`, making queries more efficient by reducing the need for post-query data retrieval.
+- `TAG` fields: `id` and `ver` are defined as `TAG` fields to support fast filtering with exact matches.
+- `SORTABLE` for all relevant fields: ensures that sorting operations are efficient without requiring full-text scanning.
+
+You might be wondering why `$.tags.* as t NUMERIC SORTABLE` is acceptable in the improved schema when it was listed as an issue in the anti-pattern schema.
+The inclusion of `$.tags.*` is acceptable when:
+
+- It has a clear purpose: it is actively used in queries, such as filtering on numeric ranges or matching specific values.
+- Other fields in the schema complement it: these fields reduce over-reliance on `$.tags.*` for all query operations, distributing the load more evenly.
+- Projections and limits are managed carefully: queries that use `$.tags.*` should avoid loading unnecessary fields or returning excessively large result sets.
+
+### Improved query
+
+The following query is better suited for vertical scaling:
+
+```sh
+FT.AGGREGATE jsonidx:profiles '@t:[1299 1299]' 
+                LOAD 6 id t name lastname loc ver 
+                LIMIT 0 10
+                DIALECT 3
+```
+
+Improvements:
+
+- Targeted projection: the `LOAD` clause specifies only essential fields (`id, t, name, lastname, loc, ver`), reducing memory and network overhead.
+- Limited results: the `LIMIT` clause ensures the query retrieves only the first 10 results, avoiding large result sets.
+- [`DIALECT 3`]({{< relref "/develop/interact/search-and-query/advanced-concepts/dialects#dialect-3" >}}): sets the query dialect explicitly instead of relying on the server default, so the query is parsed and its results are formatted the same way in every environment.