Commit 5f5d50d

Authored by 0ctopus13prime (Dooyong Kim), kolchfa-aws, and natebower

Introduce memory optimized vector search (LuceneOnFaiss) in 3.1. (#10119)
* Introduce memory optimized vector search (LuceneOnFaiss)
  Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
* Remove unrelated file
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Apply suggestions from code review
  Signed-off-by: Nathan Bower <nbower@amazon.com>
* Update _vector-search/performance-tuning-search.md
  Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
* Add cross-links
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Dooyong Kim <kdooyong@amazon.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
1 parent d91fe05 commit 5f5d50d

File tree

4 files changed: +102 / -1 lines changed


_field-types/supported-field-types/knn-memory-optimized.md

Lines changed: 4 additions & 1 deletion
@@ -63,6 +63,9 @@ For example, if a `compression_level` of `32x` is passed for a `float32` index o
 If you set the `compression_level` parameter, then you cannot specify an `encoder` in the `method` mapping. Compression levels greater than `1x` are only supported for `float` vector types.
 {: .note}
 
+Starting with OpenSearch 3.1, enabling `on_disk` mode with a `1x` compression level activates [memory-optimized search]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/memory-optimized-search/). In this mode, the engine loads data on demand during search instead of loading all data into memory at once.
+{: .important}
+
 The following table lists the default `compression_level` values for the available workload modes.
 
 | Mode | Default compression level |
@@ -924,4 +927,4 @@ The memory required for IVF can be estimated using the following formula, where
 - [k-NN query]({{site.url}}{{site.baseurl}}/query-dsl/specialized/k-nn/)
 - [Disk-based vector search]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/disk-based-vector-search/)
-- [Vector quantization]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/knn-vector-quantization/)
+- [Vector quantization]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/knn-vector-quantization/)
_vector-search/optimizing-storage/memory-optimized-search.md (new file)

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
---
layout: default
title: Memory-optimized search
parent: Optimizing vector storage
nav_order: 30
---

# Memory-optimized search
Introduced 3.1
{: .label .label-purple }

Memory-optimized search allows the Faiss engine to run efficiently without loading the entire vector index into off-heap memory. Without this optimization, Faiss typically loads the full index into memory, which can become unsustainable if the index size exceeds available physical memory. With memory-optimized search, the engine memory-maps the index file and relies on the operating system's file cache to serve search requests. This approach avoids unnecessary I/O and allows repeated reads to be served directly from the system cache.

Memory-optimized search affects only search operations. Indexing behavior remains unchanged.
{: .note }

## Limitations

The following limitations apply to memory-optimized search in OpenSearch:

- Supported only for the [Faiss engine]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#faiss-engine) with the [HNSW method]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#hnsw-parameters-1)
- Does not support [IVF]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#ivf-parameters) or [product quantization (PQ)]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/faiss-product-quantization)
- Requires an index restart to enable or disable

If you use IVF or PQ, the engine loads data into memory regardless of whether memory-optimized mode is enabled.
{: .important }

## Configuration

To enable memory-optimized search, set `index.knn.memory_optimized_search` to `true` when creating an index:

```json
PUT /test_index
{
  "settings": {
    "index.knn": true,
    "index.knn.memory_optimized_search": true
  },
  "mappings": {
    "properties": {
      "vector_field": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "engine": "faiss"
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

To enable memory-optimized search on an existing index, you must close the index, update the setting, and then reopen the index:

```json
POST /test_index/_close
```
{% include copy-curl.html %}

```json
PUT /test_index/_settings
{
  "index.knn.memory_optimized_search": true
}
```
{% include copy-curl.html %}

```json
POST /test_index/_open
```
{% include copy-curl.html %}
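After reopening the index, you can verify that the setting took effect by retrieving the index settings. This check is a suggested addition rather than part of the original example set; `test_index` follows the preceding examples:

```json
GET /test_index/_settings
```
{% include copy-curl.html %}

The response should show `index.knn.memory_optimized_search` set to `"true"` under the index's settings.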
## Integration with disk-based search

When you configure a field with `on_disk` mode and `1x` compression, memory-optimized search is automatically enabled for that field, even if memory optimization isn't enabled at the index level. For more information, see [Memory-optimized vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/).

Memory-optimized search differs from [disk-based search]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/disk-based-vector-search/) because it doesn't use compression or quantization. It only changes how vector data is loaded and accessed during search.
{: .note }
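As a sketch of the field-level path described above, the following mapping uses `on_disk` mode with `1x` compression, which enables memory-optimized search for `vector_field` without setting `index.knn.memory_optimized_search` (the index name and dimension are illustrative):

```json
PUT /test_index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "vector_field": {
        "type": "knn_vector",
        "dimension": 128,
        "mode": "on_disk",
        "compression_level": "1x"
      }
    }
  }
}
```
{% include copy-curl.html %}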
## Performance optimization

When memory-optimized search is enabled, the [warm-up API]({{site.url}}{{site.baseurl}}/vector-search/performance-tuning-search/#warm-up-the-index) loads only the essential information needed for search operations, such as opening streams to the underlying Faiss index file. This minimal warm-up results in:

- Faster initial searches.
- Reduced memory overhead.
- More efficient resource utilization.

For fields where memory-optimized search is disabled, the warm-up process loads vectors into off-heap memory.

## Next steps

- [Disk-based vector search]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/disk-based-vector-search/)
- [Vector quantization]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/knn-vector-quantization/)
- [Performance tuning]({{site.url}}{{site.baseurl}}/vector-search/performance-tuning/)

_vector-search/performance-tuning-search.md

Lines changed: 2 additions & 0 deletions
@@ -23,6 +23,8 @@ Native library indexes are constructed during indexing, but they're loaded into
 
 Once a native library index is loaded (native library indexes are loaded outside of the OpenSearch JVM), OpenSearch caches them in memory. Initial queries are expensive and complete in a few seconds, while subsequent queries are faster and complete in milliseconds (assuming that the k-NN circuit breaker isn't triggered).
 
+Starting with version 3.1, you can use [memory-optimized search]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/memory-optimized-search/) mode, which enables the engine to load only the necessary bytes during search instead of loading the entire index outside the JVM. When this mode is enabled, the warm-up API loads the minimal required information into memory, including opening read streams to the underlying indexes. Thus, warming up helps ensure that subsequent searches run faster, even with memory-optimized search enabled.
+
 To avoid this latency penalty during your first queries, you can use the warmup API operation on the indexes you want to search:
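The hunk is cut off at the start of the code example it introduces. A minimal warm-up request, using the k-NN plugin's warm-up endpoint (the index name is illustrative), takes the following form:

```json
GET /_plugins/_knn/warmup/test_index
```
{% include copy-curl.html %}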

_vector-search/settings.md

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ Setting | Static/Dynamic | Default | Description
 `index.knn.advanced.approximate_threshold` | Dynamic | `0` | The number of vectors that a segment must have before creating specialized data structures for ANN search. Set to `-1` to disable building vector data structures and to `0` to always build them.
 `index.knn.advanced.filtered_exact_search_threshold` | Dynamic | None | The filtered ID threshold value used to switch to exact search during filtered ANN search. If the number of filtered IDs in a segment is lower than this setting's value, then exact search will be performed on the filtered IDs.
 `index.knn.derived_source.enabled` | Static | `true` | Prevents vectors from being stored in `_source`, reducing disk usage for vector indexes.
+`index.knn.memory_optimized_search` | Dynamic | `false` | Enables memory-optimized search for the index.
 
 An index created in OpenSearch version 2.11 or earlier will still use the previous `ef_construction` and `ef_search` values (`512`).
 {: .note}
