Commit 5f5d50d

Authored by 0ctopus13prime (Dooyong Kim), kolchfa-aws, and natebower

Introduce memory optimized vector search (LuceneOnFaiss) in 3.1. (#10119)
* Introduce memory optimized vector search (LuceneOnFaiss)
  Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
* Remove unrelated file
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Apply suggestions from code review
  Signed-off-by: Nathan Bower <nbower@amazon.com>
* Update _vector-search/performance-tuning-search.md
  Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
* Add cross-links
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Dooyong Kim <kdooyong@amazon.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
1 parent d91fe05 commit 5f5d50d

File tree

4 files changed: +102 / -1 lines changed


_field-types/supported-field-types/knn-memory-optimized.md

Lines changed: 4 additions & 1 deletion
@@ -63,6 +63,9 @@ For example, if a `compression_level` of `32x` is passed for a `float32` index o
 If you set the `compression_level` parameter, then you cannot specify an `encoder` in the `method` mapping. Compression levels greater than `1x` are only supported for `float` vector types.
 {: .note}
 
+Starting with OpenSearch 3.1, enabling `on_disk` mode with a `1x` compression level activates [memory-optimized search]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/memory-optimized-search/). In this mode, the engine loads data on demand during search instead of loading all data into memory at once.
+{: .important}
+
 The following table lists the default `compression_level` values for the available workload modes.
 
 | Mode | Default compression level |
@@ -924,4 +927,4 @@ The memory required for IVF can be estimated using the following formula, where
 - [k-NN query]({{site.url}}{{site.baseurl}}/query-dsl/specialized/k-nn/)
 - [Disk-based vector search]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/disk-based-vector-search/)
-- [Vector quantization]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/knn-vector-quantization/)
+- [Vector quantization]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/knn-vector-quantization/)
_vector-search/optimizing-storage/memory-optimized-search.md (new file)

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
---
layout: default
title: Memory-optimized search
parent: Optimizing vector storage
nav_order: 30
---

# Memory-optimized search
Introduced 3.1
{: .label .label-purple }

Memory-optimized search allows the Faiss engine to run efficiently without loading the entire vector index into off-heap memory. Without this optimization, Faiss typically loads the full index into memory, which can become unsustainable if the index size exceeds available physical memory. With memory-optimized search, the engine memory-maps the index file and relies on the operating system's file cache to serve search requests. This approach avoids unnecessary I/O and allows repeated reads to be served directly from the system cache.

Memory-optimized search affects only search operations. Indexing behavior remains unchanged.
{: .note }

## Limitations

The following limitations apply to memory-optimized search in OpenSearch:

- Supported only for the [Faiss engine]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#faiss-engine) with the [HNSW method]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#hnsw-parameters-1)
- Does not support [IVF]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#ivf-parameters) or [product quantization (PQ)]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/faiss-product-quantization)
- Requires an index restart to enable or disable

If you use IVF or PQ, the engine loads data into memory regardless of whether memory-optimized mode is enabled.
{: .important }

## Configuration

To enable memory-optimized search, set `index.knn.memory_optimized_search` to `true` when creating an index:

```json
PUT /test_index
{
  "settings": {
    "index.knn": true,
    "index.knn.memory_optimized_search": true
  },
  "mappings": {
    "properties": {
      "vector_field": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "engine": "faiss"
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

To enable memory-optimized search on an existing index, you must close the index, update the setting, and then reopen the index:

```json
POST /test_index/_close
```
{% include copy-curl.html %}

```json
PUT /test_index/_settings
{
  "index.knn.memory_optimized_search": true
}
```
{% include copy-curl.html %}

```json
POST /test_index/_open
```
{% include copy-curl.html %}
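After reopening the index, you can verify that the setting took effect by retrieving the index settings. This check is a suggested addition rather than part of the original example set; `test_index` follows the preceding examples:

```json
GET /test_index/_settings
```
{% include copy-curl.html %}

The response should show `index.knn.memory_optimized_search` set to `"true"` under the index's settings.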
## Integration with disk-based search

When you configure a field with `on_disk` mode and `1x` compression, memory-optimized search is automatically enabled for that field, even if memory optimization isn't enabled at the index level. For more information, see [Memory-optimized vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-memory-optimized/).

Memory-optimized search differs from [disk-based search]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/disk-based-vector-search/) because it doesn't use compression or quantization. It only changes how vector data is loaded and accessed during search.
{: .note }
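As a sketch of the field-level path described above, the following mapping uses `on_disk` mode with `1x` compression, which enables memory-optimized search for `vector_field` without setting `index.knn.memory_optimized_search` (the index name and dimension are illustrative):

```json
PUT /test_index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "vector_field": {
        "type": "knn_vector",
        "dimension": 128,
        "mode": "on_disk",
        "compression_level": "1x"
      }
    }
  }
}
```
{% include copy-curl.html %}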
## Performance optimization

When memory-optimized search is enabled, the [warm-up API]({{site.url}}{{site.baseurl}}/vector-search/performance-tuning-search/#warm-up-the-index) loads only the essential information needed for search operations, such as opening streams to the underlying Faiss index file. This minimal warm-up results in:

- Faster initial searches.
- Reduced memory overhead.
- More efficient resource utilization.

For fields where memory-optimized search is disabled, the warm-up process loads vectors into off-heap memory.

## Next steps

- [Disk-based vector search]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/disk-based-vector-search/)
- [Vector quantization]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/knn-vector-quantization/)
- [Performance tuning]({{site.url}}{{site.baseurl}}/vector-search/performance-tuning/)

_vector-search/performance-tuning-search.md

Lines changed: 2 additions & 0 deletions
@@ -23,6 +23,8 @@ Native library indexes are constructed during indexing, but they're loaded into
 
 Once a native library index is loaded (native library indexes are loaded outside of the OpenSearch JVM), OpenSearch caches them in memory. Initial queries are expensive and complete in a few seconds, while subsequent queries are faster and complete in milliseconds (assuming that the k-NN circuit breaker isn't triggered).
 
+Starting with version 3.1, you can use [memory-optimized search]({{site.url}}{{site.baseurl}}/vector-search/optimizing-storage/memory-optimized-search/) mode, which enables the engine to load only the necessary bytes during search instead of loading the entire index outside the JVM. When this mode is enabled, the warm-up API loads the minimal required information into memory, including opening read streams to the underlying indexes. Thus, warming up helps ensure that subsequent searches run faster, even with memory-optimized search enabled.
+
 To avoid this latency penalty during your first queries, you can use the warmup API operation on the indexes you want to search:
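The hunk is cut off at the start of the code example it introduces. A minimal warm-up request, using the k-NN plugin's warm-up endpoint (the index name is illustrative), takes the following form:

```json
GET /_plugins/_knn/warmup/test_index
```
{% include copy-curl.html %}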

_vector-search/settings.md

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ Setting | Static/Dynamic | Default | Description
 `index.knn.advanced.approximate_threshold` | Dynamic | `0` | The number of vectors that a segment must have before creating specialized data structures for ANN search. Set to `-1` to disable building vector data structures and to `0` to always build them.
 `index.knn.advanced.filtered_exact_search_threshold` | Dynamic | None | The filtered ID threshold value used to switch to exact search during filtered ANN search. If the number of filtered IDs in a segment is lower than this setting's value, then exact search will be performed on the filtered IDs.
 `index.knn.derived_source.enabled` | Static | `true` | Prevents vectors from being stored in `_source`, reducing disk usage for vector indexes.
+`index.knn.memory_optimized_search` | Dynamic | `false` | Enables memory-optimized search for the index.
 
 An index created in OpenSearch version 2.11 or earlier will still use the previous `ef_construction` and `ef_search` values (`512`).
 {: .note}
