Add Scalable Vector Search (SVS) library integration [MOD-8811] #598

rfsaliev · 2025-02-12T11:37:34Z

This PR introduces a new index algorithm based on the Scalable Vector Search library.

Scalable Vector Search (SVS) is a performance library for vector similarity search based on the Vamana indexing algorithm.
Thanks to the use of Locally-adaptive Vector Quantization (LVQ) and its highly optimized indexing and search algorithms,
SVS provides vector similarity search:

on billions of high-dimensional vectors,
at high accuracy
and state-of-the-art speed,
while enabling the use of less memory than its alternatives.

This enables application and framework developers using similarity search to unleash its performance on Intel® Xeon® CPUs (2nd generation and newer).
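For readers new to Vamana: graph indexes of this family answer queries with a greedy best-first (beam) search that keeps a bounded candidate list, commonly called the search window. The sketch below shows that idea on a small in-memory adjacency list. It is a simplified illustration, not SVS code, and `Graph`, `l2sq` and `greedy_search` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical in-memory graph: adjacency lists plus the stored vectors.
struct Graph {
    std::vector<std::vector<size_t>> neighbors;
    std::vector<std::vector<float>> vectors;
};

// Squared Euclidean distance.
inline float l2sq(const std::vector<float> &a, const std::vector<float> &b) {
    float s = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        float d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

// Greedy best-first search with a candidate list bounded by `window`.
// Returns up to k (distance, id) pairs sorted by ascending distance.
std::vector<std::pair<float, size_t>> greedy_search(const Graph &g, size_t entry,
                                                    const std::vector<float> &query,
                                                    size_t window, size_t k) {
    std::vector<std::pair<float, size_t>> candidates{{l2sq(g.vectors[entry], query), entry}};
    std::unordered_set<size_t> expanded;     // nodes whose neighbors were visited
    std::unordered_set<size_t> seen{entry};  // nodes ever added to the candidate list
    while (true) {
        // Candidates are kept sorted, so the first unexpanded one is the closest.
        auto it = std::find_if(candidates.begin(), candidates.end(),
                               [&](const auto &c) { return !expanded.count(c.second); });
        if (it == candidates.end()) break;
        size_t node = it->second;
        expanded.insert(node);
        for (size_t nb : g.neighbors[node]) {
            if (seen.insert(nb).second) {
                candidates.emplace_back(l2sq(g.vectors[nb], query), nb);
            }
        }
        // Keep only the `window` closest candidates.
        std::sort(candidates.begin(), candidates.end());
        if (candidates.size() > window) candidates.resize(window);
    }
    if (candidates.size() > k) candidates.resize(k);
    return candidates;
}
```

A larger window explores more of the graph, trading query time for recall.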

Full information about Scalable Vector Search functionality and features is available in the SVS documentation.
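LVQ quantizes each vector with its own scalar bounds (after centering the data on the dataset mean), which is what makes it "locally adaptive". The sketch below illustrates only the per-vector 8-bit scalar quantization step; it omits mean-centering, residual levels and SVS's packed storage layout, and `LvqVector`, `quantize` and `dequantize` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical compressed vector: per-vector bounds plus one 8-bit code per dimension.
struct LvqVector {
    float lower;                 // per-vector minimum
    float step;                  // per-vector quantization step
    std::vector<uint8_t> codes;  // quantized components
};

// Quantize one (non-empty) vector using bounds computed from that vector alone.
LvqVector quantize(const std::vector<float> &x) {
    auto [mn, mx] = std::minmax_element(x.begin(), x.end());
    LvqVector q{*mn, (*mx - *mn) / 255.0f, {}};
    q.codes.reserve(x.size());
    for (float v : x) {
        float c = q.step > 0.0f ? std::round((v - q.lower) / q.step) : 0.0f;
        q.codes.push_back(static_cast<uint8_t>(std::clamp(c, 0.0f, 255.0f)));
    }
    return q;
}

// Reconstruct an approximation of the original vector.
std::vector<float> dequantize(const LvqVector &q) {
    std::vector<float> out;
    out.reserve(q.codes.size());
    for (uint8_t c : q.codes) out.push_back(q.lower + q.step * c);
    return out;
}
```

Because the bounds are per vector, a vector with a small dynamic range gets a proportionally finer step than a global quantizer would give it.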

Change details

This PR introduces API changes
This PR introduces serialization changes

API changes

New algorithm kind `VecSimAlgo_SVS`
New index construction parameters SVSParams
New index query runtime parameters SVSRuntimeParams

Build configuration changes

Added SVS library dependency
New CMake option SVS_REPOSITORY allows specifying the Scalable Vector Search repository URL
Added compiler optimization options required to enable platform optimizations implemented in SVS

SVS library integration sources

svs.h - SVSIndex class to implement new algorithm
svs_batch_iterator.h - Batch Iterator implementation for SVSIndex
svs_utils.h - Utility classes and functions for 'SVS' algorithm implementation
svs_extensions.h - Optional LVQ support
svs_factory.h, svs_factory.cpp - Index factory utilities for the 'SVS' algorithm
test_svs.cpp - Unit tests for 'SVS' algorithm implementation

TODO

Add Tiered SVS index algorithm support: [SVS] Add Tiered SVS index implementation rfsaliev/VectorSimilarity#1

CLAassistant · 2025-02-12T11:37:41Z

All committers have signed the CLA.

codecov · 2025-02-13T16:44:37Z

Codecov Report

Attention: Patch coverage is 88.63636% with 55 lines in your changes missing coverage. Please review.

Project coverage is 96.52%. Comparing base (bfa32c9) to head (b811141).
Report is 3 commits behind head on main.

❗ Current head b811141 differs from pull request most recent head 48c5f7f

Please upload reports for the commit 48c5f7f to get more accurate results.

Files with missing lines	Patch %	Lines
src/VecSim/index_factories/svs_factory.cpp	63.91%	35 Missing ⚠️
src/VecSim/algorithms/svs/svs_utils.h	88.33%	7 Missing ⚠️
src/VecSim/algorithms/svs/svs.h	97.28%	6 Missing ⚠️
src/VecSim/algorithms/svs/svs_batch_iterator.h	91.22%	5 Missing ⚠️
src/VecSim/vec_sim.cpp	92.85%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #598      +/-   ##
==========================================
- Coverage   97.23%   96.52%   -0.71%     
==========================================
  Files         106      110       +4     
  Lines        5713     6194     +481     
==========================================
+ Hits         5555     5979     +424     
- Misses        158      215      +57

meiravgri

This is very exciting!

For future development, please pay special attention to in-code comments that explain the logic and purpose behind key parts of the code.

Please note that the internal threading implementation of the SVS dependency should not be used.
Our library has its own thread pool, which should be used via the RediSearch API to ensure proper resource management.
The same applies to logging: the SVS logger should point to the VecSimIndexInterface logging function.

I also noticed a warning raised both in our CI and when building locally:

svs_extensions.h:57:17: note: '#pragma message: SVS LVQ is not available'

Is this expected?

Additionally, I observed that the SVS unit tests take significantly longer to run compared to their equivalent HNSW tests. Could you clarify why that might be?

src/VecSim/algorithms/svs/svs.h

src/VecSim/utils/vec_utils.cpp

src/VecSim/vec_sim.cpp

src/VecSim/vec_sim_common.h

src/VecSim/algorithms/svs/svs_utils.h

src/VecSim/algorithms/svs/svs.h

meiravgri · 2025-03-02T10:03:31Z

src/VecSim/algorithms/svs/svs_utils.h

+        auto prefetch_parameters =
+            svs::index::vamana::extensions::estimate_prefetch_parameters(data);
+        auto builder = svs::index::vamana::VamanaBuilder(
+            graph, data, std::move(distance), parameters, threadpool, prefetch_parameters);


So the distance calculator component of the index is not actually going to be used?

SVS has its own distance calculation component with LVQ support.

So why is it initialized in the SVS factory?

Sorry, I do not understand the question.

I'll open a PR to allow creating an index without a distance calculator component.

tests/unit/test_svs.cpp

rfsaliev · 2025-03-05T11:10:30Z

This is very exciting!

Thank you very much @meiravgri for the review.

For future development, please pay special attention to in-code comments that explain the logic and purpose behind key parts of the code.

Sure, I will add comments to the less obvious parts of the code. Many questions in the review were related to SVS API limitations and requirements, so I will try to explain the logic behind them.

Please note that the internal threading implementation of the SVS dependency should not be used. Our library has its own thread pool, which should be used via the RediSearch API to ensure proper resource management. The same applies to logging: the SVS logger should point to the VecSimIndexInterface logging function.

For performance, SVS uses a kind of 'backend' parallelization based on an internal thread pool. I agree that it would be preferable to use the RediSearch thread pool, but unfortunately the way SVS uses a thread pool does not match RediSearch behavior: the RediSearch thread pool appears to be 'asynchronous', meaning the current thread is not blocked by tasks submitted to it, whereas SVS requires a 'synchronous' thread pool, where the current thread is blocked until all submitted tasks are finished.
Since you know much more about RediSearch thread pooling, I would appreciate your help in resolving this conflict.
Thank you.
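One generic way to bridge the two models is to submit tasks to the asynchronous pool and then block the caller on a completion counter. The sketch below wraps a hypothetical fire-and-forget `submit` function (a stand-in for the RediSearch thread pool, whose real API is not shown in this thread) in a synchronous `parallel_for`. Caveat: if the blocked caller is itself a worker of the same pool and no other worker is free, this pattern deadlocks.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical asynchronous pool: submit() returns immediately.
// For illustration it spawns a detached thread per task.
void submit(std::function<void()> task) {
    std::thread(std::move(task)).detach();
}

// Synchronous adapter: runs body(i) for every i in [0, n) on the async pool
// and blocks the calling thread until all n tasks have finished.
void parallel_for(size_t n, const std::function<void(size_t)> &body) {
    std::mutex m;
    std::condition_variable cv;
    size_t remaining = n;
    for (size_t i = 0; i < n; ++i) {
        submit([&, i] {
            body(i);
            std::lock_guard<std::mutex> lock(m);
            // Notify while holding the lock so the waiter cannot return and
            // destroy `cv` before notify_one() has completed.
            if (--remaining == 0) cv.notify_one();
        });
    }
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [&] { return remaining == 0; });
}
```

SVS-style code that expects a blocking pool could then be handed `parallel_for` while the actual work still runs on the shared pool.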

I also noticed a warning raised both in our CI and when building locally:

svs_extensions.h:57:17: note: '#pragma message: SVS LVQ is not available'

Is this expected?

Yes, it is an informational warning that the LVQ feature is not enabled in the SVS version you are building with.

Additionally, I observed that the SVS unit tests take significantly longer to run compared to their equivalent HNSW tests. Could you clarify why that might be?
I have not compared the unit test performance of SVS and HNSW, but I assume the main cause is index construction time: SVS is optimized for adding vectors to the index in batches, whereas the VecSim API adds vectors one at a time.
The 'tiered SVS index' feature (implemented in rfsaliev#1) is intended to solve this issue.
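The batching idea behind a tiered index can be illustrated with a small front-end buffer that accepts vectors one at a time and forwards them to a batch-oriented backend. This is a hypothetical sketch (`BatchingFrontend`, `flush_threshold` and the backend callback are illustrative names), not the design in rfsaliev#1.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical front-end: buffers single-vector inserts and hands them to a
// batch-oriented backend once `flush_threshold` vectors have accumulated.
class BatchingFrontend {
  public:
    using Batch = std::vector<std::pair<size_t, std::vector<float>>>;  // (label, vector)

    BatchingFrontend(size_t flush_threshold, std::function<void(const Batch &)> backend_add)
        : threshold_(flush_threshold), backend_add_(std::move(backend_add)) {}

    // Vector-by-vector API, matching how VecSim adds vectors.
    void add(size_t label, std::vector<float> v) {
        buffer_.emplace_back(label, std::move(v));
        if (buffer_.size() >= threshold_) flush();
    }

    // Forward everything buffered so far to the backend in one call.
    void flush() {
        if (buffer_.empty()) return;
        backend_add_(buffer_);
        buffer_.clear();
    }

    size_t buffered() const { return buffer_.size(); }

  private:
    size_t threshold_;
    std::function<void(const Batch &)> backend_add_;
    Batch buffer_;
};
```

A real tiered index must also search the buffered vectors at query time; this sketch only shows the insert path.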

alonre24

Nice work!
Added a few comments to continue @meiravgri's review.
Let's make sure that commented code is removed and TODOs are handled before we continue the review. Thanks!

src/VecSim/CMakeLists.txt

src/VecSim/vec_sim_common.h

alonre24 · 2025-03-09T12:52:45Z

src/VecSim/vec_sim.cpp

@@ -48,6 +48,43 @@ static VecSimResolveCode _ResolveParams_EFRuntime(VecSimAlgo index_type, VecSimR
    return VecSimParamResolver_OK;
 }

+static VecSimResolveCode _ResolveParams_WSSearch(VecSimAlgo index_type, VecSimRawParam rparam,


Please add unit tests for the error cases as well, for coverage.

Unit tests added to 'test_svs.cpp'
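For context, a runtime-parameter resolver of this kind typically parses the raw string value, rejects anything that is not a positive integer, and rejects a parameter given twice; each rejection path is a natural error-case test. The sketch below is a self-contained approximation: `ResolveCode` and `resolve_ws_search` are hypothetical and do not reproduce the actual `_ResolveParams_WSSearch` signature or the `VecSimResolveCode` values.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical result codes (the real VecSimResolveCode values differ).
enum class ResolveCode { OK, BadValue, AlreadySet };

// Parse the raw string value of a search-window-size runtime parameter.
// `already_set` guards against the same parameter being passed twice.
ResolveCode resolve_ws_search(const std::string &raw_value, size_t &out, bool &already_set) {
    if (already_set) return ResolveCode::AlreadySet;
    if (raw_value.empty()) return ResolveCode::BadValue;
    for (char c : raw_value) {
        if (c < '0' || c > '9') return ResolveCode::BadValue;  // digits only: no sign, no spaces
    }
    errno = 0;
    unsigned long long v = std::strtoull(raw_value.c_str(), nullptr, 10);
    // Reject overflow, values that do not fit size_t, and zero.
    if (errno == ERANGE || v > SIZE_MAX || v == 0) return ResolveCode::BadValue;
    out = static_cast<size_t>(v);
    already_set = true;
    return ResolveCode::OK;
}
```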

src/VecSim/algorithms/svs/svs.h

alonre24 · 2025-03-09T13:35:33Z

src/VecSim/CMakeLists.txt

+set(SVS_REPOSITORY "https://github.com/intel/ScalableVectorSearch.git" CACHE STRING "SVS repository")
+
+include(FetchContent)


Is there a particular reason to use FetchContent rather than using a submodule?

We just followed the existing VecSim approach: there is no .gitmodules, and googletest, google_benchmark, and pybind11 are all linked using FetchContent.

As discussed, since svs is a substantial submodule that is not only required for testing, we will want to have it as a git submodule to have a more convenient mechanism for tracking and changing the versions.

I've switched SVS to a submodule, but the PR builds failed because the submodule update is not run in CI builds.

should be fixed now

src/VecSim/algorithms/svs/svs_utils.h

rfsaliev · 2025-03-13T15:35:39Z

@alonre24, @meiravgri, I've made most of requested changes except cases where I need your answers/decisions (see my comments).
Can you please review updated code and my responses to your comments?
Thank you.

meiravgri

Great work, and well done on improving the comments!
I’ve mostly completed the review—batch iterator is still WIP, but you can start addressing the requested modifications.

My main concerns:

Redundant allocation of preprocessors and index calculators.
Duplicate preprocessing happening in both VecSim and SVS.
Thread safety—what ensures it in the current implementation? I didn’t see any locks. Correct me if I’m wrong, but we agreed that this phase would include global locks.
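A common way to get coarse-grained global locking is a reader-writer lock around the whole index: queries take a shared lock and mutations take an exclusive one. The wrapper below is a hypothetical sketch (`LockedIndex` and its toy `std::vector` payload are illustrative), not the locking scheme used in this PR.

```cpp
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical global-lock wrapper: many concurrent readers, one writer at a time.
class LockedIndex {
  public:
    void add(float value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive for writes
        data_.push_back(value);
    }

    size_t count_below(float threshold) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared for reads
        size_t n = 0;
        for (float v : data_) n += (v < threshold);
        return n;
    }

  private:
    mutable std::shared_mutex mutex_;
    std::vector<float> data_;  // stands in for the SVS index state
};
```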

src/VecSim/CMakeLists.txt

src/VecSim/algorithms/svs/svs.h

src/VecSim/algorithms/svs/svs_utils.h

src/VecSim/algorithms/svs/svs.h

src/VecSim/algorithms/svs/svs_batch_iterator.h

src/VecSim/algorithms/svs/svs.h

meiravgri · 2025-03-31T15:14:50Z

src/VecSim/algorithms/svs/svs.h

+        memcpy(processed_blob.get(), original_data, data_size);
+        // Preprocess each vector in place
+        for (size_t i = 0; i < n; i++) {
+            this->preprocessQueryInPlace(static_cast<DataType *>(processed_blob.get()) +


Correct. We should have an equivalent preprocessForStorageInPlace.
That's on me to implement.
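The pattern in this snippet (copy the caller's batch into an owned buffer, then preprocess each vector in place) can be sketched for cosine normalization as follows. `normalize_in_place` and `preprocess_batch` are hypothetical stand-ins for the VecSim preprocessors; a storage-side variant would have the same shape.

```cpp
#include <cmath>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical per-vector preprocessing step: L2-normalize for cosine distance.
void normalize_in_place(float *v, size_t dim) {
    float norm = 0.0f;
    for (size_t i = 0; i < dim; ++i) norm += v[i] * v[i];
    norm = std::sqrt(norm);
    if (norm == 0.0f) return;  // leave zero vectors unchanged
    for (size_t i = 0; i < dim; ++i) v[i] /= norm;
}

// Copy n contiguous vectors into an owned buffer, then preprocess each one in place,
// so the caller's input is never modified.
std::unique_ptr<float[]> preprocess_batch(const float *original, size_t n, size_t dim) {
    auto processed = std::make_unique<float[]>(n * dim);
    std::memcpy(processed.get(), original, n * dim * sizeof(float));
    for (size_t i = 0; i < n; ++i) normalize_in_place(processed.get() + i * dim, dim);
    return processed;
}
```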

src/VecSim/algorithms/svs/svs.h

meiravgri · 2025-03-31T15:28:02Z

src/VecSim/algorithms/svs/svs.h

+    }
+
+    int addVectorsImpl(const void *vectors_data, const labelType *labels, size_t n) {
+        if (n == 0) {


As discussed, assert n == 1 until the tiered index is supported.

n == 1 is the normal case for implementing VecSimIndexInterface; see int addVector(const void *vector_data, labelType label) override at line 288.

Yes. I meant this is the only acceptable case for now.

It seems I have to remove the svs_bulk_vectors_add_delete_test unit test as well,
and restore it in the tiered index.
Agree?

Yes. In general, consider separating the tiered SVS and vanilla tests.
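Until the tiered index lands, the single-vector entry point can delegate to the batch path with n = 1 while the batch path refuses any other size. In this hypothetical sketch (`SingleInsertIndex` and its return convention are illustrative), the batch path returns an error code instead of asserting, so the guard is easy to test.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical index that only accepts single-vector inserts for now.
class SingleInsertIndex {
  public:
    // Returns the number of vectors added, or -1 if the batch size is unsupported.
    int addVectorsImpl(const float *vectors, const size_t *labels, size_t n, size_t dim) {
        if (n != 1) return -1;  // multi-vector batches are left to the future tiered index
        data_.insert(data_.end(), vectors, vectors + dim);
        labels_.push_back(labels[0]);
        return 1;
    }

    // Vector-by-vector API delegating to the batch implementation.
    int addVector(const float *vector, size_t label, size_t dim) {
        return addVectorsImpl(vector, &label, 1, dim);
    }

    size_t size() const { return labels_.size(); }

  private:
    std::vector<float> data_;
    std::vector<size_t> labels_;
};
```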

src/VecSim/algorithms/svs/svs.h

rfsaliev · 2025-04-02T08:23:04Z

tests/unit/test_svs.cpp

+    VecSimIndex_Free(index);
+}
+
+TYPED_TEST(SVSTest, resolve_ws_search_runtime_params) {


@alonre24 , please check if tests here and below meet this request: #598 (comment)

* Add new index algorithm 'SVS' including
* New SVSindex class
* SVS index factory
* SVS-specific index parameters
* SVS-specific runtime parameters
* Implement SVS unit_test
* Add 'SVS' algorithm support to Python bindings

Co-authored-by: Dorin-Marian Ionita <dorin-marian.ionita@intel.com>
Co-authored-by: Maria Markova <maria.markova@intel.com>

* Resolve formatting issues
* Fix test_svs.py
* Fix CMake options

* Shared ownership of SVS index implementation is removed from SVS_BatchIterator
* More accurate index capacity calculation
* Renamed template parameter for SVSGraphBuilder
* Cleaned up includes
* Refactored SVS index parameters handling
* Added bulk-of-vectors preprocessing feature
* Cleaned artefact comments

Renames:
* VecSimOptionBool -> VecSimOptionMode
* VecSimOption_DEFAULT -> VecSimOption_AUTO
* Add "AUTO" option support to ResolveParams_UseSearchHistory()

* Defined CMake cache variable SVS_SHARED_LIB (default: OFF) to download and link SVS LVQ shared library

* Renamed internal helper functions SVSIndexVectorSize() -> QuantizedVectorSize()
* Empty index capacity is 0
* Remove usage of VecSimQueryParams::batchSize from rangeQuery()
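Replacing a boolean option with a mode enum that has an AUTO value gives a tri-state setting that is resolved to a concrete choice only at use time. In the sketch below only `VecSimOptionMode` and `VecSimOption_AUTO` come from the commit message; the other enumerators and `resolve_option` are hypothetical.

```cpp
// Tri-state option. VecSimOption_ENABLE / VecSimOption_DISABLE are hypothetical names.
enum VecSimOptionMode { VecSimOption_AUTO, VecSimOption_ENABLE, VecSimOption_DISABLE };

// Resolve a mode to a concrete boolean; AUTO defers to a caller-supplied default,
// e.g. a value picked by an index-specific heuristic.
constexpr bool resolve_option(VecSimOptionMode mode, bool auto_default) {
    return mode == VecSimOption_AUTO ? auto_default : mode == VecSimOption_ENABLE;
}
```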

alonre24 · 2025-04-02T14:02:39Z

tests/unit/test_svs.cpp

+// clang-format off
+using SVSDataTypeSet = ::testing::Types<SVSIndexType<VecSimType_FLOAT32, float, VecSimSvsQuant_NONE>
+#if HAVE_SVS_LVQ
+                                       ,SVSIndexType<VecSimType_FLOAT32, float, VecSimSvsQuant_8>


No need to test other quantisation variants?

alonre24

Two issues to deal with in the next PR.

alonre24 · 2025-04-03T17:05:37Z

src/VecSim/index_factories/svs_factory.cpp

+            .dataSize = dataSize,
+            .metric = svsParams.metric,
+            .blockSize = svsParams.blockSize,
+            .multi = false,


We need to fail if we are given multi is true until we support this capability - index creation should return null with proper logging.

I assume there is no way to request creation of an SVS index with multi=true, because SVSParams has no such option, unlike HNSW or BruteForce, where the .multi field exists.
Am I wrong?

You are right.
When will we have support to create SVS index of type multi as well?
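If a `multi` option is ever exposed for SVS before multi-value support exists, the factory-side guard the reviewer describes (fail with logging and return null) could look like this. `SvsParamsSketch`, `SvsIndexSketch`, `LogFn` and `new_svs_index` are hypothetical; as confirmed above, the real SVSParams has no `multi` field.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>

// Hypothetical parameters; the real SVSParams has no `multi` field.
struct SvsParamsSketch {
    size_t dim;
    bool multi;
};

struct SvsIndexSketch {
    explicit SvsIndexSketch(size_t d) : dim(d) {}
    size_t dim;
};

using LogFn = std::function<void(const std::string &)>;

// Returns nullptr (and logs the reason) for configurations that are not supported yet.
std::unique_ptr<SvsIndexSketch> new_svs_index(const SvsParamsSketch &params, const LogFn &log) {
    if (params.multi) {
        log("SVS index does not support multi-value labels yet");
        return nullptr;
    }
    return std::make_unique<SvsIndexSketch>(params.dim);
}
```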

alonre24 · 2025-04-03T17:17:03Z

requirements.txt

@@ -1,2 +1,3 @@
 pip>=21.1
 poetry==1.4.2
+mkl-devel>=2025.1.0


Can we add the installation of MKL to install_script.sh using sudo apt install intel-oneapi-mkl-devel instead of adding it here?
Also if you can add this line to the codeql-analysis.yml after "Install Cmake" step that would be great (not a must)

rfsaliev mentioned this pull request Feb 12, 2025

Add Scalable Vector Search (SVS) library integration support RediSearch/RediSearch#5640

Draft

2 tasks

rfsaliev changed the base branch from main to feature_svs_index_support February 13, 2025 09:59

rfsaliev marked this pull request as ready for review February 13, 2025 10:00

rfsaliev force-pushed the scalable-vector-search branch from b0520e7 to 3193382 February 13, 2025 10:58

alonre24 changed the title ~~Add Scalable Vector Search (SVS) library integration~~ Add Scalable Vector Search (SVS) library integration [MOD-8811] Feb 13, 2025

rfsaliev force-pushed the scalable-vector-search branch from fb170d9 to 7b4483e February 14, 2025 14:06

meiravgri reviewed Mar 3, 2025

alonre24 reviewed Mar 9, 2025

rfsaliev force-pushed the scalable-vector-search branch 3 times, most recently from 6555a2d to 6d8b83d March 13, 2025 15:19

rfsaliev requested review from alonre24 and meiravgri March 13, 2025 15:31

meiravgri reviewed Mar 23, 2025

rfsaliev force-pushed the scalable-vector-search branch 2 times, most recently from fb22f24 to 8e82d4d March 28, 2025 13:30

rfsaliev changed the base branch from feature_svs_index_support to main March 28, 2025 13:30

rfsaliev force-pushed the scalable-vector-search branch 2 times, most recently from 0a19127 to feca261 March 31, 2025 12:57

meiravgri reviewed Apr 1, 2025

rfsaliev force-pushed the scalable-vector-search branch from eca8bc4 to 8ed29d2 April 1, 2025 17:00

rfsaliev commented Apr 2, 2025

rfsaliev and others added 4 commits April 2, 2025 15:47

Resolve PR build issues

f7e6eb5

[SVS] Fix SVSIndex::runGC()

968ce7c

rfsaliev added 16 commits April 2, 2025 15:47

[SVS] Code review changes step 2

ab972cd

[SVS] Code review step 3: add basic heuristic to preferAdHocSearch()

453019b

[SVS] Code review step 4: add tests and fixes for _ResolveParams_XXX

c66cfe0

[SVS] Code review step 5: force single-threaded SVS threadpool

e79f5e9

[SVS] Code review step 6: remove SVSIndex::runGC()

6386697

[SVS] Code review s2e1: Apply suggestions

655591e

[SVS] Code review s2e2: Fix rangeQuery()

639835c

[SVS] Code review s2e3: Cleanup TODOs

3282d8c

[SVS] Code review s2e4: Update unit tests to improve coverage.

bb5c745

[SVS] Code review s2e5: Rename VecSimQuantXXX -> VecSimSvsQuantXXX.

b963318

[SVS] Code review s2e6: SVS submodule rather than FetchContent.

b5b44dc

[SVS] Code review s2e7: Refactor VecSimOptionBool enum.

117978c

[SVS] Fix python bindings build

0e71866

[SVS] Add optional SVS LVQ Shared library usage

17cc4cd

[SVS] Code review s2e7: renames and capacity changes.

a656e3f

[SVS] Fix Cosine unit tests for LVQ case

1599163

rfsaliev force-pushed the scalable-vector-search branch from b811141 to 1599163 April 2, 2025 13:49

alonre24 reviewed Apr 2, 2025

[SVS] Enforce sequential thread poll for LVQ storage compressor

7be9920

rfsaliev force-pushed the scalable-vector-search branch from 2f716e4 to 4d888fa April 3, 2025 14:47

Enable SVS LVQ shared library by default

48c5f7f

rfsaliev force-pushed the scalable-vector-search branch 2 times, most recently from 5cd2c63 to 48c5f7f April 3, 2025 15:03

alonre24 merged commit 46bca86 into RedisAI:main Apr 3, 2025
11 of 15 checks passed

alonre24 reviewed Apr 3, 2025

meiravgri mentioned this pull request May 9, 2025

remove flags #671

Merged
