Enabler for proper de-duplication of query results in tiered index with compression #689

meiravgri · 2025-06-03T10:42:45Z

Describe the changes in the pull request

A clear and concise description of what the PR is solving.

Which issues this PR fixes

#...
MOD...

Main objects this PR modified

...
...

Mark if applicable

This PR introduces API changes
This PR introduces serialization changes

…ions, allowing inheriting classes to control whether the merge should use sets. Use it with withSet=true in SVS. Add a test to SVS.

codecov · 2025-06-03T13:41:19Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.40%. Comparing base (9e77a09) to head (e3a9c96).
Report is 9 commits behind head on rfsaliev/scalable-vector-search-tiered.

Additional details and impacted files

@@                            Coverage Diff                             @@
##           rfsaliev/scalable-vector-search-tiered     #689      +/-   ##
==========================================================================
- Coverage                                   96.43%   96.40%   -0.04%     
==========================================================================
  Files                                         112      113       +1     
  Lines                                        6795     6837      +42     
==========================================================================
+ Hits                                         6553     6591      +38     
- Misses                                        242      246       +4

rfsaliev · 2025-06-03T14:31:16Z

tests/unit/test_svs_tiered.cpp

+    // ID 54: closer in SVS, farther in flat — expect to return SVS version
+    GenerateAndAddVector<TEST_DATA_T>(svs_index, dim, ids[0], res_values[0]);
+    GenerateAndAddVector<TEST_DATA_T>(flat_index, dim, ids[0], 4);


As I understand it, the flat index should be prioritized in this case:

1. The user adds a vector with the value version_1 (moved to the backend).
2. The user overrides the vector with the value version_2 (not yet moved to the backend, still kept in flat).
3. The user expects version_1 to be overridden and forgotten, so it never appears in query results.

This test simulates discrepancies in scores due to quantization differences, not synchronization issues.

The scenario you described is valid. However, in general, the flat index result can also be selected if it yields a better score for the same label compared to the SVS index—again, assuming updates are properly handled and the score is calculated for the same vector version.
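To make the point above concrete, here is a minimal sketch of a set-based merge, assuming each tier returns results sorted by ascending distance. `Result` and `mergeTopK` are hypothetical names for illustration, not the VecSim types or functions, and in this sketch `withSet=false` does no de-duplication at all, which is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical result record (not the VecSim type).
struct Result {
    std::size_t label;
    double score; // distance: smaller is better
};

// Merge two result lists, each sorted by ascending score, into at most k results.
// With withSet=true a label is emitted only once. Because the smallest score is
// always taken first, the surviving entry is the better-scoring one, even when
// the two tiers score the same label differently (e.g. due to quantization).
template <bool withSet>
std::vector<Result> mergeTopK(const std::vector<Result> &flat,
                              const std::vector<Result> &backend, std::size_t k) {
    auto byScore = [](const Result &a, const Result &b) { return a.score < b.score; };
    assert(std::is_sorted(flat.begin(), flat.end(), byScore));
    assert(std::is_sorted(backend.begin(), backend.end(), byScore));

    std::vector<Result> merged;
    std::unordered_set<std::size_t> seen;
    std::size_t i = 0, j = 0;
    while (merged.size() < k && (i < flat.size() || j < backend.size())) {
        // Take the head with the smaller score; prefer flat on ties.
        bool takeFlat = j == backend.size() ||
                        (i < flat.size() && flat[i].score <= backend[j].score);
        const Result &next = takeFlat ? flat[i++] : backend[j++];
        if constexpr (withSet) {
            if (!seen.insert(next.label).second)
                continue; // label already emitted with a better score
        }
        merged.push_back(next);
    }
    return merged;
}
```

With label 54 scored 4.0 in flat and 1.0 in the backend, `mergeTopK<true>` returns label 54 once with the backend score, while `mergeTopK<false>` can return it twice.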

Let's assume the following:

1. The user adds a vector with label 1 that is close to a later query.
2. The user adds 1000 other vectors.
3. The user overrides the vector with label 1 with a value that is farther from the query than all 1000 other vectors.
4. The user calls a topK query with k=10.

The user does not expect the vector with label 1 in the query results, but if the first version of the vector is kept in the backend, the user will receive it.

Then the implementation is wrong.
The vector should have been marked as deleted in the backend index during the update, and should not be part of the results returned from the backend index.
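The update behavior described here can be sketched with a toy model. Everything below (`ToyUpdateIndex`, its members, the single `float` standing in for a vector) is hypothetical and only illustrates the invariant that an overwrite hides the stale backend copy right away; it is not how the VecSim tiered index is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>

// Toy model (not the VecSim classes): each tier maps label -> stored value.
struct ToyUpdateIndex {
    std::unordered_map<std::size_t, float> flat;    // frontend buffer
    std::unordered_map<std::size_t, float> backend; // e.g. the SVS tier
    std::unordered_set<std::size_t> deletedInBackend;

    // Add or overwrite a label. An existing backend copy is marked deleted
    // immediately, so the stale version can no longer be returned by a query.
    void addVector(std::size_t label, float value) {
        if (backend.count(label) != 0)
            deletedInBackend.insert(label);
        flat[label] = value;
    }

    // Simulates the background job that moves a vector from flat to the backend.
    void moveToBackend(std::size_t label) {
        auto it = flat.find(label);
        if (it == flat.end())
            return;
        backend[label] = it->second; // replaces any stale copy
        deletedInBackend.erase(label);
        flat.erase(it);
    }

    // True if a backend query is allowed to return this label.
    bool backendVisible(std::size_t label) const {
        return backend.count(label) != 0 && deletedInBackend.count(label) == 0;
    }
};
```

Under this invariant, at most one version of a label is visible to queries at any time, so any duplicate labels seen during the merge come from score differences between tiers rather than from stale data.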

meiravgri · 2025-06-03T15:38:57Z

src/VecSim/algorithms/svs/svs_tiered.h

+        // To avoid duplicates in the final result, we must use withSet=true because backend vectors
+        // are quantized,
+        // and may produce different scores than the flat index for the same label.
+        return this->template topKQueryImp<true>(queryBlob, k, queryParams);


TODO: set to true only if quantization is enabled.

@rfsaliev Could you please guide me on how to check whether the SVS index performs quantization (e.g., LVQ) in a way that minimizes runtime overhead? Ideally, I’d like to avoid adding dynamic casts or branching logic if possible.
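One possible way to avoid runtime branching, assuming quantization is visible in a template parameter of the index, is a compile-time trait. The names below (`FullPrecisionStorage`, `QuantizedStorage`, `IsQuantized`, `ToySVSTier`) are hypothetical and do not come from SVS or VecSim; whether the real SVS types expose quantization this way is not stated in this thread.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical storage tags standing in for the real storage types.
struct FullPrecisionStorage {};
struct QuantizedStorage {};

// Trait specialized for storage kinds whose scores may differ from the
// flat index for the same label.
template <typename Storage>
struct IsQuantized : std::false_type {};
template <>
struct IsQuantized<QuantizedStorage> : std::true_type {};

template <typename Storage>
struct ToySVSTier {
    // Resolved at compile time: no dynamic_cast and no runtime branch.
    static constexpr bool useSet = IsQuantized<Storage>::value;

    // Stand-in for the templated query implementation; it only reports
    // which merge strategy was selected.
    template <bool withSet>
    bool topKQueryImp() const {
        return withSet;
    }

    bool topKQuery() const { return this->template topKQueryImp<useSet>(); }
};

static_assert(ToySVSTier<QuantizedStorage>::useSet, "quantized tier merges with a set");
static_assert(!ToySVSTier<FullPrecisionStorage>::useSet, "full precision does not");
```

If quantization is instead chosen at runtime, this approach does not apply as written, and a flag stored once at construction would be the simpler alternative.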

meiravgri · 2025-06-03T15:39:04Z

src/VecSim/algorithms/svs/svs_tiered.h

+        // To avoid duplicates in the final result, we must use withSet=true because backend vectors
+        // are quantized,
+        // and may produce different scores than the flat index for the same label.
+        return this->template rangeQueryImp<true>(queryBlob, radius, queryParams, order);
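The pattern used in these two snippets can be sketched in isolation: a base class exposes templated `*Imp` functions, and a derived class picks the merge strategy in its override. `ToyTieredBase` and `ToyTieredSVSIndex` are hypothetical names, the string return values are placeholders for real query results, and the base-class default of `withSet=false` is an assumption made for this sketch.

```cpp
#include <cassert>
#include <string>

template <typename DataType>
class ToyTieredBase {
public:
    virtual ~ToyTieredBase() = default;

    // Assumed default for this sketch: merge without a set.
    virtual std::string topKQuery() const { return this->template topKQueryImp<false>(); }
    virtual std::string rangeQuery() const { return this->template rangeQueryImp<false>(); }

protected:
    // The merge strategy is a compile-time parameter of the shared implementation.
    template <bool withSet>
    std::string topKQueryImp() const {
        return withSet ? "merge with set" : "merge without set";
    }
    template <bool withSet>
    std::string rangeQueryImp() const {
        return withSet ? "merge with set" : "merge without set";
    }
};

template <typename DataType>
class ToyTieredSVSIndex : public ToyTieredBase<DataType> {
public:
    // `this->template` is required because the base class is a dependent type.
    std::string topKQuery() const override { return this->template topKQueryImp<true>(); }
    std::string rangeQuery() const override { return this->template rangeQueryImp<true>(); }
};
```

Callers keep using the virtual `topKQuery` and `rangeQuery` entry points, and only the derived class knows which strategy is instantiated.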


alonre24 · 2025-06-10T05:31:49Z

Replaced with #692 so it can be merged to main directly.

meiravgri added 2 commits June 3, 2025 10:42

topkImp

4c0e7d5

In TieredIndex: move topKQuery and rangeQuery to templated *Imp funct…

e3a9c96

…ions, allowing inheriting classes to control whether the merge should use sets. Use it with withSet=true in SVS. Add a test to SVS.

rfsaliev reviewed Jun 3, 2025

meiravgri commented Jun 3, 2025

alonre24 changed the title ~~Meiravg_topk_withset_svs~~ Proper de-duplication of query results in tiered index with compression Jun 5, 2025

alonre24 changed the title ~~Proper de-duplication of query results in tiered index with compression~~ Enabler for proper de-duplication of query results in tiered index with compression Jun 5, 2025

alonre24 mentioned this pull request Jun 9, 2025

Enabler for proper de-duplication of query results in tiered index with compression [MOD-10062] #692

Merged

2 tasks

alonre24 closed this Jun 10, 2025
