-
Notifications
You must be signed in to change notification settings - Fork 20
[MOD-8244] fix TieredFactory::EstimateInitialSize #567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ssing, leading to a miss computation of the components memory size estimation)
…rocessorsContainerMemory)
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #567 +/- ##
==========================================
- Coverage 96.97% 96.94% -0.04%
==========================================
Files 100 100
Lines 5287 5297 +10
==========================================
+ Hits 5127 5135 +8
- Misses 160 162 +2 ☔ View full report in Codecov by Sentry. |
@@ -129,7 +129,7 @@ TYPED_TEST(HNSWTieredIndexTest, testSizeEstimation) { | |||
|
|||
HNSWParams hnsw_params = {.type = TypeParam::get_index_type(), | |||
.dim = dim, | |||
.metric = VecSimMetric_L2, | |||
.metric = VecSimMetric_Cosine, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does changing the metric in this test enough to cover all the PR changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It covers the second fix.
I'll add a test that validates the attributes of the tiered index indexes to validate the first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it was not a problem in the actual indexes, only in the estimation function
I don't have a way to test it.
I already added a test for the components, and it won't harm to have one so I'm keeping it
checkes tierred index indexes attributes and componenets
tests/unit/test_hnsw_tiered.cpp
Outdated
// CosinePreprocessor<TEST_DATA_T> cosine_pp = | ||
// dynamic_cast<MultiPreprocessorsContainer<TEST_DATA_T, 1> | ||
// *>(bf_preprocessors)->preprocessors[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
* initailize all bf params in tiered EstimateInitialSize (metric was missing, leading to a miss computation of the components memory size estimation) * add is_normalized to EstimateComponentsMemory (passed to EstimatePreprocessorsContainerMemory) * addHNSWTieredIndexTest:: testIndexesAttributes checkes tierred index indexes attributes and componenets * remove comment (cherry picked from commit a586edf)
Successfully created backport PR for |
[MOD-8244] fix TieredFactory::EstimateInitialSize (#567) * initailize all bf params in tiered EstimateInitialSize (metric was missing, leading to a miss computation of the components memory size estimation) * add is_normalized to EstimateComponentsMemory (passed to EstimatePreprocessorsContainerMemory) * addHNSWTieredIndexTest:: testIndexesAttributes checkes tierred index indexes attributes and componenets * remove comment (cherry picked from commit a586edf) Co-authored-by: meiravgri <109056284+meiravgri@users.noreply.github.com>
This PR addresses two bugs affecting the initial size estimation of the tiered index.
1. Missing Initialization of
BFParams
members inTieredFactory::EstimateInitialSize
Before this fix, the
TieredHNSWFactory::EstimateInitialSize
was responsible for initializing specific members of theBFParams
struct as follows:However, all other members of
BFParams,
including metric, were left uninitialized, defaulting to zero. As a result, the metric defaulted toVecSimMetric_L2
the default value forVecSimMetric.
This caused the memory size of the brute-force index components to be calculated as if it were a
VecSimMetric_L2
index. Consequently, if the actual metric of the index wasVecSimMetric_Cosine,
the size of the cosine preprocessor was omitted from the estimation.After properly initializing all
BFParams
members, the size estimation for the frontend index of a tiered index using a cosine metric now includes the memory of the cosine preprocessor.As expected, this change caused the tiered index size estimation test for the cosine index to fail, as seen in the first commit’s checks.
2. Accounting for Cosine Preprocessor Size in
TieredHNSWFactory::EstimateInitialSize
For an HNSW index within a tiered index using the
VecSimMetric_Cosine
metric, the cosine preprocessor is typically absent, as normalization is expected to be performed by the frontend index. This behavior was not accounted for in the initial size estimation logic.We now pass a
bool is_normalized
argument to the size estimation functions of the HNSW and brute-force indices. This argument ensures that the logic for size estimation mirrors the logic used during index construction.The
is_normalized
argument is propagated to theEstimateComponentsMemory
method, which adds the memory size of the cosine preprocessor only if:VecSimMetric_Cosine,
andis_normalized == false