Skip to content

[MOD-8244] fix TieredFactory::EstimateInitialSize #567

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 13, 2024

Conversation

meiravgri
Copy link
Collaborator

@meiravgri meiravgri commented Dec 12, 2024

This PR addresses two bugs affecting the initial size estimation of the tiered index.

1. Missing Initialization of BFParams members in TieredFactory::EstimateInitialSize

Before this fix, the TieredHNSWFactory::EstimateInitialSize was responsible for initializing specific members of the BFParams struct as follows:

    bf_params_output.type = hnsw_params.type;
    bf_params_output.multi = hnsw_params.multi;

However, all other members of BFParams, including metric, were left uninitialized, defaulting to zero. As a result, the metric defaulted to VecSimMetric_L2 the default value for VecSimMetric.
This caused the memory size of the brute-force index components to be calculated as if it were a VecSimMetric_L2 index. Consequently, if the actual metric of the index was VecSimMetric_Cosine, the size of the cosine preprocessor was omitted from the estimation.

After properly initializing all BFParams members, the size estimation for the frontend index of a tiered index using a cosine metric now includes the memory of the cosine preprocessor.
As expected, this change caused the tiered index size estimation test for the cosine index to fail, as seen in the first commit’s checks.

2. Accounting for Cosine Preprocessor Size in TieredHNSWFactory::EstimateInitialSize

For an HNSW index within a tiered index using the VecSimMetric_Cosine metric, the cosine preprocessor is typically absent, as normalization is expected to be performed by the frontend index. This behavior was not accounted for in the initial size estimation logic.
We now pass a bool is_normalized argument to the size estimation functions of the HNSW and brute-force indices. This argument ensures that the logic for size estimation mirrors the logic used during index construction.

The is_normalized argument is propagated to the EstimateComponentsMemory method, which adds the memory size of the cosine preprocessor only if:

  • The metric is VecSimMetric_Cosine, and
  • is_normalized == false

…ssing, leading to a miss computation of the components memory size estimation)
Copy link

codecov bot commented Dec 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.94%. Comparing base (35ddc2e) to head (1964758).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #567      +/-   ##
==========================================
- Coverage   96.97%   96.94%   -0.04%     
==========================================
  Files         100      100              
  Lines        5287     5297      +10     
==========================================
+ Hits         5127     5135       +8     
- Misses        160      162       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@meiravgri meiravgri changed the title initailize all bf params in tiered EstimateInitialSize (metric was missing, leading to a miss computation of the components memory size estimation) [MOD-8244] fix TieredFactory::EstimateInitialSize Dec 12, 2024
alonre24
alonre24 previously approved these changes Dec 12, 2024
@@ -129,7 +129,7 @@ TYPED_TEST(HNSWTieredIndexTest, testSizeEstimation) {

HNSWParams hnsw_params = {.type = TypeParam::get_index_type(),
.dim = dim,
.metric = VecSimMetric_L2,
.metric = VecSimMetric_Cosine,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does changing the metric in this test enough to cover all the PR changes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It covers the second fix.
I'll add a test that validates the attributes of the tiered index indexes to validate the first.

Copy link
Collaborator Author

@meiravgri meiravgri Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it was not a problem in the actual indexes, only in the estimation function
I don't have a way to test it.
I already added a test for the components, and it won't harm to have one so I'm keeping it

checkes tierred index indexes attributes and componenets
Comment on lines 162 to 164
// CosinePreprocessor<TEST_DATA_T> cosine_pp =
// dynamic_cast<MultiPreprocessorsContainer<TEST_DATA_T, 1>
// *>(bf_preprocessors)->preprocessors[0];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

@meiravgri meiravgri requested review from alonre24 and removed request for GuyAv46 December 13, 2024 06:07
@meiravgri meiravgri added this pull request to the merge queue Dec 13, 2024
Merged via the queue into main with commit a586edf Dec 13, 2024
16 checks passed
@meiravgri meiravgri deleted the meiravg_fix_tiered_size_estimation branch December 13, 2024 15:45
github-actions bot pushed a commit that referenced this pull request Dec 13, 2024
* initailize all bf params in tiered EstimateInitialSize (metric was missing, leading to a miss computation of the components memory size estimation)

* add is_normalized to  EstimateComponentsMemory (passed to EstimatePreprocessorsContainerMemory)

* addHNSWTieredIndexTest:: testIndexesAttributes

checkes tierred index indexes attributes and componenets

* remove comment

(cherry picked from commit a586edf)
Copy link

Successfully created backport PR for 8.0:

github-merge-queue bot pushed a commit that referenced this pull request Dec 15, 2024
[MOD-8244] fix TieredFactory::EstimateInitialSize (#567)

* initailize all bf params in tiered EstimateInitialSize (metric was missing, leading to a miss computation of the components memory size estimation)

* add is_normalized to  EstimateComponentsMemory (passed to EstimatePreprocessorsContainerMemory)

* addHNSWTieredIndexTest:: testIndexesAttributes

checkes tierred index indexes attributes and componenets

* remove comment

(cherry picked from commit a586edf)

Co-authored-by: meiravgri <109056284+meiravgri@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants