Skip to content

Max normalization technique #1380

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: 2.17
Choose a base branch
from

Conversation

vibrantvarun
Copy link
Member

This PR is a test PR. No need to merge or review.

Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
if (scores[0] > scores[1]) {
return scores[0];
}
return scores[1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can do Math.max, but I guess this approach is also fine

|| knnQueryMaxScore == Float.MIN_VALUE) {
multiplier = 1;
} else {
multiplier = matchQueryMaxScore / knnQueryMaxScore;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know use case, but here you may loose some precision, make sure the error after doing math with floating point numbers is within acceptable threshold


float matchQueryMaxScore = maxScoresPerSubquery[0];
log.info("max score of match query [{}]", matchQueryMaxScore);
float knnQueryMaxScore = maxScoresPerSubquery[1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is the requirement for them to send knn query always at the second position, but here we're relying on user's good intentions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for the POC purposes, I have made it a compulsion that KNN query should be a second subquery for this normlization technique. This we can convey to the user for testing purposes to always provide knn query as 2nd subquery. Here the intention is to make this POC ready for running a few benchmarks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fair trade-off for POC. Since testing is done by us to evaluate the technique.


float matchQueryMaxScore = maxScoresPerSubquery[0];
log.info("max score of match query [{}]", matchQueryMaxScore);
float knnQueryMaxScore = maxScoresPerSubquery[1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fair trade-off for POC. Since testing is done by us to evaluate the technique.

multiplier = matchQueryMaxScore / knnQueryMaxScore;
}

log.info("Multiplier value is [{}]", multiplier);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets move all logging to debug since benchmarking would also track the performance (latency).

Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
|| knnQueryMaxScore == Float.MIN_VALUE) {
multiplier = 1;
} else {
multiplier = Math.round((matchQueryMaxScore / knnQueryMaxScore) * 1000f) / 1000f;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with such approach we'll be getting good results for 3 decimal places, just making sure that's fine with requirements.
Also if you want to be extra cautious you can add check for weird values like Infinity or NaN:

float ratio = matchQueryMaxScore / knnQueryMaxScore;
if (Float.isFinite(ratio)) {
    multiplier = roundToThreeDecimals(ratio);
} else {
    multiplier = 1.0f; // or another appropriate fallback
}

Signed-off-by: Varun Jain <varunudr@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants