-
Notifications
You must be signed in to change notification settings - Fork 95
Max normalization technique #1380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 2.17
Are you sure you want to change the base?
Max normalization technique #1380
Conversation
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
if (scores[0] > scores[1]) { | ||
return scores[0]; | ||
} | ||
return scores[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can do Math.max, but I guess this approach is also fine
|| knnQueryMaxScore == Float.MIN_VALUE) { | ||
multiplier = 1; | ||
} else { | ||
multiplier = matchQueryMaxScore / knnQueryMaxScore; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know use case, but here you may loose some precision, make sure the error after doing math with floating point numbers is within acceptable threshold
|
||
float matchQueryMaxScore = maxScoresPerSubquery[0]; | ||
log.info("max score of match query [{}]", matchQueryMaxScore); | ||
float knnQueryMaxScore = maxScoresPerSubquery[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is the requirement for them to send knn query always at the second position, but here we're relying on user's good intentions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for the POC purposes, I have made it a compulsion that KNN query should be a second subquery for this normlization technique. This we can convey to the user for testing purposes to always provide knn query as 2nd subquery. Here the intention is to make this POC ready for running a few benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fair trade-off for POC. Since testing is done by us to evaluate the technique.
|
||
float matchQueryMaxScore = maxScoresPerSubquery[0]; | ||
log.info("max score of match query [{}]", matchQueryMaxScore); | ||
float knnQueryMaxScore = maxScoresPerSubquery[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fair trade-off for POC. Since testing is done by us to evaluate the technique.
multiplier = matchQueryMaxScore / knnQueryMaxScore; | ||
} | ||
|
||
log.info("Multiplier value is [{}]", multiplier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets move all logging to debug since benchmarking would also track the performance (latency).
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
|| knnQueryMaxScore == Float.MIN_VALUE) { | ||
multiplier = 1; | ||
} else { | ||
multiplier = Math.round((matchQueryMaxScore / knnQueryMaxScore) * 1000f) / 1000f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with such approach we'll be getting good results for 3 decimal places, just making sure that's fine with requirements.
Also if you want to be extra cautious you can add check for weird values like Infinity or NaN:
float ratio = matchQueryMaxScore / knnQueryMaxScore;
if (Float.isFinite(ratio)) {
multiplier = roundToThreeDecimals(ratio);
} else {
multiplier = 1.0f; // or another appropriate fallback
}
Signed-off-by: Varun Jain <varunudr@amazon.com>
This PR is a test PR. No need to merge or review.