Are the results returned by hybrid search guaranteed to be in order from most to least relevant? #41364
I have been assuming that hybrid search returns results in order from most to least similar to the query, because that seems the most logical way to order the data. However, as far as I am aware, this is not documented anywhere.

Furthermore, what is the meaning of the "distance" field in the returned data? Is it truly a distance, where a larger value implies a worse match to the query, or is it a similarity score, where a larger value implies a closer match?

I am using the reciprocal rank fusion ranker with hybrid search, so I was expecting the value returned with the data to be the output of this function. In that case it would be a score, where larger values mean a better match, but then naming the metric "distance" in the output is quite confusing.

Thank you in advance for your help 👍
Replies: 1 comment 2 replies
Search results are always ordered from most to least relevant, both for search() and hybrid_search().
For hybrid_search(), each hybrid search request is made up of several sub-requests, and each sub-request is executed separately. So, the first step of hybrid_search() is to get N search result sets from the N sub-requests. The second step is to normalize the distance/score values of the N result sets so that they are all in the same range [0, 1], where 1 is the most relevant and 0 is the least relevant. The next step is to merge the N result sets into one final result set. After reranking, the final result set is sorted by score in descending order.
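As a rough illustration of the merge step with reciprocal rank fusion (the ranker mentioned in the question), here is a minimal sketch of the standard RRF formula, score = sum of 1 / (k + rank) over the sub-results. It is not Milvus's internal code: the function name `rrf_merge`, the sample IDs, and the constant k = 60 are assumptions for illustration. Note that standard RRF works from each item's rank in the sub-results rather than from the raw distance values. The fused value is a score, so larger means a better match and the output is sorted in descending order.

```python
from collections import defaultdict


def rrf_merge(result_sets, k=60, limit=None):
    """Fuse several ranked ID lists with reciprocal rank fusion.

    Each list in result_sets is assumed to be ordered best match first.
    k is a smoothing constant (60 is an assumed, commonly used value).
    """
    scores = defaultdict(float)
    for ranked_ids in result_sets:
        # Ranks are 1-based: the best hit of each sub-result has rank 1.
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, largest (most relevant) first.
    merged = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return merged[:limit] if limit is not None else merged


# Hypothetical sub-results, e.g. one from a dense field and one from a sparse field.
dense_hits = ["a", "b", "c"]
sparse_hits = ["b", "d", "a"]

fused = rrf_merge([dense_hits, sparse_hits])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Items that rank well in several sub-results (here "a" and "b") end up ahead of items that appear in only one.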
In a search result set, you will see an id-distance or id-score pair for each item. Whether it is called "distance" or "score", it is a number produced by a mathematical algorithm that calculates the similarity or distance between two embeddings.
All the mathematical algorithms are listed here: https://milvus.io/docs/metric.md
L2 - Euclidean distance
The distance range is [0, ∞), where 0 is the most relevant and ∞ is the least relevant. So, if you choose the L2 metric type, the search results are ordered by distance in ascending order.
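To make the ascending order concrete, here is a small standalone sketch that computes plain Euclidean distances and sorts the hits the way an L2 search would order them. The helper `l2_distance` and the sample vectors are made up for illustration. The sketch only shows the ordering and does not reproduce how Milvus computes or reports the value.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


query = [0.0, 0.0]
# Hypothetical stored embeddings keyed by ID.
vectors = {
    "far": [3.0, 4.0],
    "near": [0.1, 0.0],
    "mid": [1.0, 1.0],
}

# With a distance metric, a smaller value means a closer match,
# so the hits are sorted in ascending order.
hits = sorted(
    ((vec_id, l2_distance(query, vec)) for vec_id, vec in vectors.items()),
    key=lambda hit: hit[1],
)
for vec_id, dist in hits:
    print(vec_id, round(dist, 4))
```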
IP - Inner product
The similarity range i…