[Question]: What are the different reranking modes available while using multiple retrievers #8591
Replies: 4 comments
-
🤖 Hello @amitguptadumka, I'm here to assist you with your query regarding the QueryFusionRetriever.
In the context you shared, a fourth mode called "keyword_embedding" is also mentioned. However, this mode is not defined in the … The 'reciprocal_rerank' mode in the …

In terms of the scores, when using the 'reciprocal_rerank' mode, the scores are calculated with the Reciprocal Rank Fusion algorithm. This algorithm reranks retrieval results without excessive computation or reliance on external models. The algorithm is described in this paper.

I hope this information helps! If you have any more questions, feel free to ask.

Best,
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
Looks like dosu bot isn't giving the right answers 😓 There are just 2 modes right now: 'reciprocal_rerank' and 'simple',
but you can add a number of different node postprocessors for re-ranking. Read more here: https://docs.llamaindex.ai/en/stable/module_guides/querying/node_postprocessors/root.html
-
@Disiok Thanks for your reply.
-
🤖 The 'reciprocal_rerank' mode of the QueryFusionRetriever class combines the results from several retrievers into one list and re-ranks them. It does this by adding up the reciprocal rank of each node in every list where it is retrieved, then reordering the nodes from highest to lowest fused score. Each list contributes 1.0 / (rank + k), where rank is the node's 0-based position in that list and 'k' (60 in the code below) is a constant that dampens the impact of outlier rankings. A lower fused score means the node ranked lower in the lists, or appeared in fewer of them.

In the context shared, the QueryFusionRetriever is used to fuse a vector index with a BM25-based retriever. This captures both semantic relations and keywords in input queries. The reciprocal rerank algorithm re-sorts the nodes without additional models or excessive computation. That setup also queries four times: once with the original query, plus three generated queries.

The LlamaIndex codebase does not provide explicit information about different node postprocessors in the given context. However, the fusion process includes a step where the node scores are adjusted based on the fused results, which can be considered a form of postprocessing.

Here is the relevant code for the 'reciprocal_rerank' mode:

```python
from typing import Dict, List

from llama_index.schema import NodeWithScore  # llama_index >= 0.10: llama_index.core.schema


def fuse_results(results_dict, similarity_top_k: int = 2):
    """Fuse results from several retrievers with reciprocal rank fusion."""
    k = 60.0  # `k` is a parameter used to control the impact of outlier rankings.
    fused_scores: Dict[str, float] = {}
    text_to_node: Dict[str, NodeWithScore] = {}

    # compute reciprocal rank scores
    for nodes_with_scores in results_dict.values():
        for rank, node_with_score in enumerate(
            sorted(
                nodes_with_scores, key=lambda x: x.score or 0.0, reverse=True
            )
        ):
            text = node_with_score.node.get_content()
            text_to_node[text] = node_with_score
            if text not in fused_scores:
                fused_scores[text] = 0.0
            fused_scores[text] += 1.0 / (rank + k)

    # sort results
    reranked_results = dict(
        sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    )

    # adjust node scores: replace each node's original score with its fused score
    reranked_nodes: List[NodeWithScore] = []
    for text, score in reranked_results.items():
        reranked_nodes.append(text_to_node[text])
        reranked_nodes[-1].score = score

    return reranked_nodes[:similarity_top_k]
```

The final loop, which overwrites each node's score with its fused score, is the score-adjustment (postprocessing) step mentioned above. You can find more details in the following notebooks: …
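To see why the fused scores look so small, here is a minimal, dependency-free sketch of the same reciprocal rank fusion, using plain (text, score) tuples as a hypothetical stand-in for NodeWithScore. With k = 60 and 0-based ranks, a node ranked first in a list earns 1/60 ≈ 0.0167 from that list, so the fused score can never exceed (number of result lists) / 60. Assuming QueryFusionRetriever collects one result list per (query, retriever) pair, a config with num_queries=1 and two retrievers caps the score at 2/60 ≈ 0.0333. The `max_rrf_score` helper and the "relative" figure are illustrative assumptions, not part of LlamaIndex.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for NodeWithScore: (text, original_score).
Result = Tuple[str, float]


def reciprocal_rank_fusion(
    result_lists: List[List[Result]], k: float = 60.0, top_k: int = 3
) -> List[Result]:
    """Fuse ranked lists with RRF, using 0-based ranks like fuse_results."""
    fused: Dict[str, float] = {}
    for results in result_lists:
        # Rank each list by its own original score, best first.
        ranked = sorted(results, key=lambda r: r[1], reverse=True)
        for rank, (text, _) in enumerate(ranked):
            fused[text] = fused.get(text, 0.0) + 1.0 / (rank + k)
    # Highest fused score first; the original scores are discarded.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:top_k]


def max_rrf_score(num_lists: int, k: float = 60.0) -> float:
    """Upper bound: a node ranked first (rank 0) in every list."""
    return num_lists / k


vector_hits = [("doc_a", 0.91), ("doc_b", 0.85), ("doc_c", 0.40)]
bm25_hits = [("doc_b", 12.3), ("doc_d", 9.8), ("doc_a", 4.1)]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
upper = max_rrf_score(num_lists=2)
for text, score in fused:
    # Dividing by the upper bound gives a 0..1 figure that is easier to read.
    print(f"{text}: raw={score:.4f} relative={score / upper:.2f}")
# doc_b: raw=0.0331 relative=0.99
# doc_a: raw=0.0328 relative=0.98
# doc_d: raw=0.0164 relative=0.49
```

Because the scores are rank-based, they say nothing about absolute relevance; they are only meaningful relative to each other or to that ceiling.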
Regarding the 'simple' mode: it merges the results from all retrievers into one list, de-duplicates nodes by their text content, and re-orders them by their original retrieval scores, without any rank fusion. I hope this helps! If you have any more questions, feel free to ask.
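The 'simple' fusion behaviour can be sketched without LlamaIndex as follows. This is a minimal illustration, not the library's code: results are hypothetical (text, score) tuples, and when the same text comes back from more than one retriever the sketch keeps the highest score (an assumption; the library's duplicate handling may differ between versions). Because raw scores are compared directly, an unbounded BM25 score will typically outrank a cosine similarity (at most 1), which is one reason 'reciprocal_rerank' suits hybrid lexical + semantic search better.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for NodeWithScore: (text, original_score).
Result = Tuple[str, float]


def simple_fusion(result_lists: List[List[Result]], top_k: int = 3) -> List[Result]:
    """Merge all lists, de-duplicate by text, and sort by original score."""
    merged: Dict[str, float] = {}
    for results in result_lists:
        for text, score in results:
            # Assumption: a duplicate text keeps the highest score seen.
            merged[text] = max(score, merged.get(text, float("-inf")))
    # No rank fusion: the original scores decide the order directly.
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)[:top_k]


vector_hits = [("doc_a", 0.91), ("doc_b", 0.85)]  # cosine similarities
bm25_hits = [("doc_b", 12.3), ("doc_d", 9.8)]  # BM25 scores, different scale

print(simple_fusion([vector_hits, bm25_hits]))
# [('doc_b', 12.3), ('doc_d', 9.8), ('doc_a', 0.91)]
```

Here the BM25 hits push the best vector match (doc_a) to the bottom purely because of the score scale.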
-
Question
I am using QueryFusionRetriever to combine lexical and semantic search, with an Elasticsearch vector DB storing the embeddings.
I see that QueryFusionRetriever has a mode parameter, but the only value I found in the documentation is 'reciprocal_rerank'.
What other modes are supported? Could you list them with explanations for reference?
Also, when using reciprocal rerank the scores are very low. How should these scores be read? Any guide would be helpful.
```python
retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=3,
    num_queries=1,  # set this to 1 to disable query generation
    mode="reciprocal_rerank",
    use_async=True,
    verbose=False,
)
```