[Question]: What are the different reranking modes available while using multiple retrievers #8591

amitguptadumka · 2023-10-26T22:17:54Z

Question Validation

I have searched both the documentation and Discord for an answer.

Question

I am using QueryFusionRetriever to combine lexical and semantic search, with Elasticsearch as the vector store for the embeddings.
I see that QueryFusionRetriever has a mode parameter, but I only found documentation for 'reciprocal_rerank'.
What other modes are supported? Could they be listed with explanations for reference?
Also, when using reciprocal rerank the scores are very low. How should these scores be read? Any guide would be helpful.

retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=3,
    num_queries=1,  # set this to 1 to disable query generation
    mode="reciprocal_rerank",
    use_async=True,
    verbose=False,
)

@amitguptadumka · 2023-10-26T22:32:11Z

dosubot[bot]
bot Oct 26, 2023

🤖

Hello @amitguptadumka,

I'm here to assist you with your query regarding the mode parameter in the QueryFusionRetriever function of the LlamaIndex codebase.

The mode parameter in the QueryFusionRetriever function is used to determine the format of the response synthesized by the response synthesizer. It can take three different modes as defined in the KGRetrieverMode class:

KEYWORD: This is the default query mode, which uses keywords to find triplets.
EMBEDDING: This mode uses embeddings to find similar triplets.
HYBRID: This mode combines both keywords and embeddings to find relevant triplets.

In the context you shared, it is also mentioned that there is a fourth mode called "keyword_embedding". However, this mode is not defined in the KGRetrieverMode class and might be specific to the KnowledgeGraphRAGRetriever class. It is recommended to check the specific implementation details of the KnowledgeGraphRAGRetriever class for more information about this mode.

The 'reciprocal_rerank' mode in the QueryFusionRetriever function is used to combine retrieval results from multiple queries and multiple indexes. The retrieved nodes are reranked according to the Reciprocal Rerank Fusion algorithm. This algorithm provides an efficient method for reranking retrieval results without excessive computation or reliance on external models.

In terms of the scores, when using the 'reciprocal_rerank' mode, the scores are calculated by the Reciprocal Rerank Fusion algorithm, which is demonstrated in this paper.

I hope this information helps! If you have any more questions, feel free to ask.

Best,
Dosu

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

Disiok · 2023-10-29T23:53:50Z · Maintainer

Looks like dosu bot isn't giving the right answers 😓

There are just 2 modes right now:

from enum import Enum


class FUSION_MODES(str, Enum):
    """Enum for different fusion modes."""

    RECIPROCAL_RANK = "reciprocal_rerank"  # apply reciprocal rank fusion
    SIMPLE = "simple"  # simple re-ordering of results based on original scores

But you can add a number of different node postprocessors for reranking. Read more here: https://docs.llamaindex.ai/en/stable/module_guides/querying/node_postprocessors/root.html
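The difference between the two modes matters for a lexical plus semantic setup like the one in the question, because BM25 scores and embedding similarities live on different scales. The pure-Python sketch below illustrates this under stated assumptions: `simple_fusion` and `reciprocal_rank_fusion` are hypothetical helpers, not LlamaIndex functions; results are plain (text, score) pairs rather than NodeWithScore objects; duplicates are matched by text, as the `fuse_results` code later in this thread does; and the simple mode is assumed to keep the highest original score for a duplicate.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a retrieved node: (text, original retriever score).
Result = Tuple[str, float]


def simple_fusion(result_lists: List[List[Result]], top_k: int) -> List[Result]:
    """Sketch of 'simple' fusion: dedupe by text, then sort by original score."""
    best: Dict[str, float] = {}
    for results in result_lists:
        for text, score in results:
            # Assumption: a duplicate keeps the highest score it received.
            if text not in best or score > best[text]:
                best[text] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def reciprocal_rank_fusion(
    result_lists: List[List[Result]], top_k: int, k: float = 60.0
) -> List[Result]:
    """Sketch of reciprocal rank fusion: sum 1 / (rank + k) across lists."""
    fused: Dict[str, float] = {}
    for results in result_lists:
        ordered = sorted(results, key=lambda item: item[1], reverse=True)
        # Zero-based rank, so the top hit of each list contributes 1 / k.
        for rank, (text, _) in enumerate(ordered):
            fused[text] = fused.get(text, 0.0) + 1.0 / (rank + k)
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Made-up scores: cosine similarities vs. unbounded BM25 scores.
vector_results = [("doc_a", 0.82), ("doc_b", 0.79), ("doc_c", 0.75)]
bm25_results = [("doc_d", 12.4), ("doc_b", 9.7), ("doc_e", 6.1)]

print(simple_fusion([vector_results, bm25_results], top_k=3))
# → [('doc_d', 12.4), ('doc_b', 9.7), ('doc_e', 6.1)]
print(reciprocal_rank_fusion([vector_results, bm25_results], top_k=3))
# → doc_b first (about 0.0328), then doc_a and doc_d (about 0.0167 each)
```

With these made-up numbers, the simple mode returns only BM25 hits because their raw scores are larger, while reciprocal rank fusion promotes doc_b, the only document both retrievers found. Ignoring raw score scales this way is one reason a rank-based mode is a common choice for hybrid retrieval.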

amitguptadumka · 2023-10-30T09:55:09Z · Author

@Disiok Thanks for your reply.

@dosu-bot · 2023-10-30T10:10:05Z

dosubot[bot]
bot Oct 30, 2023

🤖

The 'reciprocal_rerank' mode in the QueryFusionRetriever function is a method of combining the results from several retrievers into one and re-ranking them. This is done by adding up the reciprocal rank of each node in every list where it's retrieved. The nodes are then reordered from highest score to least. The score is calculated as 1.0 / (rank + k), where 'k' is a parameter used to control the impact of outlier rankings. Lower scores indicate lower ranks in the retrieval lists.
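Written out with the constants from the `fuse_results` code below (k = 60, zero-based ranks from `enumerate`), each node's fused score is a sum over the result lists it appears in, so it has a hard ceiling of N/k for N lists. Assuming one result list per (query, retriever) pair, the question's setup (num_queries=1, two retrievers) gives N = 2:

```math
\begin{aligned}
\mathrm{score}(d) &= \sum_{i \,:\, d \in L_i} \frac{1}{k + \mathrm{rank}_i(d)} \;\le\; \frac{N}{k}, \qquad k = 60,\ \ \mathrm{rank}_i(d) \in \{0, 1, 2, \dots\} \\
\text{rank 0 in both lists } (N = 2) &: \quad \tfrac{1}{60} + \tfrac{1}{60} = \tfrac{2}{60} \approx 0.0333 \\
\text{rank 0 in one list only} &: \quad \tfrac{1}{60} \approx 0.0167 \\
\text{rank 2 in one list only} &: \quad \tfrac{1}{62} \approx 0.0161
\end{aligned}
```

So a fused score near 0.03 is close to the best possible value in this setup, not a weak match. Because only ranks enter the formula, these numbers are useful for ordering the results of one query, not as similarity or confidence values, and they cannot be compared with the raw vector or BM25 scores.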

The 'simple' mode is not mentioned in the provided context, so I cannot provide details about it.

In the context shared, the QueryFusionRetriever is used to fuse a vector index with a BM25-based retriever, which captures both semantic relations and keywords in input queries. The reciprocal rerank algorithm re-sorts the nodes without additional models or excessive computation. That setup also issues four queries: the original query plus three generated ones.

The LlamaIndex codebase does not provide explicit information about different node postprocessors in the given context. However, in the fusion process, there is a step where the node scores are adjusted based on the fused results. This can be considered a form of postprocessing.

Here is the relevant code for the 'reciprocal_rerank' mode:

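from typing import List

# llama_index 0.8/0.9 import path; newer releases use llama_index.core.schema
from llama_index.schema import NodeWithScore


def fuse_results(results_dict, similarity_top_k: int = 2):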
    """Fuse results."""
    k = 60.0  # `k` is a parameter used to control the impact of outlier rankings.
    fused_scores = {}
    text_to_node = {}

    # compute reciprocal rank scores
    for nodes_with_scores in results_dict.values():
        for rank, node_with_score in enumerate(
            sorted(
                nodes_with_scores, key=lambda x: x.score or 0.0, reverse=True
            )
        ):
            text = node_with_score.node.get_content()
            text_to_node[text] = node_with_score
            if text not in fused_scores:
                fused_scores[text] = 0.0
            fused_scores[text] += 1.0 / (rank + k)

    # sort results
    reranked_results = dict(
        sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    )

    # adjust node scores
    reranked_nodes: List[NodeWithScore] = []
    for text, score in reranked_results.items():
        reranked_nodes.append(text_to_node[text])
        reranked_nodes[-1].score = score

    return reranked_nodes[:similarity_top_k]

You can find more details in the following notebooks:

Regarding the 'simple' mode, it refers to a threshold ratio that determines whether to merge child nodes into a parent node. This is controlled by the simple_ratio_thresh parameter in the AutoMergingRetriever class. If the ratio of the current number of child nodes to the total number of child nodes is greater than simple_ratio_thresh (default value is 0.5), the function merges the child nodes into the parent node. This is done by deleting the child nodes and adding the parent node with an average score of the child nodes. You can find more details in the llama_index/retrievers/auto_merging_retriever.py file.

I hope this helps! If you have any more questions, feel free to ask.

Sources
