Replies: 1 comment 1 reply
-
Hello, @Peveld! Two quick ideas:
-
Hi, my idea for a RAG project was to feed only relevant documents into the prompt for the LLM, so I'm trying to find significant differences in the returned scores. I followed the tutorial and have something like:
```python
from haystack import Document
from haystack.document_stores.faiss import FAISSDocumentStore
from haystack.nodes import EmbeddingRetriever

# "Flat" index = exact (non-approximate) nearest-neighbour search
document_store = FAISSDocumentStore(
    faiss_index_factory_str="Flat",
    sql_url="sqlite:////tmp/faiss_document_store.db",
)

documents = [
    Document(content="The english channel is 30 kilometers wide."),
    Document(content="la le li la di da"),
]
document_store.write_documents(documents)

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
)
document_store.update_embeddings(retriever)

myquery = "How wide is the english channel?"
docs = retriever.retrieve(query=myquery, top_k=2)
print(docs)
```
However, the difference in score is pretty small: 0.5734 vs. 0.5290. On my real text base I find nearly identical scores for perfectly matching docs and completely non-matching ones. My idea was to apply a general threshold... Do I misunderstand something, or is there maybe a better approach?
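One likely reason for the small gap: `multi-qa-mpnet-base-dot-v1` produces unbounded dot-product scores, and as far as I can tell Haystack scales them into [0, 1] by default (`scale_score=True`), which squeezes most results toward the middle of the range. Below is a minimal sketch of two things worth trying, assuming the same Haystack 1.x API as above; the cosine-similarity store, the cross-encoder model, and the threshold value are illustrative choices, not anything from the tutorial:

```python
from haystack import Document
from haystack.document_stores.faiss import FAISSDocumentStore
from haystack.nodes import EmbeddingRetriever, SentenceTransformersRanker

# Index with cosine similarity so scores live on a fixed scale instead of
# unbounded dot products (separate DB/index file from the one above).
document_store = FAISSDocumentStore(
    faiss_index_factory_str="Flat",
    sql_url="sqlite:////tmp/faiss_cosine_store.db",
    similarity="cosine",
)
document_store.write_documents([
    Document(content="The english channel is 30 kilometers wide."),
    Document(content="la le li la di da"),
])

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
)
document_store.update_embeddings(retriever)

# A cross-encoder scores each (query, document) pair jointly and tends to
# separate matches from non-matches much more sharply than a bi-encoder.
ranker = SentenceTransformersRanker(
    model_name_or_path="cross-encoder/ms-marco-MiniLM-L-12-v2"
)

query = "How wide is the english channel?"
candidates = retriever.retrieve(query=query, top_k=2)
reranked = ranker.predict(query=query, documents=candidates)

THRESHOLD = 0.5  # illustrative value; tune on a handful of labeled queries
relevant = [d for d in reranked if d.score is not None and d.score >= THRESHOLD]
print([(d.content, round(d.score, 4)) for d in relevant])
```

Even with re-ranking, retrieval scores are not calibrated probabilities, so a single universal threshold is unlikely to transfer across corpora; picking the cutoff per corpus from a few labeled queries is usually more robust.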