
MostSimilarDocumentsPipeline embeddings do not match DocumentSearchPipeline embeddings with the same text #3298


Hi @mwade-noetic, one explanation for the difference you see in the generated document embeddings is that a document's metadata, in particular its title (name), is taken into account when generating an embedding (and therefore also in your MostSimilarDocumentsPipeline):

```python
passages = [[d.meta["name"] if d.meta and "name" in d.meta else "", d.content] for d in docs]  # type: ignore
```

This step is not applied to your query text, though, when you run the following in your DocumentSearchPipeline:

```python
doc_search_pipeline.run(query=mpnet_result[0][0].content, debug=True)
```
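To make the difference concrete, here is a minimal sketch of what the two encoder inputs look like. It assumes a hypothetical `Doc` dataclass standing in for Haystack's `Document` (only `content` and `meta` are modeled), and the helper names `document_inputs`, `query_inputs` and `matching_query_input` are illustrative, not part of Haystack's API. `document_inputs` copies the quoted `passages` line, while queries are passed to the encoder as plain strings. If the encoder is a sentence-transformers model, my understanding is that a two-element list is tokenized as a text pair, so the inputs are not identical even when the name is empty.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Doc:
    # Hypothetical stand-in for haystack's Document: only the fields the quoted line reads.
    content: str
    meta: Optional[Dict[str, Any]] = field(default=None)


def document_inputs(docs: List[Doc]) -> List[List[str]]:
    # Same logic as the quoted `passages` line: each document becomes a
    # [name, content] pair, with "" as the name when meta or meta["name"] is missing.
    return [[d.meta["name"] if d.meta and "name" in d.meta else "", d.content] for d in docs]


def query_inputs(queries: List[str]) -> List[str]:
    # Queries are handed to the encoder as plain strings, with no name in front.
    return list(queries)


def matching_query_input(doc: Doc) -> List[str]:
    # Hypothetical helper: build the same [name, content] pair for a query,
    # so an encoder would see exactly what it saw at indexing time.
    return document_inputs([doc])[0]


if __name__ == "__main__":
    doc = Doc(content="Berlin is the capital of Germany.", meta={"name": "berlin.txt"})
    print(document_inputs([doc]))      # [['berlin.txt', 'Berlin is the capital of Germany.']]
    print(query_inputs([doc.content]))  # ['Berlin is the capital of Germany.']
    print(document_inputs([Doc(content="x")]))  # [['', 'x']]
```

So passing `mpnet_result[0][0].content` as the query only reproduces the second element of the pair, which is why the two embeddings can differ for the "same" text.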

Even if there is no document name, wha…

Replies: 2 comments 3 replies

1 reply
@julian-risch

Answer selected by julian-risch
2 replies
@mwade-noetic

@mayankjobanputra

Labels
None yet
3 participants