Retrieval latency from Pinecone: Retrieval vs SelfQueryRetrieval #12282
GiacomoFonseca started this conversation in General
I'm testing the average time LangChain needs to retrieve the relevant chunks from a Pinecone index. I have chunked, embedded, and loaded around 300 PDF files. Each chunk has a few metadata fields, such as the document name, number, and year.
For the basic retrieval case:
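Something like this (the index name, `k`, and embedding setup are placeholders for my actual code):

```python
import time

import pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

# Connect to the existing index that already holds the ~300 chunked PDFs.
pinecone.init(api_key="...", environment="...")
embeddings = OpenAIEmbeddings()
vectorstore = Pinecone.from_existing_index("my-index", embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

start = time.perf_counter()
docs = retriever.get_relevant_documents("What does the annual report say about X?")
print(f"retrieval took {time.perf_counter() - start:.2f} s")
```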
the reply is quite fast and I always measure t < 0.5 seconds to get the chunks back.
When using the SelfQueryRetriever instead (with text-davinci-003 as the LLM here):
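Roughly this setup, with the metadata field names and descriptions simplified:

```python
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever

# Describe the metadata so the LLM can translate the user query into filters.
metadata_field_info = [
    AttributeInfo(name="document_name", description="Name of the source document", type="string"),
    AttributeInfo(name="number", description="Document number", type="integer"),
    AttributeInfo(name="year", description="Year the document was issued", type="integer"),
]

llm = OpenAI(model_name="text-davinci-003", temperature=0)
self_query_retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,  # same Pinecone vectorstore as above
    document_contents="Chunks of text extracted from PDF documents",
    metadata_field_info=metadata_field_info,
)

start = time.perf_counter()
docs = self_query_retriever.get_relevant_documents("What do the documents from 2021 say about X?")
print(f"self-query retrieval took {time.perf_counter() - start:.2f} s")
```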
and asking questions that trigger one or more metadata filters, I always get t > 3 seconds!
Do you find it normal that the time is almost an order of magnitude higher for the self-query retriever? Is it because the latter makes an actual OpenAI call to run the LLM, and that takes more time?
Imagine you want to build a chatbot on top of these documents and to filter them based on the user query, answering questions only from the filtered subset (expecting it to be faster that way, not slower...). What would you suggest to speed up the process? Other indexing methods, maybe? LlamaIndex, Elasticsearch...?
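For comparison, when the filter is already known, it can be passed straight to Pinecone without any LLM call (a sketch, reusing the `year` field assumed above):

```python
# Hypothetical pre-computed filter: restrict retrieval to chunks from 2021.
filtered_retriever = vectorstore.as_retriever(
    search_kwargs={"k": 4, "filter": {"year": {"$eq": 2021}}}
)
docs = filtered_retriever.get_relevant_documents("What changed in 2021?")
```

That avoids the query-construction step entirely, which is why I suspect the LLM call is the expensive part.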