What is the best approach for DocumentStore if Q&A is just for a single document #5157
Replies: 2 comments
-
My recommendation would be to use an indexing pipeline and then use metadata filtering to retrieve information from only one document at a time. Indexing pipelines: https://docs.haystack.deepset.ai/docs/pipelines#indexing-pipelines You also don't need to recreate the pipeline every time; I think that is what causes the memory leak. Just create it once when you start the Flask server and reuse it.
-
Can I customize the pipeline to accept raw text as input? If not, how can I convert a raw text input to a Document with a randomly generated UUID as its metadata? Metadata won't be of much help here, since the input only needs to exist until the answers are found. Once they are found, I would delete the Document from the store.
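One possible way to tag raw text with a random UUID is sketched below using only the standard library. It assumes Haystack 1.x, where document stores are documented to accept plain dictionaries with `content` and `meta` keys as well as `Document` objects; verify this against your installed version. The helper name `text_to_document` and the `doc_id` key are hypothetical.

```python
import uuid


def text_to_document(raw_text):
    """Wrap raw text in a dict with 'content' and 'meta' keys, tagged with a random UUID."""
    return {
        "content": raw_text,
        "meta": {"doc_id": str(uuid.uuid4())},  # "doc_id" is a hypothetical key name
    }


doc = text_to_document("The invoice is due on 1 March.")
print(doc["meta"]["doc_id"])  # a fresh random UUID string on every run
```

Keeping the UUID in the metadata also gives you a handle for deleting exactly that document later, rather than clearing the whole store.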
-
I have a use case where I need Q&A on a single document. There will be 1000 documents in the pipe, each with the same set of questions to extract answers for. I tried the approach below, but it leads to a memory leak.
What is the best way to use InMemoryStore in order to serve only one document at a time? I still want to use the PDFConverter module and split each document into passages, but with only one document at a time.
Once the answers to the set of questions are extracted from that document, it needs to be erased from memory, and the process repeats for the second document.
At present, I use a for loop to iterate over all documents: process each document in memory, delete all documents from memory, then process the next document in the loop.
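The loop described above can be sketched as follows. This is an illustration under stated assumptions, not the original code: `TinyStore` is a hypothetical stand-in whose method names only mirror those of a Haystack 1.x document store, and `split_into_passages` and `answer` are placeholders for the real converter, preprocessor, and QA pipeline. What it shows is one store created once, filled per document, and always emptied before the next document.

```python
class TinyStore:
    """Hypothetical in-memory stand-in for a document store; not Haystack's class."""

    def __init__(self):
        self._docs = []

    def write_documents(self, docs):
        self._docs.extend(docs)

    def delete_documents(self):
        self._docs.clear()

    def get_document_count(self):
        return len(self._docs)


def split_into_passages(text, size=50):
    """Naive fixed-width splitter standing in for a real converter and preprocessor."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def answer(store, question):
    """Placeholder for running a QA pipeline: here it only reports the passage count."""
    return f"{question} ({store.get_document_count()} passages searched)"


def process_all(texts, questions):
    store = TinyStore()  # created once, reused for every document
    results = []
    for text in texts:
        store.write_documents(split_into_passages(text))
        try:
            results.append([answer(store, q) for q in questions])
        finally:
            store.delete_documents()  # always empty the store, even if a question fails
    return results, store
```

The `try`/`finally` ensures a failure on one document cannot leave its passages behind to mix with the next document's answers.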