Best way to build search engine with haystack #2739
I am building a search engine over documents containing German and English text, but our priority is the German text. I would like to know what the best choice of Retriever would be in that case. Also, right now I am using BM25 as the retriever with the German gelectra-large-germanquad model as the reader, but since my use case is search only, I think I don't need the reader part anymore.
Replies: 1 comment
If you would like to use a dense retrieval model, I would recommend Haystack's EmbeddingRetriever class together with a sentence-transformers model trained on the MSMARCO dataset. There is a list of such models here: https://www.sbert.net/docs/pretrained-models/msmarco-v3.html You can also find them on the HuggingFace model hub; for example, you can load sentence-transformers/msmarco-distilbert-dot-v5 from there.

Regarding smaller models that can also work well on CPUs, I would recommend first trying out sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. It's truly multilingual, includes German, and is relatively small. Then you could also try the very small models sentence-transformers/all-MiniLM-L6-v2 and sentence-transformers/msmarco-MiniLM-L6-cos-v5, and then another truly multilingual model: sentence-transformers/distiluse-base-multilingual-cased-v1 🤞
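For reference, here is a minimal sketch of a retriever-only search pipeline with EmbeddingRetriever (no reader). It assumes Haystack 1.x and uses an in-memory document store plus the multilingual MiniLM model mentioned above; the example documents and top_k value are placeholders you would replace with your own data and settings:

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever
from haystack.pipelines import DocumentSearchPipeline

# paraphrase-multilingual-MiniLM-L12-v2 produces 384-dimensional embeddings
document_store = InMemoryDocumentStore(embedding_dim=384, similarity="cosine")

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    model_format="sentence_transformers",
)

# Write some (placeholder) German/English documents and compute their embeddings
document_store.write_documents([
    {"content": "Haystack ist ein Framework für semantische Suche."},
    {"content": "Haystack is a framework for semantic search."},
])
document_store.update_embeddings(retriever)

# Search-only pipeline: just the retriever, no reader node
pipeline = DocumentSearchPipeline(retriever)
result = pipeline.run(
    query="Wie funktioniert semantische Suche?",
    params={"Retriever": {"top_k": 5}},
)
for doc in result["documents"]:
    print(doc.score, doc.content)
```

To compare models, you would only need to change the embedding_model string (and embedding_dim to match the model's output dimension) and re-run update_embeddings.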