Hello, nice work on this DeepResearch project.
The blog post says:
> For query deduplication, we initially used an LLM-based solution, but found it difficult to control the similarity threshold. We eventually switched to [jina-embeddings-v3](https://jina.ai/?sui&model=jina-embeddings-v3), which excels at semantic textual similarity tasks. This enables cross-lingual deduplication without worrying that non-English queries would be filtered. The embedding model ended up being crucial not for memory retrieval as initially expected, but for efficient deduplication.
However, the code still seems to use an LLM for deduplication:
https://github.com/jina-ai/node-DeepResearch/blob/main/src/tools/dedup.ts#L44
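For concreteness, this is roughly what I understood the embedding-based approach in the blog to mean. It is only a minimal sketch, not the project's code: the names `EmbeddedQuery`, `cosineSimilarity`, and `dedupByEmbedding` are hypothetical, the embeddings are assumed to be precomputed by some embedding model, and the default threshold of 0.9 is an illustrative value rather than one taken from the blog or the repo.

```typescript
// Hypothetical sketch of threshold-based deduplication over precomputed embeddings.
// Nothing here is taken from the node-DeepResearch source.
type EmbeddedQuery = { text: string; embedding: number[] };

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep a new query only if it is below the similarity threshold against
// every existing query and every new query already kept.
function dedupByEmbedding(
  newQueries: EmbeddedQuery[],
  existingQueries: EmbeddedQuery[],
  threshold = 0.9 // illustrative value, an assumption
): string[] {
  const kept: EmbeddedQuery[] = [];
  for (const query of newQueries) {
    const isDuplicate = [...existingQueries, ...kept].some(
      (other) => cosineSimilarity(query.embedding, other.embedding) >= threshold
    );
    if (!isDuplicate) kept.push(query);
  }
  return kept.map((q) => q.text);
}
```

With something like this, the threshold is a single explicit number, which is the control the blog says was hard to get from the LLM-based version.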
Were there trade-offs you observed between the two approaches?
Thank you.
- Youngjoon Jang