Questions for Query Deduplication #112

@yjoonjang

Hello, nice work on this DeepResearch project.

The blog post says:

> For query deduplication, we initially used an LLM-based solution, but found it difficult to control the similarity threshold. We eventually switched to [jina-embeddings-v3](https://jina.ai/?sui&model=jina-embeddings-v3), which excels at semantic textual similarity tasks. This enables cross-lingual deduplication without worrying that non-English queries would be filtered. The embedding model ended up being crucial not for memory retrieval as initially expected, but for efficient deduplication.

However, the code still seems to use an LLM for deduplication:
https://github.com/jina-ai/node-DeepResearch/blob/main/src/tools/dedup.ts#L44

Were there some trade-offs you observed?
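
For concreteness, this is roughly what I understood the embedding-based approach from the blog to be. It is only a sketch under my own assumptions: the names `cosineSimilarity`, `dedupQueries`, and `EmbeddedQuery`, the 0.9 threshold, and the toy 3-dimensional vectors are all hypothetical and not taken from the repository. In practice the vectors would come from jina-embeddings-v3.

```typescript
// Hypothetical shape: a query paired with its precomputed embedding.
interface EmbeddedQuery {
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Greedy dedup: keep a new query only if it is not too similar to any
// existing query or to a new query that was already kept.
// The default threshold is an assumed value, not the project's setting.
function dedupQueries(
  newQueries: EmbeddedQuery[],
  existingQueries: EmbeddedQuery[],
  threshold = 0.9
): string[] {
  const kept: EmbeddedQuery[] = [];
  for (const query of newQueries) {
    const isDuplicate = [...existingQueries, ...kept].some(
      (other) => cosineSimilarity(query.embedding, other.embedding) >= threshold
    );
    if (!isDuplicate) kept.push(query);
  }
  return kept.map((query) => query.text);
}

// Toy data: made-up vectors standing in for real embeddings.
const seen: EmbeddedQuery[] = [
  { text: "jina embeddings v3 benchmark", embedding: [1, 0, 0] },
];
const incoming: EmbeddedQuery[] = [
  { text: "benchmark of jina-embeddings-v3", embedding: [0.99, 0.1, 0] },
  { text: "LLM query rewriting", embedding: [0, 1, 0] },
  { text: "rewriting queries with an LLM", embedding: [0.05, 0.98, 0] },
];

const unique = dedupQueries(incoming, seen);
console.log(unique);
```

With a single numeric threshold like this, the cutoff is explicit and easy to tune, which is what I assumed the blog meant by the LLM version being hard to control.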

Thank you.

  • Youngjoon Jang
