Questions for Query Deduplication

Hello, nice work on this DeepResearch project.

In the [blog](https://jina.ai/news/a-practical-guide-to-implementing-deepsearch-deepresearch/), it says:
> For query deduplication, we initially used an LLM-based solution, but found it difficult to control the similarity threshold. We eventually switched to [jina-embeddings-v3](https://jina.ai/?sui&model=jina-embeddings-v3), which excels at semantic textual similarity tasks. This enables cross-lingual deduplication without worrying that non-English queries would be filtered. The embedding model ended up being crucial not for memory retrieval as initially expected, but for efficient deduplication.
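As I understand the blog passage, embedding-based deduplication replaces the LLM's judgment with an explicit cosine-similarity cutoff. Here is a minimal sketch under some assumptions. The embeddings are taken as precomputed vectors (in practice they would come from jina-embeddings-v3). The `dedupQueries` and `cosineSimilarity` helpers and the `SIMILARITY_THRESHOLD` value of 0.86 are hypothetical, chosen only for illustration and not taken from the repo. The toy 3-dimensional vectors in the usage example are made up, not real embeddings.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical cutoff: a pair at or above this similarity counts as a duplicate.
const SIMILARITY_THRESHOLD = 0.86;

// Keeps each new query only if it is not too similar to any existing query
// or to any new query already kept earlier in the same batch.
function dedupQueries(
  newQueries: string[],
  newEmbeddings: number[][],
  existingEmbeddings: number[][],
  threshold: number = SIMILARITY_THRESHOLD,
): string[] {
  const kept: string[] = [];
  const keptEmbeddings: number[][] = [];
  for (let i = 0; i < newQueries.length; i++) {
    const emb = newEmbeddings[i];
    const isDuplicate = [...existingEmbeddings, ...keptEmbeddings].some(
      (other) => cosineSimilarity(emb, other) >= threshold,
    );
    if (!isDuplicate) {
      kept.push(newQueries[i]);
      keptEmbeddings.push(emb);
    }
  }
  return kept;
}

// Usage with made-up toy vectors: the English and German pizza queries point
// in nearly the same direction, so the second one is dropped.
const result = dedupQueries(
  ["best pizza in Berlin", "beste Pizza in Berlin", "Berlin weather"],
  [[1, 0, 0], [0.99, 0.1, 0], [0, 1, 0]],
  [],
);
console.log(result); // [ 'best pizza in Berlin', 'Berlin weather' ]
```

The threshold is a single explicit number that can be tuned and tested. That seems to be the controllability the blog says was hard to achieve with the LLM-based approach.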
However, the code still seems to use an LLM for deduplication:
https://github.com/jina-ai/node-DeepResearch/blob/main/src/tools/dedup.ts#L44

Did you observe any trade-offs between the two approaches?

Thank you.

- Youngjoon Jang

Questions for Query Deduplication #112

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Questions for Query Deduplication #112

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions