[Hybrid] Add setting for number of documents stored by HybridCollapsingTopDocsCollector #1381

@ryanbogan

Description

In HybridCollapsingTopDocsCollector, a priority queue is created for each group and subquery combination: https://github.com/opensearch-project/neural-search/blob/main/src/main/java/org/opensearch/neuralsearch/search/collector/HybridCollapsingTopDocsCollector.java#L489

Each queue is sized to the size parameter provided in the query, the same as for a hybrid query without collapse. With collapse, however, there is one queue per group and subquery combination instead of a single queue, so we end up storing far more documents in total than hybrid search without collapse does.
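To make the growth concrete, here is a rough back-of-the-envelope sketch. It is not taken from the plugin: the class and method names are hypothetical, and it assumes the worst case where every queue fills up to the query's size. The baseline is the single queue described above.

```java
// Hypothetical helper, for illustration only; not part of neural-search.
final class CollapseQueueFootprint {
    private CollapseQueueFootprint() {}

    /**
     * Upper bound on buffered documents when there is one queue of
     * `size` slots per (group, subquery) combination.
     */
    static long maxBufferedDocs(long groups, int subQueries, int size) {
        long queues = Math.multiplyExact(groups, (long) subQueries);
        return Math.multiplyExact(queues, (long) size);
    }

    public static void main(String[] args) {
        int size = 10;        // "size" from the search request
        int subQueries = 2;   // subqueries inside the hybrid query
        long groups = 1_000;  // distinct values of the collapse field (assumed)

        // Single queue, as in the non-collapse case described in this issue.
        System.out.println("single queue:  " + maxBufferedDocs(1, 1, size));
        // One queue per group and subquery combination.
        System.out.println("with collapse: " + maxBufferedDocs(groups, subQueries, size));
    }
}
```

The bound grows linearly with the number of distinct collapse values, which the user does not control through size.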

We should add a setting that lets users control the number of documents stored in each queue. Lowering this setting would prioritize latency over recall, while raising it would do the opposite. This would let users tune the trade-off for their individual use case.
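A minimal sketch of how such a limit could behave, assuming a per-queue cap that falls back to the query's size when unset. The setting key, the class name, and the "non-positive means unset" convention are all hypothetical; the real collector stores score docs rather than bare scores, and the actual setting would be registered through the plugin's settings mechanism.

```java
import java.util.PriorityQueue;

// Hypothetical illustration of a capped per-(group, subquery) queue.
final class BoundedGroupQueue {
    // Assumed key for illustration; this issue does not name the setting.
    static final String SETTING_KEY = "plugins.neural_search.hybrid_collapse_docs_per_queue";

    private final int capacity;
    // Min-heap on score: the head is the weakest hit currently kept.
    private final PriorityQueue<Float> minHeap = new PriorityQueue<>();

    BoundedGroupQueue(int querySize, int configuredLimit) {
        if (querySize < 1) {
            throw new IllegalArgumentException("querySize must be positive");
        }
        // Non-positive limit = not configured: keep today's behaviour (query size).
        this.capacity = configuredLimit > 0 ? configuredLimit : querySize;
    }

    /** Keeps the score only if the queue has room or it beats the weakest kept hit. */
    void offer(float score) {
        if (minHeap.size() < capacity) {
            minHeap.add(score);
        } else if (score > minHeap.peek()) {
            minHeap.poll();
            minHeap.add(score);
        }
    }

    int size() {
        return minHeap.size();
    }

    int capacity() {
        return capacity;
    }

    /** Lowest score still retained; only valid when size() > 0. */
    float lowestKeptScore() {
        return minHeap.peek();
    }
}
```

With a limit of 3 and a query size of 10, each queue retains at most three hits per group and subquery, trading some recall for less work per collected document.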
