Execute text search only on a subset of documents to reduce execution time

I am working on a project that uses full-text search as part of a larger query language, and I have recently started to consider migrating the text search component from Lucene to Tantivy.

Since full-text search is only one part of our query language, we often see queries where the user writes a generic keyword pattern (say `*a*`), but the other (non-text) operators in the query already tell us that only a small subset of documents is relevant.

Therefore, my question is: can I run queries on only a subset of documents, selected by their document IDs, in Tantivy?

From an initial look at the documentation, the [FilterCollector](https://docs.rs/tantivy/latest/tantivy/collector/struct.FilterCollector.html) seems relevant here. However, I could not find much information on what I can pass as the `TPredicate` of the filter.

As a follow-up question: would combining a FilterCollector with a TopDocs collector, so that only the documents that pass the filter are collected, bring a performance benefit?
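To make the idea concrete, here is a minimal sketch of the pattern I have in mind: a predicate on the document ID is checked for each matched candidate, and only the survivors are handed to a top-k collection step. It uses the Rust standard library only and is not Tantivy's API; `Hit`, `top_k_filtered`, the `u64` document IDs, and the integer scores are all hypothetical names and simplifications chosen for illustration.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A matched candidate (hypothetical type, not a Tantivy struct).
/// Scores are integers here only to keep the ordering simple.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Hit {
    pub doc_id: u64,
    pub score: u32,
}

/// Returns the `k` best-scoring hits whose document ID passes `predicate`,
/// ordered by descending score (ties broken by ascending document ID).
pub fn top_k_filtered<I, P>(matches: I, predicate: P, k: usize) -> Vec<Hit>
where
    I: IntoIterator<Item = Hit>,
    P: Fn(u64) -> bool,
{
    // `Reverse` turns the max-heap into a min-heap, so `pop` evicts
    // the weakest hit that is currently kept.
    let mut heap: BinaryHeap<Reverse<(u32, u64)>> = BinaryHeap::new();

    for hit in matches {
        // The filter runs once per matched candidate, before collection.
        if !predicate(hit.doc_id) {
            continue;
        }
        heap.push(Reverse((hit.score, hit.doc_id)));
        if heap.len() > k {
            heap.pop();
        }
    }

    let mut top: Vec<Hit> = heap
        .into_iter()
        .map(|Reverse((score, doc_id))| Hit { doc_id, score })
        .collect();
    top.sort_by(|a, b| b.score.cmp(&a.score).then(a.doc_id.cmp(&b.doc_id)));
    top
}
```

In practice the predicate would be a membership test against the set of IDs produced by the non-text operators, for example `|id| allowed.contains(&id)` with a `HashSet<u64>`. Note that in this sketch every matched candidate is still visited; the filter only reduces what reaches the top-k step, which is exactly why I am unsure whether the equivalent setup in Tantivy would save any time.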

Execute text search only on a subset of documents to reduce execution time #2593
