Execute text search only on a subset of documents to reduce execution time #2593

@lbhm


I am working on a project that uses full-text search as part of a larger query language, and recently, I started to consider migrating from Lucene to Tantivy for the text search component.

Since full-text search is only part of our query language, we often see queries where the user writes a generic keyword query (say *a*), but the other (non-text) operators in the query already tell us that only a small subset of documents can be relevant.

Therefore, my question is: Can I run a query only on a subset of documents, selected by document ID, in Tantivy?

Based on my initial documentation search, I was wondering if the `FilterCollector` would be relevant here. However, I could not find much information on what I can pass as a `TPredicate` to the filter.

As a follow-up question: Would combining a `FilterCollector` with a `TopDocs` collector that only sees the documents passing the filter have a performance benefit?
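To make the question concrete, here is a minimal, std-only sketch of the behavior I have in mind: a predicate on document IDs sits in front of a top-k collector, so only allowed documents reach the heap. This is a model, not Tantivy's API. The function `top_k_filtered`, the `(doc_id, score)` tuples, and the integer scores are all hypothetical names and simplifications chosen for illustration.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashSet};

/// Hypothetical model of "filter, then collect top-k".
/// `hits` stands in for the (doc_id, score) pairs a text query would emit;
/// `allowed` stands in for the filter predicate on document IDs.
/// Returns up to `k` (doc_id, score) pairs, highest score first.
fn top_k_filtered<I, P>(hits: I, allowed: P, k: usize) -> Vec<(u32, u32)>
where
    I: IntoIterator<Item = (u32, u32)>,
    P: Fn(u32) -> bool,
{
    if k == 0 {
        return Vec::new();
    }
    // Min-heap on (score, doc_id) via `Reverse`, so `pop` evicts the lowest score.
    let mut heap: BinaryHeap<Reverse<(u32, u32)>> = BinaryHeap::with_capacity(k + 1);
    for (doc_id, score) in hits {
        // Documents outside the subset never touch the heap.
        if !allowed(doc_id) {
            continue;
        }
        heap.push(Reverse((score, doc_id)));
        if heap.len() > k {
            heap.pop();
        }
    }
    let mut out: Vec<(u32, u32)> = heap
        .into_iter()
        .map(|Reverse((score, doc_id))| (doc_id, score))
        .collect();
    // Highest score first; ties broken by ascending doc_id.
    out.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    out
}

fn main() {
    // Subset of document IDs known to be relevant from the non-text operators.
    let subset: HashSet<u32> = [2, 3, 5].into_iter().collect();
    // Made-up (doc_id, score) pairs standing in for the keyword query's matches.
    let hits = vec![(1, 10), (2, 50), (3, 30), (4, 90), (5, 70)];
    let top = top_k_filtered(hits, |doc| subset.contains(&doc), 2);
    println!("{:?}", top); // prints [(5, 70), (2, 50)]
}
```

Note that in this model the query still produces every hit before the predicate runs, so the filter only saves heap work. What I would really like to know is whether Tantivy can skip matching and scoring for documents outside the subset.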
