Skip to content

pisa-engine/sprawl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

Fast and Effective Early Termination for Simple Ranking Functions

Jinrui Gou (NYU), Antonio Mallia (Pinecone), Yifan Liu (UCLA), Minghao Shao (NYU), Torsten Suel (NYU)
SIGIR ’25, Padua, Italy — July 13–18, 2025


📄 Paper Abstract

Web search systems often rely on a fast, simple ranking stage (e.g. BM25, Query Likelihood, learned sparse models like DeepImpact or DocT5Query) to generate candidate documents before applying heavier re-rankers. This paper introduces SPRAWL (Sorted PRefix Access With Lookups), an early-termination technique tailored to such simple ranking methods.

SPRAWL leverages a two-tier inverted index:

  • A prefix index for selected single terms and term-pairs, sorted by impact scores.
  • A fallback full inverted index for precise scoring.

At query time, SPRAWL:

  1. Draws a budgeted number of top-impact postings from the prefix index.
  2. Aggregates partial scores per document.
  3. Performs limited lookups for the top candidates to complete full scores.
  4. Returns the final top‑K results.

This achieves near-exhaustive ranking quality (e.g. nDCG@10 close to baseline on MS MARCO v1/v2) with response times down to sub‑millisecond levels—often an order-of-magnitude faster than safe baselines like MaxScore.


🚀 Key Contributions

  • SPRAWL method: Adapts score‑at‑a‑time ideas for early termination on simple and learned sparse ranking functions.
  • Prefix index with term-pairs: Introducing two-term structures significantly boosts ranking accuracy under tight access budgets.
  • Efficiency without sacrifice: Achieves high nDCG@10, recall@1k, and RBO, while delivering mean response time between ~0.5–2.5 ms on MS MARCO.
  • Generality: Supports BM25, BM25+DocT5Query, DeepImpact, DeeperImpact, and Tilde—covering both traditional and modern learned sparse scenarios.

🧠 Approach Description (SPRAWL)

  1. Indexing Phase

    • Build prefix indexes over high-impact single terms and term-pairs using training query logs.
    • For each selected term or term-pair, store top‑p postings sorted by aggregated impact scores.
  2. Query Phase

    • Use budgets: access budget a_b (postings) + lookup budget l_b (document lookups).
    • Step-by-step:
      1. Select highest-impact postings (up to a_b) across all relevant prefixes.
      2. Aggregate partial document scores.
      3. For the top‑l_b documents with partial scores, lookup missing term scores via the full index.
      4. Return final top‑K ranked results.

This technique simplifies prior methods (e.g., from [39]) by using hash-based accumulation and budgeted lookup, avoiding complex learned scoring.


🧪 Evaluation

  • Datasets: MS MARCO Passage v1 (8.8M passages) & v2 (138M)
  • Ranking functions: BM25, BM25+DocT5Query, DeepImpact, DeeperImpact, Tilde
  • Metrics: nDCG@10, recall@1000, RBO, mean response time (MRT)
  • Baselines: Exhaustive MaxScore (safe), Clipping, Guided Traversal

Results:

  • With access budgets ~5k, SPRAWL achieves near-exhaustive nDCG@10 across all ranking functions.
  • MRT ranges between 0.5–2.5 ms—substantially faster than MaxScore.
  • Inclusion of term-pair prefixes yields a noticeable boost over single-term-only strategies.

📝 Citation

If you find this work useful, please cite:

@inproceedings{Gou2025SIGIR,
  title = {Fast and Effective Early Termination for Simple Ranking Functions},
  author = {Gou, Jinrui and Mallia, Antonio and Liu, Yifan and Shao, Minghao and Suel, Torsten},
  booktitle = {Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’25)},
  year = {2025},
}

About

Fast and Effective Early Termination for Simple Ranking Functions

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published