Jinrui Gou (NYU), Antonio Mallia (Pinecone), Yifan Liu (UCLA), Minghao Shao (NYU), Torsten Suel (NYU)
SIGIR ’25, Padua, Italy — July 13–18, 2025
Web search systems often rely on a fast, simple ranking stage (e.g. BM25, Query Likelihood, learned sparse models like DeepImpact or DocT5Query) to generate candidate documents before applying heavier re-rankers. This paper introduces SPRAWL (Sorted PRefix Access With Lookups), an early-termination technique tailored to such simple ranking methods.
SPRAWL leverages a two-tier inverted index:
- A prefix index for selected single terms and term-pairs, sorted by impact scores.
- A fallback full inverted index for precise scoring.
At query time, SPRAWL:
- Draws a budgeted number of top-impact postings from the prefix index.
- Aggregates partial scores per document.
- Performs limited lookups for the top candidates to complete full scores.
- Returns the final top‑K results.
This achieves near-exhaustive ranking quality (e.g. nDCG@10 close to baseline on MS MARCO v1/v2) with response times down to sub‑millisecond levels—often an order-of-magnitude faster than safe baselines like MaxScore.
- SPRAWL method: Adapts score‑at‑a‑time ideas for early termination on simple and learned sparse ranking functions.
- Prefix index with term-pairs: Introducing two-term structures significantly boosts ranking accuracy under tight access budgets.
- Efficiency without sacrifice: Achieves high nDCG@10, recall@1k, and RBO, while delivering mean response time between ~0.5–2.5 ms on MS MARCO.
- Generality: Supports BM25, BM25+DocT5Query, DeepImpact, DeeperImpact, and Tilde—covering both traditional and modern learned sparse scenarios.
-
Indexing Phase
- Build prefix indexes over high-impact single terms and term-pairs using training query logs.
- For each selected term or term-pair, store top‑p postings sorted by aggregated impact scores.
-
Query Phase
- Use budgets: access budget a_b (postings) + lookup budget l_b (document lookups).
- Step-by-step:
- Select highest-impact postings (up to a_b) across all relevant prefixes.
- Aggregate partial document scores.
- For the top‑l_b documents with partial scores, lookup missing term scores via the full index.
- Return final top‑K ranked results.
This technique simplifies prior methods (e.g., from [39]) by using hash-based accumulation and budgeted lookup, avoiding complex learned scoring.
- Datasets: MS MARCO Passage v1 (8.8M passages) & v2 (138M)
- Ranking functions: BM25, BM25+DocT5Query, DeepImpact, DeeperImpact, Tilde
- Metrics: nDCG@10, recall@1000, RBO, mean response time (MRT)
- Baselines: Exhaustive MaxScore (safe), Clipping, Guided Traversal
Results:
- With access budgets ~5k, SPRAWL achieves near-exhaustive nDCG@10 across all ranking functions.
- MRT ranges between 0.5–2.5 ms—substantially faster than MaxScore.
- Inclusion of term-pair prefixes yields a noticeable boost over single-term-only strategies.
If you find this work useful, please cite:
@inproceedings{Gou2025SIGIR,
title = {Fast and Effective Early Termination for Simple Ranking Functions},
author = {Gou, Jinrui and Mallia, Antonio and Liu, Yifan and Shao, Minghao and Suel, Torsten},
booktitle = {Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’25)},
year = {2025},
}