This work has been accepted to ACL 2025 Main Conference.
Related Material: Read the paper on arXiv
Retrieval-Augmented Generation (RAG) systems rely heavily on rerankers to identify relevant documents. However, fine-tuning rerankers is challenging due to the limited availability of annotated query-document pairs. Existing distillation-based methods often suffer from training-inference misalignment and overlook interdependencies among candidate documents.
To address these issues, we reformulate reranking as a stochastic attention-mask learning problem and propose Gumbel Reranking, an end-to-end differentiable training framework. This method leverages the Gumbel Trick and Relaxed Top-k Sampling to learn document-wise Top-k attention masks, allowing reranker optimization to be directly supervised by the language model loss.
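As a rough illustration of the sampling machinery, here is a minimal NumPy sketch of Gumbel-perturbed relaxed top-k selection in the style of successive-softmax subset relaxations. It is not the paper's actual implementation; `gumbel_topk_mask`, the default `tau`, and the soft-exclusion step are our own assumptions for illustration.

```python
import numpy as np

def gumbel_topk_mask(scores, k, tau=0.5, rng=None):
    """Draw a relaxed k-hot mask over candidate documents.

    Adds Gumbel noise to reranker scores, then runs k rounds of
    softmax selection, softly excluding mass that was already
    selected. For moderate score scales, the mask approaches a
    hard top-k indicator as tau shrinks.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    u = rng.uniform(low=1e-9, high=1.0, size=scores.shape)
    logits = (scores - np.log(-np.log(u))) / tau  # Gumbel trick
    mask = np.zeros_like(scores, dtype=float)
    for _ in range(k):
        p = np.exp(logits - logits.max())
        p = p / p.sum()                 # softmax over remaining mass
        mask += p                       # accumulate soft selection
        logits = logits + np.log1p(-p)  # softly exclude selected docs
    return mask

scores = np.array([2.0, -1.0, 0.5, 3.0, -2.0])
mask = gumbel_topk_mask(scores, k=2)
# mask is differentiable in `scores`, sums to k, and concentrates
# on the highest-scoring documents
```

Because every step is a softmax, gradients from the downstream language-model loss can flow back through `mask` into the reranker scores.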
Please refer to `environment.yaml` for dependency and environment configuration.
- For NQ and TQA, we use the same datasets as Fusion-in-Decoder. You can download the raw data via the script `data/get_data.sh`.
- For HotpotQA, MuSiQue, and 2WikiHop, please download the raw data directly from their respective official websites.
Please preprocess your dataset into the following JSON format:
```json
{
  "id": "0",
  "question": "What element did Marie Curie name after her native land?",
  "target": "Polonium",
  "answers": ["Polonium", "Po (chemical element)", "Po"],
  "ctxs": [
    {
      "title": "Marie Curie",
      "text": "them on visits to Poland. She named the first chemical element that she discovered in 1898 \"polonium\", after her native country..."
    },
    {
      "title": "Marie Curie",
      "text": "...they announced the existence of an element which they named \"polonium\", in honour of her native Poland..."
    }
  ]
}
```
We also provide preprocessed datasets ready for use: `syhuang/gumbel-reranking-data`.
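For reference, a record in this format can be assembled and checked with plain Python. This is a minimal sketch: the field names follow the example above, while `make_record` and the validation logic are our own illustration.

```python
import json

REQUIRED_KEYS = {"id", "question", "target", "answers", "ctxs"}

def make_record(idx, question, target, answers, passages):
    """Build one training example in the expected JSON format.

    `passages` is a list of (title, text) pairs that become `ctxs`.
    """
    return {
        "id": str(idx),
        "question": question,
        "target": target,
        "answers": answers,
        "ctxs": [{"title": t, "text": x} for t, x in passages],
    }

def validate_record(rec):
    """Check that a record has the expected keys and context fields."""
    assert REQUIRED_KEYS <= rec.keys(), f"missing: {REQUIRED_KEYS - rec.keys()}"
    assert all({"title", "text"} <= c.keys() for c in rec["ctxs"])

rec = make_record(
    0,
    "What element did Marie Curie name after her native land?",
    "Polonium",
    ["Polonium", "Po"],
    [("Marie Curie", "...named \"polonium\", after her native country...")],
)
validate_record(rec)
line = json.dumps(rec)  # one serialized example
```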
Gumbel Reranking is designed for fine-tuning existing rerankers. While it can be trained from scratch, using pretrained rerankers yields better results.
- For RankT5, use `Soyoung97/RankT5-base`.
- For BGE-Reranker, use `BAAI/bge-reranker-base`.
A strong reader is required to provide supervision signals for reranker training. We assume the reader has already been fine-tuned for the specific task.
- For NQ and TQA, you can directly use pretrained checkpoints provided in the Fusion-in-Decoder repo. See `readers/get_model.sh`.
- For HotpotQA, MuSiQue, and 2WikiHop, you need to fine-tune the FiD reader using the official repo and corresponding preprocessed data.
We also release fine-tuned FiD checkpoints on Hugging Face for convenience:
| Dataset | FiD-base Checkpoint | FiD-large Checkpoint |
|---|---|---|
| HotpotQA | syhuang/hopo_reader_base | syhuang/hopo_reader_large |
| MuSiQue | syhuang/musique_reader_base | syhuang/musique_reader_large |
| 2WikiHop | syhuang/2wiki_reader_base | syhuang/2wiki_reader_large |
In addition to using the FiD checkpoints, please make sure to load the original T5 configuration. Specifically, use `google-t5/t5-base` for FiD-base and `google-t5/t5-large` for FiD-large. See the `base_model_path` variable in `run.slurm` for reference.
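Putting the checkpoint table and the config note together, pairing a reader checkpoint with its matching T5 configuration might look like this. `READERS` and `reader_for` are hypothetical helpers; only the model identifiers come from the table and text above.

```python
# Maps each dataset to (FiD reader checkpoint, base T5 config) pairs,
# keyed by model size. All identifiers are from the table above.
READERS = {
    "hotpotqa": {"base": ("syhuang/hopo_reader_base", "google-t5/t5-base"),
                 "large": ("syhuang/hopo_reader_large", "google-t5/t5-large")},
    "musique":  {"base": ("syhuang/musique_reader_base", "google-t5/t5-base"),
                 "large": ("syhuang/musique_reader_large", "google-t5/t5-large")},
    "2wikihop": {"base": ("syhuang/2wiki_reader_base", "google-t5/t5-base"),
                 "large": ("syhuang/2wiki_reader_large", "google-t5/t5-large")},
}

def reader_for(dataset, size="base"):
    """Return (reader_checkpoint, base_model_path) for a dataset/size."""
    return READERS[dataset.lower()][size]

ckpt, base_model_path = reader_for("MuSiQue", "large")
```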
We provide a SLURM script `run.slurm` as a general-purpose training launcher. Please edit the script to configure your dataset paths, model names, and training hyperparameters before execution.

```shell
sbatch run.slurm
```
If you can SSH into the server and run bash scripts directly, you can simply execute:

```shell
bash run.slurm
```
Please refer to `run.slurm` for more details on training paths and hyperparameter settings.