Official implementation for "Gumbel Reranking: Differentiable End-to-End Reranker Optimization" (ACL 2025 Main)


Gumbel Reranking: Differentiable End-to-End Reranker Optimization

This work has been accepted to ACL 2025 Main Conference.

Related Material: Read the paper on arXiv


Introduction

Retrieval-Augmented Generation (RAG) systems rely heavily on rerankers to identify relevant documents. However, fine-tuning rerankers is challenging due to the limited availability of annotated query-document pairs. Existing distillation-based methods often suffer from training-inference misalignment and overlook interdependencies among candidate documents.

To address these issues, we reformulate reranking as a stochastic attention-mask learning problem and propose Gumbel Reranking, an end-to-end differentiable training framework. This method leverages the Gumbel Trick and Relaxed Top-k Sampling to learn document-wise Top-k attention masks, allowing reranker optimization to be directly supervised by the language model loss.
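To illustrate the core idea (this is a sketch, not the implementation in this repo), a relaxed Top-k document mask can be drawn by perturbing the reranker's document scores with Gumbel noise and then taking k successive softmax steps, following the standard subset-sampling relaxation; the function name and details below are illustrative:

```python
import numpy as np

def relaxed_topk_mask(scores, k, tau=1.0, rng=None):
    """Draw a differentiable (soft) Top-k document mask from reranker scores.

    Sketch of the Gumbel Trick + Relaxed Top-k Sampling idea: perturb the
    scores with Gumbel noise, then take k successive softmax steps while
    progressively down-weighting mass already assigned. Details may differ
    from the paper's exact formulation.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise: the hard argmax of (scores + gumbel) is a sample
    # from the softmax distribution over scores (the Gumbel-Max trick)
    gumbel = -np.log(-np.log(rng.uniform(size=scores.shape)))
    logits = (scores + gumbel) / tau
    khot = np.zeros_like(scores, dtype=float)
    for _ in range(k):
        # Soft "without replacement": damp logits of already-selected mass
        adj = logits + np.log1p(-np.clip(khot, 0.0, 1.0 - 1e-6))
        p = np.exp(adj - adj.max())
        khot = khot + p / p.sum()
    return khot  # soft mask summing to k; entries near 1 mark selected docs
```

During training, a soft mask like this can gate the reader's attention over candidate documents, so the language-model loss backpropagates into the reranker scores; at inference, the hard Top-k selection is used instead.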

Gumbel Reranking Architecture


Environment Setup

Please refer to environment.yaml for dependency and environment configuration.


Data Preparation

Raw Datasets

  1. For NQ and TQA, we use the same datasets as Fusion-in-Decoder. You can download the raw data via the script: data/get_data.sh.
  2. For HotpotQA, MuSiQue, and 2WikiHop, please download the raw data directly from their respective official websites.

Data Format

Please preprocess your dataset into the following JSON format:

{
  "id": "0",
  "question": "What element did Marie Curie name after her native land?",
  "target": "Polonium",
  "answers": ["Polonium", "Po (chemical element)", "Po"],
  "ctxs": [
    {
      "title": "Marie Curie",
      "text": "them on visits to Poland. She named the first chemical element that she discovered in 1898 \"polonium\", after her native country..."
    },
    {
      "title": "Marie Curie",
      "text": "...they announced the existence of an element which they named \"polonium\", in honour of her native Poland..."
    }
  ]
}

We also provide preprocessed datasets ready for use: syhuang/gumbel-reranking-data.


Model Preparation

Reranker Initialization

Gumbel Reranking is designed for fine-tuning existing rerankers. Although the framework can also be trained from a randomly initialized reranker, starting from a pretrained reranker yields better results.

Reader Initialization

A strong reader is required to provide supervision signals for reranker training. We assume the reader has already been fine-tuned for the specific task.

  • For NQ and TQA, you can directly use pretrained checkpoints provided in the Fusion-in-Decoder repo. See: readers/get_model.sh
  • For HotpotQA, MuSiQue, and 2WikiHop, you need to fine-tune the FiD reader using the official repo and corresponding preprocessed data.

We also release fine-tuned FiD checkpoints on Hugging Face for convenience:

| Dataset  | FiD-base Checkpoint          | FiD-large Checkpoint          |
|----------|------------------------------|-------------------------------|
| HotpotQA | syhuang/hopo_reader_base     | syhuang/hopo_reader_large     |
| MuSiQue  | syhuang/musique_reader_base  | syhuang/musique_reader_large  |
| 2WikiHop | syhuang/2wiki_reader_base    | syhuang/2wiki_reader_large    |

In addition to using the FiD checkpoints, please make sure to load the original T5 configuration. Specifically, use google-t5/t5-base for FiD-base and google-t5/t5-large for FiD-large. See the base_model_path variable in run.slurm for reference.


Training

We provide a SLURM script run.slurm as a general-purpose training launcher. Please edit the script to configure your dataset paths, model names, and training hyperparameters before execution.

sbatch run.slurm

If you have direct shell access to the server (i.e., no SLURM scheduler is needed), you can simply execute:

bash run.slurm

Please refer to run.slurm for more details on training paths and hyperparameter settings.
