This repository contains the official implementation of our SIGIR 2025 paper:
Lightweight and Direct Document Relevance Optimization for Generative IR (DDRO)
- Optimizing Generative Retrieval with Ranking-Aligned Objectives
This repository is actively under development; changes and improvements may land frequently. Thanks for your patience, and stay tuned for updates!
- Motivation
- What DDRO Does
- Learning Objectives
- Setup & Dependencies
- Steps to Reproduce
- Preprocessed Data & Model Checkpoints
- Citation
Misalignment in Learning Objectives:
Gen-IR models are typically trained via next-token prediction (cross-entropy loss) over docid tokens.
While effective for language modeling, this objective:
- Optimizes token-level generation
- Is not designed for document-level ranking
As a result, Gen-IR models are not directly optimized for learning-to-rank, which is the core requirement in IR systems.
In this work, we ask:
How can Gen-IR models directly learn to rank documents, instead of just predicting the next token?
We propose DDRO:
Lightweight and Direct Document Relevance Optimization for Gen-IR
- Aligns training objective with ranking by using pairwise preference learning
- Trains the model to prefer relevant documents over non-relevant ones
- Bridges the gap between autoregressive training and ranking-based optimization
- Requires no reinforcement learning or reward modeling
We optimize DDRO in two phases:
Phase 1 (Supervised Fine-Tuning, SFT): learn to generate the correct docid sequence for a given query by minimizing the autoregressive, token-level cross-entropy loss, i.e., by maximizing the likelihood of the correct docid given the query.
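In maximum-likelihood form, this objective can be sketched as follows (a reconstruction using the notation introduced below; see the paper for the exact formulation):

$$
\mathcal{L}_{\text{SFT}}(\theta) = -\sum_{t=1}^{|\text{docid}|} \log \pi_\theta\big(\text{docid}_t \mid \text{docid}_{<t},\, q\big)
$$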
Phase 2 (Direct Document Relevance Optimization): improve the ranking quality of generated document identifiers by applying a pairwise learning-to-rank objective inspired by Direct Preference Optimization (DPO).
Rafailov et al. (2023), Direct Preference Optimization: Your Language Model is Secretly a Reward Model
This Direct Document Relevance Optimization (DDRO) loss guides the model to prefer relevant documents (docid⁺) over non-relevant ones (docid⁻) by comparing how both the current model and a frozen reference model score each document:
- docid⁺: A relevant document for the query q
- docid⁻: A non-relevant or less relevant document
- $\pi_\theta$: The current model being optimized
- $\pi^{\text{ref}}$: A frozen reference model (typically the SFT model from Phase 1)
- $\beta$: A temperature-like factor controlling sensitivity
- $\sigma$: The sigmoid function, mapping score differences to the [0, 1] preference space
The goal is to encourage the model to rank a relevant docid⁺ higher than a non-relevant docid⁻.
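In DPO form, this pairwise objective can be sketched as follows (a reconstruction from the definitions above; see the paper for the exact formulation):

$$
\mathcal{L}_{\text{DDRO}}(\theta) = -\,\mathbb{E}_{(q,\,\text{docid}^{+},\,\text{docid}^{-})}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(\text{docid}^{+}\mid q)}{\pi^{\text{ref}}(\text{docid}^{+}\mid q)} - \beta \log \frac{\pi_\theta(\text{docid}^{-}\mid q)}{\pi^{\text{ref}}(\text{docid}^{-}\mid q)}\right)\right]
$$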
The DPO loss is used after the SFT phase to fine-tune the ranking behavior of the model. Instead of just generating a docid, the model now learns to rank docid⁺ higher than docid⁻ in a relevance/preference-aligned manner.
- Directly encourages higher generation scores for relevant documents
- Uses contrastive ranking rather than token-level generation
- Avoids reward modeling or RL while remaining efficient and scalable
While our optimization is inspired by the DPO framework (Rafailov et al., 2023), its adaptation to Generative Document Retrieval is non-trivial:
- In contrast to open-ended preference alignment, our task involves structured docid generation under beam decoding constraints
- Our model uses an encoder-decoder architecture rather than decoder-only
- The objective is document-level ranking, not open-ended preference generation
This required novel integration of preference optimization into retrieval-specific pipelines, making DDRO uniquely suited for GenIR.
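To make the beam-decoding constraint concrete, valid identifiers can be enforced at decoding time with a prefix trie over docid tokens (the repository's trie utilities live under src/utils). Below is a minimal, illustrative sketch using Hugging Face's `prefix_allowed_tokens_fn`; the toy docids, trie, and query are placeholders and not the repository's implementation:

```python
# Illustrative only: constrain beam search so that only valid docid sequences can be
# generated, using a toy prefix trie. Docids, query, and trie format are placeholders.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Build a tiny prefix trie over token ids; each valid docid is a root-to-leaf path.
trie = {}
for docid in ["doc 12", "doc 34"]:
    node = trie
    for tok in tokenizer(docid).input_ids:  # includes the closing </s> token
        node = node.setdefault(tok, {})

def allowed_tokens(batch_id, generated):
    """Allow only continuations that keep the decoded prefix inside the trie."""
    node = trie
    for tok in generated.tolist()[1:]:  # skip the decoder start token
        node = node.get(tok, {})
    return list(node.keys()) or [tokenizer.eos_token_id]

inputs = tokenizer("which document answers this query?", return_tensors="pt")
outputs = model.generate(
    **inputs,
    num_beams=2,
    num_return_sequences=2,
    max_new_tokens=8,
    prefix_allowed_tokens_fn=allowed_tokens,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```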
src/
├── data/              # Data downloading, preprocessing, and docid instance generation
├── pretrain/          # DDRO model training and evaluation logic (incl. ddro)
├── scripts/           # Entry-point shell scripts for SFT, ddro, BM25, and preprocessing
├── utils/             # Core utilities (tokenization, trie, metrics, trainers)
├── ddro.yml           # Conda environment (for training DDRO)
├── pyserini.yml       # Conda environment (for BM25 retrieval with Pyserini)
├── README.md          # You're here!
└── requirements.txt   # Additional Python dependencies
Each subdirectory includes a detailed README.md with instructions.
Clone the repository and create the conda environment:
git clone https://github.com/kidist-amde/ddro.git
cd ddro
conda env create -f ddro_env.yml
conda activate ddro_env
We use the MS MARCO Document (top-300K) and Natural Questions (NQ-320K) datasets, and a pretrained T5 model.
To download them, run the following commands from the project root (ddro/):
bash ./src/data/download/download_msmarco_datasets.sh
bash ./src/data/download/download_nq_datasets.sh
python ./src/data/download/download_t5_model.py
For details and download links, refer to src/data/download/README.md.
DDRO is evaluated on both the Natural Questions (NQ) and MS MARCO datasets.
Sample the top-300K MS MARCO subset: run the following script to preprocess and extract the top-300K most relevant MS MARCO documents based on qrels:
bash scripts/preprocess/sample_top_docs.sh
- This will generate: resources/datasets/processed/msmarco-docs-sents.top.300k.json.gz (sentence-tokenized JSONL format, ranked by relevance frequency)
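Conceptually, the sampling amounts to counting how often each docid is marked relevant in the qrels and keeping the 300K most frequent documents. A minimal sketch of that idea (the qrels file name and format below are placeholders; the script above implements the real pipeline):

```python
# Sketch of the sampling idea: rank documents by how often they appear as relevant
# in the qrels, then keep the top 300K. The qrels path/format here is a placeholder.
from collections import Counter

doc_freq = Counter()
with open("msmarco-doctrain-qrels.tsv") as f:  # lines: qid  0  docid  relevance
    for line in f:
        qid, _, docid, rel = line.split()
        if int(rel) > 0:
            doc_freq[docid] += 1

top_300k = {docid for docid, _ in doc_freq.most_common(300_000)}
print(f"Selected {len(top_300k)} docids")
```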
Once everything is downloaded and processed, your resources/ directory should look like this:
resources/
├── datasets/
│   ├── raw/
│   │   ├── msmarco-data/   # Raw MS MARCO dataset
│   │   └── nq-data/        # Raw Natural Questions dataset
│   └── processed/          # Preprocessed outputs
└── transformer_models/
    └── t5-base/            # Local copy of T5 model & tokenizer
To process and sample both datasets, generate document IDs, and prepare training/evaluation instances, please refer to the corresponding README.
We first train a Supervised Fine-Tuning (SFT) model using next-token prediction across three stages:
- Pretraining on document content (doc → docid)
- Search pretraining on pseudo queries (pseudo query → docid)
- Fine-tuning on real queries using supervised pairs from qrels with gold docids (query → docid)
This results in a seed model trained to autoregressively generate document identifiers.
You can run all stages with a single command:
bash ddro/src/scripts/sft/launch_SFT_training.sh
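For illustration, a single SFT training step is standard seq2seq cross-entropy over docid tokens. A minimal sketch assuming a T5 checkpoint and Hugging Face transformers; the query and docid strings are placeholders, not the repository's data format:

```python
# Minimal sketch of the Phase 1 (SFT) objective: next-token prediction of docid tokens.
# The query/docid strings are placeholders; the repository's trainers handle real data.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

query = "what is direct document relevance optimization"  # input: query / doc / pseudo query
docid = "3 0 1 5 9 2"                                      # target: docid token sequence

inputs = tokenizer(query, return_tensors="pt")
labels = tokenizer(docid, return_tensors="pt").input_ids

# Cross-entropy over the docid tokens = the autoregressive SFT loss of Phase 1.
loss = model(**inputs, labels=labels).loss
loss.backward()
```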
After training the SFT model (Phase 1), we apply Phase 2: Direct Document Relevance Optimization, which fine-tunes the model with a pairwise ranking objective that trains it to prefer relevant documents over non-relevant ones.
This bridges the gap between autoregressive generation and ranking-based retrieval.
We implement this using a custom version of Hugging Face's DPOTrainer.
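For intuition, the core of the pairwise update can be sketched as follows with dummy sequence log-probabilities; this is an illustration, not the repository's trainer:

```python
# Illustrative pairwise preference loss over docid pairs (DPO-style).
# The log-probabilities below are dummy values, not model outputs.
import torch
import torch.nn.functional as F

def ddro_pairwise_loss(pi_logp_pos, pi_logp_neg, ref_logp_pos, ref_logp_neg, beta=0.1):
    """Prefer docid+ over docid-, measured relative to a frozen reference model."""
    chosen = beta * (pi_logp_pos - ref_logp_pos)
    rejected = beta * (pi_logp_neg - ref_logp_neg)
    return -F.logsigmoid(chosen - rejected).mean()

loss = ddro_pairwise_loss(
    pi_logp_pos=torch.tensor([-3.2, -2.8]),   # log pi_theta(docid+ | q)
    pi_logp_neg=torch.tensor([-3.9, -3.1]),   # log pi_theta(docid- | q)
    ref_logp_pos=torch.tensor([-3.5, -3.0]),  # log pi_ref(docid+ | q)
    ref_logp_neg=torch.tensor([-3.6, -3.2]),  # log pi_ref(docid- | q)
)
print(loss.item())
```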
Run DDRO training and evaluation:
bash scripts/ddro/run_ddro_training.sh
bash scripts/ddro/run_test_ddro.sh
Evaluation logs and metrics are saved to logs/ and outputs/.
We evaluate DDRO on two standard retrieval benchmarks: MS MARCO Document (top-300K) and Natural Questions (NQ-320K).
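For reference, ranking metrics such as MRR@10 and Hits@10 can be computed from a ranked docid list as in the sketch below (illustrative only; the repository's evaluation scripts produce the reported numbers):

```python
# Illustrative MRR@10 / Hits@10 over a toy run; not the repository's evaluation code.
def mrr_at_k(ranked, gold, k=10):
    for rank, docid in enumerate(ranked[:k], start=1):
        if docid == gold:
            return 1.0 / rank
    return 0.0

def hits_at_k(ranked, gold, k=10):
    return 1.0 if gold in ranked[:k] else 0.0

run = {"q1": (["d3", "d7", "d1"], "d7"), "q2": (["d9", "d2"], "d5")}  # ranked docids, gold docid
mrr = sum(mrr_at_k(r, g) for r, g in run.values()) / len(run)
hits = sum(hits_at_k(r, g) for r, g in run.values()) / len(run)
print(f"MRR@10={mrr:.3f}  Hits@10={hits:.3f}")
```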
All datasets, pseudo queries, docid encodings, and model checkpoints are available here: DDRO Generative IR Collection on Hugging Face 🤗
We gratefully acknowledge the following open-source projects:
This project is licensed under the Apache 2.0 License.
@article{mekonnen2025lightweight,
title={Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval},
author={Mekonnen, Kidist Amde and Tang, Yubao and de Rijke, Maarten},
journal={arXiv preprint arXiv:2504.05181},
year={2025}
}
For questions, please open an issue.
© 2025 Kidist Amde Mekonnen · Made with ❤️ at IRLab, University of Amsterdam.