This repository contains the official implementation of our SIGIR 2025 paper:
Lightweight and Direct Document Relevance Optimization for Generative IR (DDRO)
- Optimizing Generative Retrieval with Ranking-Aligned Objectives
This repository is actively under development; changes and improvements may land frequently. Thanks for your patience, and stay tuned for updates!
- Motivation
- What DDRO Does
- Learning Objectives
- Setup & Dependencies
- Steps to Reproduce
- Preprocessed Data & Model Checkpoints
- Citation
Misalignment in Learning Objectives:
Gen-IR models are typically trained via next-token prediction (cross-entropy loss) over docid tokens.
While effective for language modeling, this objective:
- Optimizes token-level generation
- Is not designed for document-level ranking
As a result, Gen-IR models are not directly optimized for learning-to-rank, which is the core requirement in IR systems.
In this work, we ask:
How can Gen-IR models directly learn to rank documents, instead of just predicting the next token?
We propose DDRO:
Lightweight and Direct Document Relevance Optimization for Gen-IR
- Aligns training objective with ranking by using pairwise preference learning
- Trains the model to prefer relevant documents over non-relevant ones
- Bridges the gap between autoregressive training and ranking-based optimization
- Requires no reinforcement learning or reward modeling
We optimize DDRO in two phases:
Phase 1 (Supervised Fine-Tuning, SFT): learn to generate the correct docid sequence for a given query by minimizing the autoregressive, token-level cross-entropy loss, i.e., by maximizing the likelihood of the correct docid given the query.
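In maximum-likelihood form, this objective can be sketched as follows (a reconstruction using the notation introduced below; see the paper for the exact formulation):

$$
\mathcal{L}_{\text{SFT}}(\theta) = -\sum_{t=1}^{|\text{docid}|} \log \pi_\theta\big(\text{docid}_t \mid \text{docid}_{<t},\, q\big)
$$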
Phase 2 (Direct Document Relevance Optimization): improve the ranking quality of generated document identifiers by applying a pairwise learning-to-rank objective inspired by Direct Preference Optimization (DPO).
Rafailov et al. (2023), Direct Preference Optimization: Your Language Model is Secretly a Reward Model
This Direct Document Relevance Optimization (DDRO) loss guides the model to prefer relevant documents (docid⁺) over non-relevant ones (docid⁻) by comparing how both the current model and a frozen reference model score each document:
- docid⁺: A relevant document for the query q
- docid⁻: A non-relevant or less relevant document
- $\pi_\theta$: The current model being optimized
- $\pi^{\text{ref}}$: A frozen reference model (typically the SFT model from Phase 1)
- $\beta$: A temperature-like factor controlling sensitivity
- $\sigma$: The sigmoid function, mapping score differences to the [0, 1] preference space
The goal is to encourage the model to rank a relevant docid⁺ higher than a non-relevant docid⁻.
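In DPO form, this pairwise objective can be sketched as follows (a reconstruction from the definitions above; see the paper for the exact formulation):

$$
\mathcal{L}_{\text{DDRO}}(\theta) = -\,\mathbb{E}_{(q,\,\text{docid}^{+},\,\text{docid}^{-})}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(\text{docid}^{+}\mid q)}{\pi^{\text{ref}}(\text{docid}^{+}\mid q)} - \beta \log \frac{\pi_\theta(\text{docid}^{-}\mid q)}{\pi^{\text{ref}}(\text{docid}^{-}\mid q)}\right)\right]
$$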
The DPO loss is used after the SFT phase to fine-tune the ranking behavior of the model. Instead of just generating a docid, the model now learns to rank docid⁺ higher than docid⁻ in a relevance/preference-aligned manner.
- Directly encourages higher generation scores for relevant documents
- Uses contrastive ranking rather than token-level generation
- Avoids reward modeling or RL while remaining efficient and scalable
While our optimization is inspired by the DPO framework (Rafailov et al., 2023), its adaptation to Generative Document Retrieval is non-trivial:
- In contrast to open-ended preference alignment, our task involves structured docid generation under beam decoding constraints
- Our model uses an encoder-decoder architecture rather than decoder-only
- The objective is document-level ranking, not open-ended preference generation
This required novel integration of preference optimization into retrieval-specific pipelines, making DDRO uniquely suited for GenIR.
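To make the beam-decoding constraint concrete, valid identifiers can be enforced at decoding time with a prefix trie over docid tokens (the repository's trie utilities live under src/utils). Below is a minimal, illustrative sketch using Hugging Face's `prefix_allowed_tokens_fn`; the toy docids, trie, and query are placeholders and not the repository's implementation:

```python
# Illustrative only: constrain beam search so that only valid docid sequences can be
# generated, using a toy prefix trie. Docids, query, and trie format are placeholders.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Build a tiny prefix trie over token ids; each valid docid is a root-to-leaf path.
trie = {}
for docid in ["doc 12", "doc 34"]:
    node = trie
    for tok in tokenizer(docid).input_ids:  # includes the closing </s> token
        node = node.setdefault(tok, {})

def allowed_tokens(batch_id, generated):
    """Allow only continuations that keep the decoded prefix inside the trie."""
    node = trie
    for tok in generated.tolist()[1:]:  # skip the decoder start token
        node = node.get(tok, {})
    return list(node.keys()) or [tokenizer.eos_token_id]

inputs = tokenizer("which document answers this query?", return_tensors="pt")
outputs = model.generate(
    **inputs,
    num_beams=2,
    num_return_sequences=2,
    max_new_tokens=8,
    prefix_allowed_tokens_fn=allowed_tokens,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```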
src/
├── data/              # Data downloading, preprocessing, and docid instance generation
├── pretrain/          # DDRO model training and evaluation logic (incl. ddro)
├── scripts/           # Entry-point shell scripts for SFT, ddro, BM25, and preprocessing
├── utils/             # Core utilities (tokenization, trie, metrics, trainers)
├── ddro.yml           # Conda environment (for training DDRO)
├── pyserini.yml       # Conda environment (for BM25 retrieval with Pyserini)
├── README.md          # You're here!
└── requirements.txt   # Additional Python dependencies
Each subdirectory includes a detailed README.md with instructions.
Clone the repository and create the conda environment:
git clone https://github.com/kidist-amde/ddro.git
cd ddro
conda env create -f ddro_env.yml
conda activate ddro_env
We use the MS MARCO Document (top-300K) and Natural Questions (NQ-320K) datasets, and a pretrained T5 model.
To download them, run the following commands from the project root (ddro/):
bash ./src/data/download/download_msmarco_datasets.sh
bash ./src/data/download/download_nq_datasets.sh
python ./src/data/download/download_t5_model.py
For details and download links, refer to src/data/download/README.md.
DDRO is evaluated on both the Natural Questions (NQ) and MS MARCO datasets.
Sample the top-300K MS MARCO subset: run the following script to preprocess and extract the top-300K most relevant MS MARCO documents based on qrels:
bash scripts/preprocess/sample_top_docs.sh
- This will generate: resources/datasets/processed/msmarco-docs-sents.top.300k.json.gz (sentence-tokenized JSONL format, ranked by relevance frequency)
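Conceptually, the sampling amounts to counting how often each docid is marked relevant in the qrels and keeping the 300K most frequent documents. A minimal sketch of that idea (the qrels file name and format below are placeholders; the script above implements the real pipeline):

```python
# Sketch of the sampling idea: rank documents by how often they appear as relevant
# in the qrels, then keep the top 300K. The qrels path/format here is a placeholder.
from collections import Counter

doc_freq = Counter()
with open("msmarco-doctrain-qrels.tsv") as f:  # lines: qid  0  docid  relevance
    for line in f:
        qid, _, docid, rel = line.split()
        if int(rel) > 0:
            doc_freq[docid] += 1

top_300k = {docid for docid, _ in doc_freq.most_common(300_000)}
print(f"Selected {len(top_300k)} docids")
```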
Once everything is downloaded and processed, your resources/ directory should look like this:
resources/
├── datasets/
│   ├── raw/
│   │   ├── msmarco-data/   # Raw MS MARCO dataset
│   │   └── nq-data/        # Raw Natural Questions dataset
│   └── processed/          # Preprocessed outputs
└── transformer_models/
    └── t5-base/            # Local copy of T5 model & tokenizer
To process and sample both datasets, generate document IDs, and prepare training/evaluation instances, please refer to the corresponding README.
We first train a Supervised Fine-Tuning (SFT) model using next-token prediction across three stages:
- Pretraining on document content (doc → docid)
- Search pretraining on pseudo queries (pseudo query → docid)
- Fine-tuning on real queries using supervised pairs from qrels with gold docids (query → docid)
This results in a seed model trained to autoregressively generate document identifiers.
You can run all stages with a single command:
bash ddro/src/scripts/sft/launch_SFT_training.sh
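For illustration, a single SFT training step is standard seq2seq cross-entropy over docid tokens. A minimal sketch assuming a T5 checkpoint and Hugging Face transformers; the query and docid strings are placeholders, not the repository's data format:

```python
# Minimal sketch of the Phase 1 (SFT) objective: next-token prediction of docid tokens.
# The query/docid strings are placeholders; the repository's trainers handle real data.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

query = "what is direct document relevance optimization"  # input: query / doc / pseudo query
docid = "3 0 1 5 9 2"                                      # target: docid token sequence

inputs = tokenizer(query, return_tensors="pt")
labels = tokenizer(docid, return_tensors="pt").input_ids

# Cross-entropy over the docid tokens = the autoregressive SFT loss of Phase 1.
loss = model(**inputs, labels=labels).loss
loss.backward()
```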
After training the SFT model (Phase 1), we apply Phase 2: Direct Document Relevance Optimization, which fine-tunes the model with a pairwise ranking objective that trains it to prefer relevant documents over non-relevant ones.
This bridges the gap between autoregressive generation and ranking-based retrieval.
We implement this using a custom version of Hugging Face's DPOTrainer.
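For intuition, the core of the pairwise update can be sketched as follows with dummy sequence log-probabilities; this is an illustration, not the repository's trainer:

```python
# Illustrative pairwise preference loss over docid pairs (DPO-style).
# The log-probabilities below are dummy values, not model outputs.
import torch
import torch.nn.functional as F

def ddro_pairwise_loss(pi_logp_pos, pi_logp_neg, ref_logp_pos, ref_logp_neg, beta=0.1):
    """Prefer docid+ over docid-, measured relative to a frozen reference model."""
    chosen = beta * (pi_logp_pos - ref_logp_pos)
    rejected = beta * (pi_logp_neg - ref_logp_neg)
    return -F.logsigmoid(chosen - rejected).mean()

loss = ddro_pairwise_loss(
    pi_logp_pos=torch.tensor([-3.2, -2.8]),   # log pi_theta(docid+ | q)
    pi_logp_neg=torch.tensor([-3.9, -3.1]),   # log pi_theta(docid- | q)
    ref_logp_pos=torch.tensor([-3.5, -3.0]),  # log pi_ref(docid+ | q)
    ref_logp_neg=torch.tensor([-3.6, -3.2]),  # log pi_ref(docid- | q)
)
print(loss.item())
```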
Run DDRO training and evaluation:
bash scripts/ddro/run_ddro_training.sh
bash scripts/ddro/run_test_ddro.sh
Evaluation logs and metrics are saved to logs/ and outputs/.
We evaluate DDRO on two standard retrieval benchmarks: MS MARCO Document (top-300K) and Natural Questions (NQ-320K).
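For reference, ranking metrics such as MRR@10 and Hits@10 can be computed from a ranked docid list as in the sketch below (illustrative only; the repository's evaluation scripts produce the reported numbers):

```python
# Illustrative MRR@10 / Hits@10 over a toy run; not the repository's evaluation code.
def mrr_at_k(ranked, gold, k=10):
    for rank, docid in enumerate(ranked[:k], start=1):
        if docid == gold:
            return 1.0 / rank
    return 0.0

def hits_at_k(ranked, gold, k=10):
    return 1.0 if gold in ranked[:k] else 0.0

run = {"q1": (["d3", "d7", "d1"], "d7"), "q2": (["d9", "d2"], "d5")}  # ranked docids, gold docid
mrr = sum(mrr_at_k(r, g) for r, g in run.values()) / len(run)
hits = sum(hits_at_k(r, g) for r, g in run.values()) / len(run)
print(f"MRR@10={mrr:.3f}  Hits@10={hits:.3f}")
```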
All datasets, pseudo queries, docid encodings, and model checkpoints are available here: DDRO Generative IR Collection on Hugging Face 🤗
We gratefully acknowledge the following open-source projects:
This project is licensed under the Apache 2.0 License.
@article{mekonnen2025lightweight,
title={Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval},
author={Mekonnen, Kidist Amde and Tang, Yubao and de Rijke, Maarten},
journal={arXiv preprint arXiv:2504.05181},
year={2025}
}
For questions, please open an issue.
© 2025 Kidist Amde Mekonnen · Made with ❤️ at IRLab, University of Amsterdam.