
DDRO: Direct Document Relevance Optimization for Generative Information Retrieval


This repository contains the official implementation of our SIGIR 2025 paper:
📄 Lightweight and Direct Document Relevance Optimization for Generative IR (DDRO)

  • Optimizing Generative Retrieval with Ranking-Aligned Objectives

🚧 Repository Under Development

This repository is under active development; changes and improvements may be applied frequently. Thanks for your patience, and stay tuned for updates!


📑 Table of Contents

  • Motivation
  • What DDRO Does
  • Learning Objectives in DDRO
  • Why DDRO is Different from Standard DPO
  • Project Structure
  • Setup & Dependencies
  • Training Pipeline
  • Datasets Used
  • Acknowledgments
  • License
  • Citation
  • Contact

Motivation

Misalignment in Learning Objectives:
Gen-IR models are typically trained via next-token prediction (cross-entropy loss) over docid tokens.
While effective for language modeling, this objective:

  • 🎯 Optimizes token-level generation
  • ❌ Is not designed for document-level ranking

As a result, Gen-IR models are not directly optimized for learning-to-rank, which is the core requirement in IR systems.

What DDRO Does

In this work, we ask:

How can Gen-IR models directly learn to rank documents, instead of just predicting the next token?

We propose DDRO:
Lightweight and Direct Document Relevance Optimization for Gen-IR

✅ Key Contributions:

  • Aligns training objective with ranking by using pairwise preference learning
  • Trains the model to prefer relevant documents over non-relevant ones
  • Bridges the gap between autoregressive training and ranking-based optimization
  • Requires no reinforcement learning or reward modeling

(Figure: DDRO training pipeline overview.)

Learning Objectives in DDRO

We optimize DDRO in two phases:


📘 Phase 1: Supervised Fine-Tuning (SFT)

Learn to generate the correct docid sequence given a query by minimizing the autoregressive token-level cross-entropy loss, i.e., maximize the likelihood of the correct docid given the query $q$:

$\mathcal{L}_{\text{SFT}}(\theta) = -\sum_{t=1}^{|\text{docid}|} \log \pi_\theta(\text{docid}_t \mid q, \text{docid}_{<t})$
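For concreteness, a minimal sketch of one SFT step with Hugging Face Transformers is shown below; the query and docid strings are illustrative placeholders rather than the repository's actual training code, and the loss returned by the model is exactly the token-level cross-entropy above.

from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the T5 backbone (the repo downloads a local copy of t5-base during setup).
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

query = "what is direct document relevance optimization"  # placeholder query
docid = "doc-00042"                                        # placeholder docid string

enc = tokenizer(query, return_tensors="pt")
labels = tokenizer(docid, return_tensors="pt").input_ids

# Teacher-forced forward pass: `loss` is the autoregressive cross-entropy
# over the docid tokens given the query.
loss = model(input_ids=enc.input_ids,
             attention_mask=enc.attention_mask,
             labels=labels).loss
loss.backward()  # an optimizer step would follow in a real training loop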

📗 Phase 2: Pairwise Ranking Optimization (DDRO Loss)

This phase improves the ranking quality of generated document identifiers by applying a pairwise learning-to-rank objective inspired by Direct Preference Optimization (DPO).

📄 Rafailov et al., 2023 – Direct Preference Optimization: Your Language Model is Secretly a Reward Model

$\mathcal{L}_{\text{DDRO}}(\theta) = -\,\mathbb{E}_{(q,\,\text{docid}^+,\,\text{docid}^-)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(\text{docid}^+ \mid q)}{\pi^{\text{ref}}(\text{docid}^+ \mid q)} - \beta \log \frac{\pi_\theta(\text{docid}^- \mid q)}{\pi^{\text{ref}}(\text{docid}^- \mid q)}\right)\right]$

📖 Description

This Direct Document Relevance Optimization (DDRO) loss guides the model to prefer relevant documents (docid⁺) over non-relevant ones (docid⁻) by comparing how both the current model and a frozen reference model score each document:

  • docid⁺: A relevant document for the query q
  • docid⁻: A non-relevant or less relevant document
  • $\pi_\theta$: The current model being optimized
  • $\pi^{\text{ref}}$: A frozen reference model (typically trained with SFT in Phase 1)
  • $\beta$: Temperature-like factor controlling sensitivity
  • $\sigma$: Sigmoid function, mapping scores to a [0,1] preference space

Encourage the model to rank the relevant docid⁺ higher than the non-relevant docid⁻, i.e., increase $\pi_\theta(\text{docid}^+ \mid q)$ relative to $\pi_\theta(\text{docid}^- \mid q)$, measured against the frozen reference model:

$\beta \log \frac{\pi_\theta(\text{docid}^+ \mid q)}{\pi^{\text{ref}}(\text{docid}^+ \mid q)} > \beta \log \frac{\pi_\theta(\text{docid}^- \mid q)}{\pi^{\text{ref}}(\text{docid}^- \mid q)}$
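In code, the loss reduces to a few lines once sequence-level log-probabilities of each docid are available under both models. A minimal PyTorch sketch follows; the function and argument names are ours, not the repository's, and beta=0.1 is only an example value.

import torch.nn.functional as F

def ddro_loss(logp_pos, logp_neg, ref_logp_pos, ref_logp_neg, beta=0.1):
    # Each tensor holds the summed log-probability of a full docid sequence
    # given the query: logp_* under the trainable model pi_theta,
    # ref_logp_* under the frozen SFT reference model pi_ref.
    pos_margin = beta * (logp_pos - ref_logp_pos)  # implicit reward for docid+
    neg_margin = beta * (logp_neg - ref_logp_neg)  # implicit reward for docid-
    # Maximizing sigma(pos_margin - neg_margin) pushes docid+ above docid-
    # relative to the reference model.
    return -F.logsigmoid(pos_margin - neg_margin).mean()

The sequence log-probabilities come from teacher-forced forward passes over docid⁺ and docid⁻ (summing per-token log-probabilities), analogous to the SFT forward pass above.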

✅ Usage

This DPO-style loss is applied after the SFT phase to fine-tune the ranking behavior of the model. Instead of just generating a docid, the model learns to rank docid⁺ higher than docid⁻ in a relevance/preference-aligned manner.


✅ Why It Works

  • Directly encourages higher generation scores for relevant documents
  • Uses contrastive ranking rather than token-level generation
  • Avoids reward modeling or RL while remaining efficient and scalable

💡 Why DDRO is Different from Standard DPO

While our optimization is inspired by the DPO framework of Rafailov et al. (2023), its adaptation to Generative Document Retrieval is non-trivial:

  • In contrast to open-ended preference alignment, our task involves structured docid generation under beam decoding constraints
  • Our model uses an encoder-decoder architecture rather than decoder-only
  • The objective is document-level ranking, not open-ended preference generation

This required novel integration of preference optimization into retrieval-specific pipelines, making DDRO uniquely suited for GenIR.
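To make the decoding constraint concrete: one standard way to restrict beam search to valid docids is the prefix_allowed_tokens_fn hook of Hugging Face's generate, driven by a trie over docid token sequences. The sketch below is a simplified stand-in for the trie utilities under src/utils/ (reusing the model and tokenizer from the SFT sketch above), not the repository's exact implementation.

def build_trie(docid_token_lists):
    # docid_token_lists: each docid tokenized into a list of token ids
    trie = {}
    for ids in docid_token_lists:
        node = trie
        for tok in ids:
            node = node.setdefault(tok, {})
    return trie

def make_prefix_fn(trie, eos_token_id):
    def allowed(batch_id, generated_ids):
        node = trie
        for tok in generated_ids.tolist()[1:]:  # skip the decoder start token
            node = node.get(tok, {})
        return list(node.keys()) or [eos_token_id]
    return allowed

corpus_docids = ["doc-00042", "doc-00123", "doc-00777"]  # placeholders; in practice, every docid
docid_trie = build_trie([tokenizer(d).input_ids for d in corpus_docids])

outputs = model.generate(
    **tokenizer("placeholder query", return_tensors="pt"),
    num_beams=3,
    num_return_sequences=3,  # top-ranked docids for this query
    prefix_allowed_tokens_fn=make_prefix_fn(docid_trie, tokenizer.eos_token_id),
)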

πŸ“ Project Structure

src/
├── data/                # Data downloading, preprocessing, and docid instance generation
├── pretrain/            # DDRO model training and evaluation logic (incl. ddro)
├── scripts/             # Entry-point shell scripts for SFT, ddro, BM25, and preprocessing
├── utils/               # Core utilities (tokenization, trie, metrics, trainers)
├── ddro.yml             # Conda environment (for training DDRO)
├── pyserini.yml         # Conda environment (for BM25 retrieval with Pyserini)
├── README.md            # You're here!
└── requirements.txt     # Additional Python dependencies

📌 Important

🔎 Each subdirectory includes a detailed README.md with instructions.


πŸ› οΈ Setup & Dependencies

1. Install Environment

Clone the repository and create the conda environment:

git clone https://github.com/kidist-amde/ddro.git
cd ddro
conda env create -f ddro_env.yml
conda activate ddro_env

2. Download Datasets and Pretrained Model

We use MS MARCO document (top-300k) and Natural Questions (NQ-320k) datasets, and a pretrained T5 model.

To download them, run the following commands from the project root (ddro/):

bash   ./src/data/download/download_msmarco_datasets.sh
bash   ./src/data/download/download_nq_datasets.sh
python ./src/data/download/download_t5_model.py

📂 For details and download links, refer to: src/data/download/README.md

3. Data Preparation

DDRO is evaluated on both the Natural Questions (NQ) and MS MARCO datasets.

✅ Sample Top-300K MS MARCO Subset

Run the following script to preprocess and extract the top-300K most relevant MS MARCO documents based on qrels:

bash scripts/preprocess/sample_top_docs.sh
  • 📌 This will generate: resources/datasets/processed/msmarco-docs-sents.top.300k.json.gz (sentence-tokenized JSONL format, ranked by relevance frequency)
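The selection idea behind the script can be sketched as follows; the qrels path and column layout are simplifications for illustration, so refer to the script itself for the authoritative logic.

from collections import Counter

def top_relevant_docids(qrels_path, k=300_000):
    # Count how often each document is judged relevant in the qrels,
    # then keep the k most frequently relevant documents.
    counts = Counter()
    with open(qrels_path) as f:
        for line in f:
            qid, _, docid, rel = line.split()
            if int(rel) > 0:
                counts[docid] += 1
    return [docid for docid, _ in counts.most_common(k)]

top_docs = top_relevant_docids("path/to/msmarco-doctrain-qrels.tsv")  # placeholder path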

Expected Directory Structure

Once everything is downloaded and processed, your resources/ directory should look like this:

resources/
├── datasets/
│   ├── raw/
│   │   ├── msmarco-data/         # Raw MS MARCO dataset
│   │   └── nq-data/              # Raw Natural Questions dataset
│   └── processed/                # Preprocessed outputs
└── transformer_models/
    └── t5-base/                  # Local copy of T5 model & tokenizer

📌 Important

🔎 To process and sample both datasets, generate document IDs, and prepare training/evaluation instances, please refer to the corresponding README:

🔗 src/data/dataprep/README.md


Training Pipeline

📘 Phase 1: Supervised Fine-Tuning (SFT)

We first train a Supervised Fine-Tuning (SFT) model using next-token prediction across three stages:

  1. Pretraining on document content (doc → docid)
  2. Search Pretraining on pseudo queries (pseudoquery → docid)
  3. Finetuning on real queries using supervised pairs from qrels (with gold docids) (query → docid)

This results in a seed model trained to autoregressively generate document identifiers.
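A purely illustrative view of the (input → target) pairs consumed by the three stages above; the field names and values below are hypothetical and do not reflect the repository's actual instance format.

# Hypothetical examples only: each SFT stage maps a different source text
# to the same kind of target, the document's identifier.
sft_stage_examples = [
    {"stage": "pretrain",        "input": "<body text of the document>",      "target": "<docid>"},
    {"stage": "search pretrain", "input": "<pseudo query generated for doc>", "target": "<docid>"},
    {"stage": "finetune",        "input": "<real query from qrels>",          "target": "<docid>"},
]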

You can run all stages with a single command:

bash ddro/src/scripts/sft/launch_SFT_training.sh

πŸ“ The --encoding flag in the script supports id formats like pq, url.

🔧 Phase 2: DDRO Training (Pairwise Optimization)

After training the SFT model (Phase 1), we apply Phase 2: Direct Document Relevance Optimization, which fine-tunes the model with a pairwise ranking objective that trains it to prefer relevant documents over non-relevant ones.

This bridges the gap between autoregressive generation and ranking-based retrieval.

We implement this using a custom version of Hugging Face's DPOTrainer.

Run DDRO training and evaluation:

bash scripts/ddro/run_ddro_training.sh
bash scripts/ddro/run_test_ddro.sh

📂 Evaluation logs and metrics are saved to:

logs/
outputs/
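For reference, a minimal sketch of how a standard metric such as MRR@10 can be computed from the ranked docid lists produced by constrained beam search; the data structures are illustrative and this is not the repository's evaluation code.

def mrr_at_k(ranked, qrels, k=10):
    # ranked: {qid: [docid, ...]} in decreasing model preference
    # qrels:  {qid: set of relevant docids}
    total = 0.0
    for qid, docids in ranked.items():
        for rank, docid in enumerate(docids[:k], start=1):
            if docid in qrels.get(qid, set()):
                total += 1.0 / rank
                break
    return total / max(len(ranked), 1)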

📚 Datasets Used

We evaluate DDRO on two standard retrieval benchmarks:

  • MS MARCO Document Ranking (top-300k subset)
  • Natural Questions (NQ-320k)

Preprocessed Data & Model Checkpoints

All datasets, pseudo queries, docid encodings, and model checkpoints are available here:
🔗 DDRO Generative IR Collection on Hugging Face 🤗


πŸ™ Acknowledgments

We gratefully acknowledge the open-source projects and resources that this work builds on.


📄 License

This project is licensed under the Apache 2.0 License.


Citation

@article{mekonnen2025lightweight,
  title={Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval},
  author={Mekonnen, Kidist Amde and Tang, Yubao and de Rijke, Maarten},
  journal={arXiv preprint arXiv:2504.05181},
  year={2025}
}

📬 Contact

For questions, please open an issue.

© 2025 Kidist Amde Mekonnen · Made with ❀️ at IRLab, University of Amsterdam.

