
🚀 Official Implementation for ICLR'2025 Paper

Paper: Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives

    🌍 [Website]     |     📖 [Preprint]     |     📚 [Embeddings (To be released soon)]     |     ⚙️ [Infrastructure]


We have a series of works focusing on embedding-based reward models in RLHF:

  • Part I. Reward Model Foundation (This paper: foundation of preference-based reward modeling and embedding-based reward models)
  • Part II. Active Reward Modeling ([preprint], [repo])
  • Part III. Accelerating Reward Model Research with our Infra. ([preprint], [repo])
  • Part IV. Human Preference Learning through Principal Component Analysis ([preprint])

⚙️ Infra for Easy-Reproducible Reward Model Research

Reproducing reward modeling research has long been a challenge, given its heavy hardware requirements and the cost of training, evaluation, and inference. We propose conducting easily reproducible reward model research in the embedding space.

The details of the workflow are described in [Part III. TO BE RELEASED SOON.]. Our motivation is to make it possible for every researcher with a single CPU to conduct reward modeling (and RLHF) research.

🔁 Reproducing the Results without GPUs

Part 1: Reproducing the Data: this part is optional and computationally expensive.

  • Step 1 (optional, GPU required): SFT. You need to update the PATH to the models/open-sourced datasets, and you may need to apply for licences to use those models/datasets first.
python3 step1_sft.py --model_name gemma2b --dataset hh-rlhf-helpful-gpt4
  • Step 2 (optional, GPU required): Generate samples for the training prompts (10 per prompt) and the testing prompts (500 per prompt)
python3 step2_gen_sample.py --model_name gemma2b --adapter_name sft --dataset hh-rlhf-helpful-gpt4 --eval_dataset hh-rlhf-helpful --data_class train --n_samples 10 --max_len 128
python3 step2_gen_sample.py --model_name gemma2b --adapter_name sft --dataset hh-rlhf-helpful-gpt4 --eval_dataset hh-rlhf-helpful --data_class test --n_samples 500 --max_len 128
  • Step 3 (optional, GPU required): Annotate response quality using golden reward models
python3 step3_reward_annotation.py --adapter_name sft --model_name gemma2b --dataset hh-rlhf-helpful-gpt4 --data_class train --n_samples 10
python3 step3.5_processing_data.py # pre-process and re-organize the dataset
  • Step 4 (optional, GPU required): Generate and store embeddings of all prompt-response pairs
python3 step4_gen_embeddings.py --model_name gemma2b --dataset hh-rlhf-helpful-gpt4 --gen_pref_model_name gemma2b --train_test train --n_samples 10

The above four steps produce an embedding-based dataset; with such a dataset, reward modeling research can be reproduced easily. To illustrate, an example sketch follows.
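
The sketch below is not the repository's example code; the file names embeddings_train.npy and golden_rewards_train.npy are hypothetical placeholders for the artifacts produced in Steps 1-4.

# Minimal sketch (hypothetical file names): building preference pairs
# from cached embeddings and golden-reward annotations.
import numpy as np

embeddings = np.load("embeddings_train.npy")   # shape: (n_prompts, n_samples, embed_dim)
rewards = np.load("golden_rewards_train.npy")  # shape: (n_prompts, n_samples)

rng = np.random.default_rng(0)
chosen, rejected = [], []
for emb, rew in zip(embeddings, rewards):
    i, j = rng.choice(len(rew), size=2, replace=False)  # pick 2 of the generated responses
    if rew[i] == rew[j]:
        continue  # skip ties
    better, worse = (i, j) if rew[i] > rew[j] else (j, i)
    chosen.append(emb[better])
    rejected.append(emb[worse])

x_chosen, x_rejected = np.stack(chosen), np.stack(rejected)
print(x_chosen.shape, x_rejected.shape)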

Part 2: Reproducing Results: this part is computationally efficient, and the reproduction can be done on a CPU-only machine.

  • Step 5 (reproduction: reward model training with CPUs)
python3 step5_train_rms.py --embed_model_name gemma2b --task helpful --sft_obj gpt4 --gen_pref_model_name gemma2b --rm_objective clf --consider_first_n 2 --annotation_quality 10
# task: helpful / harmless: to specify the task
# sft_obj: gpt4 / none: to load data generated by different model checkpoints (fine-tuned or non-fine-tuned models)
# gen_pref_model_name: gemma2b / gemma7b / llama38b: to experiment with different models
# rm_objective: clf / bt: for classification models and Bradley-Terry models (see the sketch after Step 6)
# consider_first_n: 2 / -1 / -2: to specify the comparison format; 2 randomly selects 2 of the 10 generated responses; -2 uses comparisons that lack diversity; -1 uses comparisons with high diversity
# annotation_quality: 10 gives high-quality annotations (error rate below 5 percent); 1.0, 0.5, and 0.1 give lower-quality annotations with error rates of up to 40 percent
  • Step 6 (reproduction: reward model evaluation with CPUs)
python3 step6_eval_rms.py --embed_model_name gemma2b --task helpful --sft_obj gpt4 --gen_pref_model_name gemma2b --rm_objective clf --consider_first_n 2 --annotation_quality 10
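
For reference, the two --rm_objective options correspond to two different losses over the same embeddings. The sketch below is our own illustration on random tensors with a placeholder embed_dim and a single linear reward head; it is not the repository's step5 training code.

# Conceptual contrast of the two reward-model objectives on embeddings
# (illustration only; random data, placeholder dimensions).
import torch
import torch.nn as nn
import torch.nn.functional as F

embed_dim = 2048                       # placeholder embedding dimension
reward_head = nn.Linear(embed_dim, 1)  # light reward head trained on frozen embeddings

x_chosen = torch.randn(64, embed_dim)    # embeddings of preferred responses
x_rejected = torch.randn(64, embed_dim)  # embeddings of rejected responses

# Bradley-Terry objective (rm_objective=bt): fit pairwise comparisons by
# maximising log sigmoid(r(chosen) - r(rejected)).
r_chosen = reward_head(x_chosen).squeeze(-1)
r_rejected = reward_head(x_rejected).squeeze(-1)
bt_loss = -F.logsigmoid(r_chosen - r_rejected).mean()

# Classification objective (rm_objective=clf): treat each response as a
# binary example (preferred = 1, rejected = 0) and fit per-response labels.
x_all = torch.cat([x_chosen, x_rejected])
y_all = torch.cat([torch.ones(64), torch.zeros(64)])
clf_loss = F.binary_cross_entropy_with_logits(reward_head(x_all).squeeze(-1), y_all)

print(bt_loss.item(), clf_loss.item())  # either loss is cheap enough to optimise on a CPU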

Call for Contribution to the Infra (an Embedding-based Dataset for Reward Modeling Research)

Call for contributors! --- Please contact me at sunhopht@gmail.com if you are interested in contributing the embeddings / golden-reward annotations from your reward model research to the open-source RM community!

📚 BibTex Citation

If you would like to cite our code or paper, please use

@inproceedings{
  sun2025rethinking,
  title={Rethinking Reward Modeling in Preference-based Large Language Model Alignment},
  author={Hao Sun and Yunyi Shen and Jean-Francois Ton},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=rfdblE10qm}
}

