VerIF is a practical and efficient method for verification in instruction-following reinforcement learning. Built on the idea of Reinforcement Learning with Verifiable Rewards (RLVR), VerIF integrates rule-based code checks with LLM-based reasoning verification (e.g., QwQ-32B) to provide accurate and scalable reward signals.
To support this method, we construct VerInstruct, a high-quality dataset of ~22,000 instruction-following instances paired with verification signals. Models trained with VerIF achieve state-of-the-art instruction-following performance on several benchmarks among models of similar scale, while maintaining their general capabilities.
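As a concrete illustration of this reward design, the sketch below combines rule-based checks for hard constraints with an LLM verifier for soft constraints. The function names and the all-or-nothing aggregation are illustrative assumptions, not the paper's exact formulation:

```python
from typing import Callable, List

def verif_reward(
    response: str,
    hard_checks: List[Callable[[str], bool]],  # rule-based code checks
    soft_constraints: List[str],               # natural-language constraints
    llm_verify: Callable[[str, str], bool],    # LLM-based reasoning verifier
) -> float:
    # Reward is 1.0 only when every hard and soft constraint is satisfied
    # (assumed aggregation, for illustration only).
    rule_ok = all(check(response) for check in hard_checks)
    llm_ok = all(llm_verify(c, response) for c in soft_constraints)
    return 1.0 if (rule_ok and llm_ok) else 0.0
```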
RL with VerIF significantly improves instruction-following performance across benchmarks.
We release:
- VerInstruct (~22k instruction-following examples with verifiable signals)
- TULU3-VerIF, based on Llama-3.1-Tulu-3-8B-SFT
- R1-Distill-Qwen-7B-VerIF, based on DeepSeek-R1-Distill-Qwen-7B
This repo is forked from verl. We sincerely thank the authors for their excellent framework. We introduce two key adjustments:
- **Efficient Local Reward Server:** We provide a `local_server` version of the reward function for better efficiency. We recommend running it inside a sandboxed Docker environment to avoid potential security issues. You may also deploy your own remote server.
- **Batch Reward Collection:** We modified `./verl/workers/reward_manager/naive.py` to support batched reward calculation, which is more efficient than the original loop-based implementation (a sketch follows this list). We do not modify other parts of the repo.
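A minimal sketch of what batched reward collection can look like; the names below are hypothetical and do not mirror verl's actual `RewardManager` interface:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

def compute_score(sample: Dict) -> float:
    """Score one (prompt, response) pair with rule checks; a real scorer
    would also call the LLM verifier for soft constraints."""
    return 1.0 if all(check(sample["response"]) for check in sample["checks"]) else 0.0

def compute_scores_batched(samples: List[Dict], max_workers: int = 32) -> List[float]:
    # Reward computation is dominated by I/O (calls to the LLM verifier),
    # so scoring the whole batch concurrently beats a sequential Python loop.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compute_score, samples))
```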
Please refer to the original verl documentation for environment setup.
Download the data from here. Use `./examples/data_preprocess/if_prompts.py` to preprocess VerInstruct.
Make sure to add the import path for `./verl/utils/reward_score/local_server` at the top of each function, as shown below.
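For illustration, such an import-path addition might look like the following (the exact layout of your checkout may differ):

```python
import os
import sys

# Make the local_server reward utilities importable; adjust the path to
# match where this repo is checked out.
sys.path.insert(0, os.path.abspath("./verl/utils/reward_score/local_server"))
```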
For soft constraint verification, use an LLM-based verifier. You may:
- Use our trained verifier based on DeepSeek-R1-Distill-Qwen-7B
- Use QwQ-32B as the verifier

We suggest using SGLang or vLLM for deployment.
Then modify `./verl/utils/reward_score/local_server/llm_call.py` with your API endpoint and model name, for example:
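A sketch of such an edit, assuming the verifier is served behind an OpenAI-compatible endpoint (which both SGLang and vLLM provide). The endpoint URL, model name, and prompt below are placeholders, not the repo's actual code:

```python
from openai import OpenAI

# SGLang and vLLM both expose an OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def llm_verify(constraint: str, instruction: str, response: str) -> bool:
    """Ask the verifier model whether the response satisfies a soft constraint."""
    result = client.chat.completions.create(
        model="Qwen/QwQ-32B",  # or your R1-Distill-Qwen-7B-based verifier
        messages=[{
            "role": "user",
            "content": (
                f"Instruction: {instruction}\n"
                f"Response: {response}\n"
                f"Constraint: {constraint}\n"
                "Does the response satisfy the constraint? Answer YES or NO."
            ),
        }],
        temperature=0.0,
    )
    return "YES" in result.choices[0].message.content.upper()
```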
Use the provided training scripts:
- `./examples/grpo_trainer/run_qwen2-7b_verif.sh`
- `./examples/grpo_trainer/run_tulu3-8b_verif.sh`

These scripts use DeepSeek-R1-Distill-Qwen-7B and Llama-3.1-Tulu-3-8B-SFT as base models, respectively. Update the paths to point to your own model checkpoints if needed.
We thank the verl team for their open-source framework, and the Crab team for open-sourcing the original data.
If you find this repo helpful, please kindly cite:
```bibtex
@misc{peng2025verif,
  title={VerIF: Verification Engineering for Reinforcement Learning in Instruction Following},
  author={Hao Peng and Yunjia Qi and Xiaozhi Wang and Bin Xu and Lei Hou and Juanzi Li},
  year={2025},
  eprint={2506.09942},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.09942},
}
```