- [2025.05.15]: 🔥We showcase image examples from three tasks and their training evolution at https://gongyeliu.github.io/Flow-GRPO. Check them out!
- [2025.05.13]: 🔥We now provide an online demo for all three tasks at https://huggingface.co/spaces/jieliu/SD3.5-M-Flow-GRPO. You're welcome to try it out!
| Task | Model |
| --- | --- |
| GenEval | 🤗GenEval |
| Text Rendering | 🤗Text |
| Human Preference Alignment | 🤗PickScore |
Clone this repository and install the required packages:
git clone https://github.com/yifan123/flow_grpo.git
cd flow_grpo
conda create -n flow_grpo python=3.10.16
conda activate flow_grpo
pip install -e .
The steps above install only the current repository. Because each reward model may depend on different package versions, installing them all in a single Conda environment can cause version conflicts. To avoid this, we adopt a remote reward-server setup inspired by ddpo-pytorch, so each reward model runs in its own environment; you only need to install the specific reward models you plan to use.
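As a rough illustration of this decoupled setup, the sketch below shows a trainer-side client posting generated images to a reward server over HTTP and getting scalar scores back. The endpoint, port, and payload field names here are hypothetical; the actual interface is defined in reward-server.

```python
# Hypothetical client sketch: the real route, port, and payload schema are
# defined in the reward-server code, so adapt the names below to match it.
import base64, io, requests
from PIL import Image

def query_reward_server(images, prompts, url="http://127.0.0.1:18085/score"):
    """Send generated images and their prompts to a remote reward server."""
    payload = {"prompts": prompts, "images": []}
    for img in images:  # list of PIL.Image
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        payload["images"].append(base64.b64encode(buf.getvalue()).decode("utf-8"))
    resp = requests.post(url, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["scores"]  # one scalar reward per image (hypothetical field)
```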
For GenEval, please create a new Conda virtual environment and install the corresponding dependencies according to the instructions in reward-server.
For the OCR reward, install PaddleOCR and its dependencies:
pip install paddlepaddle-gpu==2.6.2
pip install paddleocr==2.9.1
pip install python-Levenshtein
Then, pre-download the model using the Python command line:
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=False, lang="en", use_gpu=False, show_log=False)
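As a quick sanity check, and as a rough sketch of how an OCR-based reward can be computed, you can run the downloaded model on an image and compare the recognized text against a target string with a Levenshtein ratio. The exact scoring used by this repo is implemented in its reward code; the function below is illustrative only.

```python
# Rough sketch of an OCR-based reward: recognize text in a generated image and
# score it by string similarity to the target text (illustrative, not the
# repo's exact reward implementation).
import Levenshtein
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=False, lang="en", use_gpu=False, show_log=False)

def ocr_reward(image_path: str, target_text: str) -> float:
    result = ocr.ocr(image_path, cls=False)
    lines = result[0] or []  # each line: [bbox, (text, confidence)]
    recognized = " ".join(line[1][0] for line in lines)
    # Levenshtein.ratio is in [0, 1]; 1.0 means an exact match.
    return Levenshtein.ratio(recognized.lower(), target_text.lower())
```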
PickScore requires no additional installation.
For DeQA, please create a new Conda virtual environment and install the corresponding dependencies according to the instructions in reward-server.
For UnifiedReward, we deploy the reward service with sglang (we also recommend sglang or vLLM for other VLM-based reward models). After installing sglang, run the following command to launch UnifiedReward:
python -m sglang.launch_server --model-path CodeGoat24/UnifiedReward-7b-v1.5 --api-key flowgrpo --port 17140 --chat-template chatml-llava --enable-p2p-check --mem-fraction-static 0.85
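Once launched, the server exposes an OpenAI-compatible API on the chosen port. The query below is a hypothetical sketch for illustration: the scoring prompt and response parsing actually used for the reward live in this repo's code, and the `model` field may need to match the served path.

```python
# Hypothetical query against the sglang server started above; the real scoring
# prompt and parsing are defined in this repo's reward code.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:17140/v1", api_key="flowgrpo")

with open("sample.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="CodeGoat24/UnifiedReward-7b-v1.5",  # may need to match the served model path
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text",
             "text": "Rate how well this image matches the prompt "
                     "'a red cube on a blue sphere' on a scale of 1-5."},
        ],
    }],
    temperature=0.0,
)
print(response.choices[0].message.content)  # parse the score from the reply
```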
Single-node training:
bash scripts/single_node/main.sh
Multi-node training:
# Master node
bash scripts/multi_node/main.sh
# Other nodes
bash scripts/multi_node/main1.sh
bash scripts/multi_node/main2.sh
For multi-reward settings, you can pass in a dictionary where each key is a reward name and the corresponding value is its weight. For example:
{
"pickscore": 0.5,
"ocr": 0.2,
"aesthetic": 0.3
}
This means the final reward is a weighted sum of the individual rewards.
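A minimal sketch of this weighted combination, assuming each reward function returns one score per generated image (the names and interface below are illustrative):

```python
# Minimal sketch of combining multiple rewards into a weighted sum per image.
reward_weights = {"pickscore": 0.5, "ocr": 0.2, "aesthetic": 0.3}

def combine_rewards(per_reward_scores: dict[str, list[float]]) -> list[float]:
    """per_reward_scores maps reward name -> one score per generated image."""
    num_images = len(next(iter(per_reward_scores.values())))
    return [
        sum(reward_weights[name] * per_reward_scores[name][i] for name in reward_weights)
        for i in range(num_images)
    ]

# Example: two images scored by three rewards.
print(combine_rewards({
    "pickscore": [0.8, 0.6],
    "ocr":       [1.0, 0.2],
    "aesthetic": [0.7, 0.9],
}))  # ≈ [0.81, 0.61]
```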
The following reward models are currently supported:
- GenEval evaluates T2I models on complex compositional prompts.
- OCR provides an OCR-based reward.
- PickScore is a general-purpose T2I reward model trained on human preferences.
- DeQA is a multimodal LLM-based image quality assessment model that measures the impact of distortions and texture damage on perceived quality.
- ImageReward is a general-purpose T2I reward model capturing text-image alignment, visual fidelity, and safety.
- QwenVL is an experimental reward model using prompt engineering.
- Aesthetic is a CLIP-based linear regressor predicting image aesthetic scores.
- JPEG_Compressibility uses the JPEG-compressed file size of the image as a proxy reward for quality.
- UnifiedReward is a state-of-the-art reward model for multimodal understanding and generation, topping the human preference leaderboard.
You can adjust the parameters in `config/dgx.py` to tune different hyperparameters. An empirical finding is that `config.sample.train_batch_size * num_gpu / config.sample.num_image_per_prompt * config.sample.num_batches_per_epoch = 48`, i.e., `group_number = 48` and `group_size = 24`. Additionally, setting `config.train.gradient_accumulation_steps = config.sample.num_batches_per_epoch // 2` also yields good performance.
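For instance, a hypothetical 8-GPU setting that satisfies this relation (the actual values are set in `config/dgx.py`):

```python
# Hypothetical values on 8 GPUs that satisfy the empirical relation above;
# the real settings live in config/dgx.py.
num_gpu = 8
train_batch_size = 6          # per-GPU sampling batch size
num_image_per_prompt = 24     # group_size: images sampled per prompt
num_batches_per_epoch = 24

group_number = train_batch_size * num_gpu / num_image_per_prompt * num_batches_per_epoch
assert group_number == 48     # 6 * 8 / 24 * 24 = 48 prompt groups per epoch

gradient_accumulation_steps = num_batches_per_epoch // 2  # = 12
```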
This repo is based on ddpo-pytorch and diffusers. We thank the authors for their valuable contributions to the AIGC community. Special thanks to Kevin Black for the excellent ddpo-pytorch repo.
@misc{liu2025flowgrpo,
title={Flow-GRPO: Training Flow Matching Models via Online RL},
author={Jie Liu and Gongye Liu and Jiajun Liang and Yangguang Li and Jiaheng Liu and Xintao Wang and Pengfei Wan and Di Zhang and Wanli Ouyang},
year={2025},
eprint={2505.05470},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.05470},
}