DanceGRPO

DanceGRPO is the first unified RL-based framework for visual generation.

This is the official implementation of the paper DanceGRPO: Unleashing GRPO on Visual Generation. DanceGRPO is built on FastVideo, a scalable and efficient framework for video and image generation.

Key Features

DanceGRPO has the following features:

Supports Stable Diffusion
Supports FLUX
HunyuanVideo support (todo)

Getting Started

Downloading checkpoints

Create the following folders with "mkdir" before downloading.

Download the Stable Diffusion v1.4 checkpoints from here to "./data/stable-diffusion-v1-4".
Download the FLUX checkpoints from here to "./data/flux".
Download the HPS-v2.1 checkpoint (HPS_v2.1_compressed.pt) from here to "./hps_ckpt".
Download the CLIP H-14 checkpoint (open_clip_pytorch_model.bin) from here to "./hps_ckpt".
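
The folder layout above can be prepared and checked with a short script. This is a minimal sketch, not part of the repository: the helper names prepare_dirs and missing_checkpoints are hypothetical, and because the contents of the Stable Diffusion and FLUX folders are not listed here, those two folders are only checked for being non-empty.

```python
from pathlib import Path

# Folders named in the download steps above.
CHECKPOINT_DIRS = [
    "data/stable-diffusion-v1-4",
    "data/flux",
    "hps_ckpt",
]

# Files that the download steps name explicitly.
REQUIRED_FILES = [
    "hps_ckpt/HPS_v2.1_compressed.pt",
    "hps_ckpt/open_clip_pytorch_model.bin",
]

# Folders whose exact contents are not listed; only checked for non-emptiness.
NON_EMPTY_DIRS = [
    "data/stable-diffusion-v1-4",
    "data/flux",
]


def prepare_dirs(root="."):
    """Create the checkpoint folders (equivalent to `mkdir -p`)."""
    for rel in CHECKPOINT_DIRS:
        (Path(root) / rel).mkdir(parents=True, exist_ok=True)


def missing_checkpoints(root="."):
    """Return the relative paths that still need to be downloaded."""
    base = Path(root)
    missing = [rel for rel in REQUIRED_FILES if not (base / rel).is_file()]
    for rel in NON_EMPTY_DIRS:
        folder = base / rel
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing
```

Calling prepare_dirs() from the repository root creates the three folders, and missing_checkpoints() then lists whatever has not been downloaded yet.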

Installation

./env_setup.sh fastvideo

Training

# for Stable Diffusion, with 8 H800s
bash scripts/finetune/finetune_sd_grpo.sh

# for FLUX, preprocessing with 8 H800s
bash scripts/preprocess/preprocess_flux_rl_embeddings.sh
# for FLUX, training with 16 H800s
bash scripts/finetune/finetune_flux_grpo.sh

For the open-source version, we train on prompts from the HPD dataset, listed in "./prompts.txt".

Rewards

We show the moving-average reward curves for Stable Diffusion (left or upper) and FLUX (right or lower). FLUX training (200 iterations) completes within 12 hours on 16 H800s.
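
To reproduce this kind of smoothed curve from your own per-iteration rewards, a trailing moving average is one option. The sketch below is an assumption about the smoothing, not the authors' plotting code: the window size of 10 and the function name moving_average are illustrative, since the window actually used is not stated here.

```python
def moving_average(values, window=10):
    """Trailing moving average; early entries average over fewer points."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        # Drop the value that just fell out of the window.
        if i >= window:
            running_sum -= values[i - window]
        count = min(i + 1, window)
        smoothed.append(running_sum / count)
    return smoothed
```

A larger window gives a smoother curve but lags further behind the raw rewards.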

We provide more visualization examples (base, 80 iterations of RLHF, 160 iterations of RLHF) in "./assets/flux_visualization". The visualization script is "./scripts/visualization/vis_flux.py". For visualization we always use larger resolutions and more sampling steps than in RLHF training, because training uses lower resolutions and fewer sampling steps to run faster.

We do not recommend running the FLUX training script on 8 H800s, because we find that a global prompt batch size of 8 is not enough.
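
The relationship between GPU count and global prompt batch size can be written out as simple arithmetic. This sketch assumes one prompt per GPU per step, which is inferred from the statement that 8 H800s give a global prompt batch size of 8; the function name and the prompts_per_gpu parameter are hypothetical and do not correspond to flags in the training script.

```python
def global_prompt_batch_size(num_gpus, prompts_per_gpu=1):
    """Number of distinct prompts processed per step across all GPUs."""
    if num_gpus < 1 or prompts_per_gpu < 1:
        raise ValueError("both arguments must be at least 1")
    return num_gpus * prompts_per_gpu


# Under the one-prompt-per-GPU assumption:
small_setup = global_prompt_batch_size(8)    # the setup advised against
recommended = global_prompt_batch_size(16)   # the 16-GPU FLUX setup
```

Under that assumption, the 16-GPU configuration doubles the number of prompts seen per step relative to 8 GPUs.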

More discussion on FLUX can be found in "./fastvideo/README.md".

Acknowledgement

We learned and reused code from the following projects:

Citation

If you use DanceGRPO for your research, please cite our paper:

@article{xue2025dancegrpo,
  title={DanceGRPO: Unleashing GRPO on Visual Generation},
  author={Xue, Zeyue and Wu, Jie and Gao, Yu and Kong, Fangyuan and Zhu, Lingting and Chen, Mengzhao and Liu, Zhiheng and Liu, Wei and Guo, Qiushan and Huang, Weilin and others},
  journal={arXiv preprint arXiv:2505.07818},
  year={2025}
}
