A collection of awesome papers on the alignment of visual generation models (including AR and diffusion models).
We also use this repo as a reference.
We welcome community contributions!
- CS285 from UC Berkeley. (This course focuses mainly on robotics control, but it is very important if you want to become an expert in RL.)
- Introduction to Reinforcement Learning
- Hands-on Reinforcement Learning (动手学强化学习, in Chinese)
- Reinforcement Learning column on Zhihu (强化学习知乎专栏, in Chinese)
Traditional Policy Gradient
- DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models, [pdf]
- Training Diffusion Models with Reinforcement Learning, [pdf]
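For orientation, these methods treat the denoising chain as an MDP and optimize a terminal image reward with REINFORCE/PPO-style policy gradients. Below is a minimal PyTorch-flavored sketch of the reward-weighted log-likelihood loss; the tensor names and the simple mean baseline are illustrative only, not the exact formulation of either paper.

```python
import torch

def policy_gradient_loss(log_probs, rewards):
    """REINFORCE-style loss over the denoising chain (DPOK / DDPO flavor, simplified).

    log_probs: (batch, T) log p_theta(x_{t-1} | x_t, c) of each sampled denoising step
    rewards:   (batch,)   terminal reward r(x_0, c) for the generated image
    """
    # Simple per-batch baseline; the actual methods use richer baselines, KL terms, and clipping.
    advantages = rewards - rewards.mean()
    # Maximize the reward-weighted log-likelihood of the sampled trajectory.
    return -(advantages[:, None] * log_probs).mean()
```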
GRPO
- DanceGRPO: Unleashing GRPO on Visual Generation, [pdf]
- Flow-GRPO: Training Flow Matching Models via Online RL, [pdf]
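GRPO-style methods sample a group of images per prompt and normalize rewards within the group, so no learned value function is needed. A minimal sketch of the group-relative advantage is below (the clipped policy-ratio objective that consumes it is omitted); shapes and the epsilon term are illustrative.

```python
import torch

def group_relative_advantages(rewards, eps=1e-8):
    """rewards: (num_prompts, group_size) rewards for a group of samples per prompt."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    # Each sample is scored relative to its own group, replacing a learned critic.
    return (rewards - mean) / (std + eps)
```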
DPO
- Diffusion Model Alignment Using Direct Preference Optimization, [pdf]
- Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model, [pdf]
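Diffusion-DPO replaces the intractable image log-likelihood with per-timestep denoising errors, comparing the fine-tuned model against a frozen reference on a preferred/rejected pair. A minimal sketch of that loss follows; the variable names and the fixed beta value are illustrative, not the papers' exact settings.

```python
import torch
import torch.nn.functional as F

def diffusion_dpo_loss(err_w, err_ref_w, err_l, err_ref_l, beta=5000.0):
    """err_*: ||eps - eps_model(x_t, t, c)||^2 denoising errors, shape (batch,).

    *_w / *_l : preferred / rejected image in the pair
    *_ref     : the same errors under the frozen reference model
    """
    # Lower denoising error stands in for higher log-likelihood, hence the leading minus sign.
    margin = (err_w - err_ref_w) - (err_l - err_ref_l)
    return -F.logsigmoid(-beta * margin).mean()
```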
Reward Feedback Learning
- ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation, [pdf]
- Directly Fine-Tuning Diffusion Models on Differentiable Rewards, [pdf]
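Reward feedback learning fine-tunes the generator by backpropagating a differentiable reward through (part of) the sampling chain. The sketch below only illustrates the idea; `unet`, `scheduler_step`, `decode`, and `reward_model` are placeholders for your own pipeline, not any specific library's API.

```python
import torch

def reward_backprop_loss(latents, timesteps, prompt_emb,
                         unet, scheduler_step, decode, reward_model):
    """Run a few denoising steps with gradients enabled and maximize a differentiable reward."""
    x = latents
    for t in timesteps:
        noise_pred = unet(x, t, prompt_emb)   # keep the graph through the denoiser
        x = scheduler_step(noise_pred, t, x)  # one denoising update
    image = decode(x)                         # latent -> image, still differentiable
    return -reward_model(image, prompt_emb).mean()  # gradient ascent on the reward
```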
Alignment on AR models
- Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step, [pdf]
- Autoregressive Image Generation Guided by Chains of Thought, [pdf]
Reward Models
- ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation, [pdf]
- Human Preference Score: Better Aligning Text-to-Image Models with Human Preference, [pdf]
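Preference reward models such as these are commonly trained with a Bradley-Terry-style pairwise ranking loss on human comparisons. A minimal sketch, assuming the scores come from a reward model applied to two images for the same prompt:

```python
import torch.nn.functional as F

def pairwise_ranking_loss(score_preferred, score_rejected):
    """Bradley-Terry-style loss: the preferred image should receive the higher score."""
    return -F.logsigmoid(score_preferred - score_rejected).mean()
```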
RL-based
- DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models, [pdf]
- Training Diffusion Models with Reinforcement Learning, [pdf]
- Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning, [pdf]
- Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards, [pdf]
- Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation, [pdf]
- DanceGRPO: Unleashing GRPO on Visual Generation, [pdf]
- Flow-GRPO: Training Flow Matching Models via Online RL, [pdf]
- MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE, [pdf]
- TempFlow-GRPO: When Timing Matters for GRPO in Flow Models, [pdf]
- Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning, [pdf]
DPO-based (referred to here)
- Diffusion Model Alignment Using Direct Preference Optimization, [pdf]
- Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model, [pdf]
- A Dense Reward View on Aligning Text-to-Image Diffusion with Preference, [pdf]
- Aligning Diffusion Models by Optimizing Human Utility, [pdf]
- Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization, [pdf]
- Scalable Ranked Preference Optimization for Text-to-Image Generation, [pdf]
- Prioritize Denoising Steps on Diffusion Model Preference Alignment via Explicit Denoised Distribution Estimation, [pdf]
- PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation, [pdf]
- SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation, [pdf]
- Margin-aware Preference Optimization for Aligning Diffusion Models without Reference, [pdf]
- DSPO: Direct Score Preference Optimization for Diffusion Model Alignment, [pdf]
- Direct Distributional Optimization for Provable Alignment of Diffusion Models, [pdf]
- Boost Your Human Image Generation Model via Direct Preference Optimization, [pdf]
- Curriculum Direct Preference Optimization for Diffusion and Consistency Models, [pdf]
- Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization, [pdf]
- Personalized Preference Fine-tuning of Diffusion Models, [pdf]
- Calibrated Multi-Preference Optimization for Aligning Diffusion Models, [pdf]
- InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment, [pdf]
- CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation, [pdf]
- D-Fusion: Direct Preference Optimization for Aligning Diffusion Models with Visually Consistent Samples, [pdf]
- Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences, [pdf]
- Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking, [pdf]
- Aligning Text to Image in Diffusion Models is Easier Than You Think, [pdf]
- OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization, [pdf]
- VideoDPO: Omni-Preference Alignment for Video Diffusion Generation, [pdf]
- DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization, [pdf]
- Flow-DPO: Improving Video Generation with Human Feedback, [pdf]
- HuViDPO: Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment, [pdf]
- Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models, [pdf]
- Fine-Tuning Diffusion Generative Models via Rich Preference Optimization, [pdf]
Reward Feedback Learning
- ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation, [pdf]
- Aligning Text-to-Image Diffusion Models with Reward Backpropagation, [pdf]
- Directly Fine-Tuning Diffusion Models on Differentiable Rewards, [pdf]
- Feedback Efficient Online Fine-Tuning of Diffusion Models, [pdf]
- Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models, [pdf]
- Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward, [pdf]
- InstructVideo: Instructing Video Diffusion Models with Human Feedback, [pdf]
- IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation, [pdf]
Technical Reports
We only list technical reports that use alignment methods:
- Seedream 2.0: A native chinese-english bilingual image generation foundation model, [pdf]
- Seedream 3.0 technical report, [pdf]
- Seedance 1.0: Exploring the Boundaries of Video Generation Models, [pdf]
- Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model, [pdf]
- Qwen-Image Technical Report, [pdf]
- Skywork-UniPic2, [pdf]
Alignment on AR models (referred to here)
- Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step, [pdf]
- Autoregressive Image Generation Guided by Chains of Thought, [pdf]
- LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization, [pdf]
- SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL, [pdf]
- UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation, [pdf]
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning, [pdf]
- ReasonGen-R1: CoT for Autoregressive Image Generation model through SFT and RL, [pdf]
- Unlocking Aha Moments via Reinforcement Learning: Advancing Collaborative Visual Comprehension and Generation, [pdf]
- Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO, [pdf]
- CoT-lized Diffusion: Let’s Reinforce T2I Generation Step-by-step, [pdf]
- X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again, [pdf]
- T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT, [pdf]
- AR-GRPO: Training Autoregressive Image Generation Models via Reinforcement Learning, [pdf]
Benchmarks
- DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers, [pdf]
- Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark, [pdf]
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation, [pdf]
- VPGen & VPEval: Visual Programming for Text-to-Image Generation and Evaluation, [pdf]
- GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment, [pdf]
- Holistic Evaluation of Text-to-Image Models, [pdf]
- Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Community, [pdf]
- Rich Human Feedback for Text to Image Generation, [pdf]
- Learning Multi-Dimensional Human Preference for Text-to-Image Generation, [pdf]
- Evaluating Text-to-Visual Generation with Image-to-Text Generation, [pdf]
- Multimodal Large Language Models Make Text-to-Image Generative Models Align Better, [pdf]
- Measuring Style Similarity in Diffusion Models, [pdf]
- T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-Image Generation, [pdf]
- DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation, [pdf]
- PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment, [pdf]
- Video-Bench: Human-Aligned Video Generation Benchmark, [pdf]
Reward Models
- Human Preference Score: Better Aligning Text-to-Image Models with Human Preference, [pdf]
- ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation, [pdf]
- Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation, [pdf]
- Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis, [pdf]
- Improving Video Generation with Human Feedback, [pdf]
- VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation, [pdf]
- VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation, [pdf]
- Unified Reward Model for Multimodal Understanding and Generation, [pdf]
- LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment, [pdf]
- HPSv3: Towards Wide-Spectrum Human Preference Score, [pdf]
Prompt Optimization
- Optimizing Prompts for Text-to-Image Generation, [pdf]
- RePrompt: Automatic Prompt Editing to Refine AI-Generative Art Towards Precise Expressions, [pdf]
- Improving Text-to-Image Consistency via Automatic Prompt Optimization, [pdf]
- Dynamic Prompt Optimizing for Text-to-Image Generation, [pdf]
- T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT, [pdf]
- RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning, [pdf]