Awesome-RL-for-Video-Generation

🤖 Introduction

Welcome to Awesome-RL-for-Video-Generation! This repository is a curated collection of research, resources, and tools on Reinforcement Learning (RL) for video generation. Our goal is to provide an up-to-date, comprehensive overview of RL techniques used in video generation, with a focus on the latest advancements, and to bridge the gap between RL theory and real-world video generation applications. We hope it serves as a solid foundation and a valuable resource for anyone exploring RL in this field.

🔥 News

  • [February 14, 2025] We have developed an agent that automatically collects and analyzes the latest papers in the field of RL-based video generation. It updates the Related Papers list daily at 1:00 AM (UTC+8).

🔍 Related Papers

We are committed to offering researchers the latest advancements in the field. By regularly reviewing and evaluating recent studies, we keep the list of papers up to date.

⚠️ The paper analysis may not be accurate and is for reference only!

Jul 2025 — EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation
• Affiliation: Terminal Technology Department, Alipay, Ant Group
• Method Name: EchoMimicV3, Base Model: Wan2.1-FUN-inp-480p-1.3B, Strategy: DPO
Jul 2025 — LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
• Affiliation: University of Science and Technology of China
• Method Name: LongAnimation, Base Model: CogVideoX-1.5-5B, Strategy: NGR

Jun 2025 — Video Perception Models for 3D Scene Synthesis
• Affiliation: Tsinghua University

Jun 2025 — RDPO: Real Data Preference Optimization for Physics Consistency Video Generation
• Affiliation: Fudan University
• Method Name: Real Data Preference Optimization (RDPO), Base Model: LTX-Video-2B, Strategy: DPO
Jun 2025 — VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning
• Affiliation: School of Electronic and Computer Engineering, Peking University
• Method Name: VQ-Insight, Base Model: Qwen-2.5-VL-7B-Instruct, Strategy: GRPO

Jun 2025 — Toward Rich Video Human-Motion2D Generation
• Affiliation: Tongji University
• Method Name: RVHM2D, Base Model: None, Strategy: Fine-tuning with an FID-based reward
• Benchmark Name: Motion2D-Video-150K, Data Number: 150000, Evaluation Metric: R-Precision, FID, MM Dist, Diversity

Jun 2025 — AlignHuman: Improving Motion and Fidelity via Timestep-Segment Preference Optimization for Audio-Driven Human Animation
• Affiliation: ByteDance
• Method Name: AlignHuman, Base Model: , Strategy: Timestep-Segment Preference Optimization (TPO)
Jun 2025 — Multimodal Large Language Models: A Survey
• Affiliation: School of Architecture, Technology and Engineering, University of Brighton, United Kingdom
• Method Name: Video Diffusion Alignment via Reward Gradients, Base Model: , Strategy: Reward Gradients
• Method Name: Diffusion Model Alignment Using Direct Preference Optimization, Base Model: , Strategy: Direct Preference Optimization (DPO)
• Method Name: VADER, Base Model: , Strategy: Backpropagating Reward Gradients
• Benchmark Name: MJ-VIDEO, Data Number: , Evaluation Metric: Fine-Grained Benchmarking and Rewarding Video Preferences
• Benchmark Name: VideoScore, Data Number: , Evaluation Metric: Simulating Fine-Grained Human Feedback for Video Generation
Jun 2025 — Seedance 1.0: Exploring the Boundaries of Video Generation Models
• Affiliation: ByteDance
• Method Name: Human Feedback Alignment (RLHF), Base Model: , Strategy: Reward feedback learning with multiple reward models (Foundational Reward Model, Motion Reward Model, Aesthetic Reward Model)

Jun 2025 — ContentV: Efficient Training of Video Generation Models with Limited Compute
• Affiliation: ByteDance Douyin Content Group
• Method Name: Reinforcement Learning from Human Feedback (RLHF), Base Model: Stable Diffusion 3.5 Large (SD3.5L), Strategy: Optimizing the conditional distribution pθ(x1|c) with a reward model r(c, x1) and KL-divergence regularization
May 2025 — Photography Perspective Composition: Towards Aesthetic Perspective Recommendation
• Affiliation: East China University of Science and Technology
• Method Name: Photography Perspective Composition (PPC), Base Model: , Strategy: DPO

May 2025 — Scaling Image and Video Generation via Test-Time Evolutionary Search
• Affiliation: Hong Kong University of Science and Technology
• Method Name: EvoSearch, Base Model: , Strategy:

May 2025 — InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO
• Affiliation: MAPLE Lab, Westlake University
• Method Name: InfLVG, Base Model: , Strategy: GRPO
• Benchmark Name: CsVBench, Data Number: 1000, Evaluation Metric: HPSv2, Aesthetic Score, CLIP-Flan, ViCLIP, ArcFace-42M, ArcFace-360K, QWen
May 2025 — AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection
• Affiliation: School of Electronic and Computer Engineering, Peking University
• Method Name: AvatarShield, Base Model: Qwen2.5-VL-7B, Strategy: GRPO
• Benchmark Name: FakeHumanVid, Data Number: 15000, Evaluation Metric: AUC

May 2025 — RLVR-World: Training World Models with Reinforcement Learning
• Affiliation: School of Software, BNRist, Tsinghua University
• Method Name: RLVR-World, Base Model: , Strategy: GRPO

May 2025 — Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models
• Affiliation: University of Michigan
• Benchmark Name: Temporally-Grounded Language Generation (TGLG), Data Number: 16487, Evaluation Metric: TRACE

May 2025 — Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models
• Affiliation: MMLab, CUHK, Hong Kong
• Method Name: Negative Preference Optimization (NPO), Base Model: , Strategy: Diffusion-NPO

May 2025 — DanceGRPO: Unleashing GRPO on Visual Generation
• Affiliation: ByteDance Seed
• Method Name: DanceGRPO, Base Model: , Strategy: GRPO

May 2025 — VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
• Affiliation: University of Maryland, College Park
• Benchmark Name: VideoHallu, Data Number: 3000, Evaluation Metric:
Apr 2025 — TesserAct: Learning 4D Embodied World Models
• Affiliation: UMass Amherst
• Method Name: TesserAct, Base Model: , Strategy:

Apr 2025 — Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
• Affiliation: Zhejiang University
• Method Name: Phys-AR, Base Model: Llama3.1-8B, Strategy: GRPO

Apr 2025 — SkyReels-V2: Infinite-length Film Generative Model
• Affiliation: Skywork AI
• Method Name: SkyReels-V2, Base Model: Qwen2-VL-7B, Strategy: DPO

Apr 2025 — FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos
• Affiliation: AMAP, Alibaba Group
• Method Name: FingER, Base Model: Qwen2.5-VL, Strategy: GRPO
• Benchmark Name: FingER-Instruct-60k, Data Number: 60000, Evaluation Metric:

Apr 2025 — Aligning Anime Video Generation with Human Feedback
• Affiliation: Fudan University
• Method Name: Gap-Aware Preference Optimization (GAPO), Base Model: , Strategy: Direct Preference Optimization (DPO)
• Benchmark Name: AnimeReward, Data Number: 30000, Evaluation Metric: multi-dimensional reward scores

Apr 2025 — Discriminator-Free Direct Preference Optimization for Video Diffusion
• Affiliation: Zhejiang University
• Method Name: Discriminator-Free Video Preference Optimization (DF-VPO), Base Model: , Strategy: DPO

Apr 2025 — Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments
• Affiliation: University of Trento, Italy
• Benchmark Name: Morpheus, Data Number: 80, Evaluation Metric: Dynamical Score

Apr 2025 — OmniCam: Unified Multimodal Video Generation via Camera Control
• Affiliation: Zhejiang University
• Method Name: OmniCam, Base Model: Llama3.1, Strategy: PPO
• Benchmark Name: OmniTr, Data Number: 1000 trajectories, 10,000 descriptions, 30,000 videos, Evaluation Metric: Mstarttime, Mendtime, Mspeed, Mrotate, Mdirection
Mar 2025 — VPO: Aligning Text-to-Video Generation Models with Prompt Optimization
• Affiliation: The Conversational Artificial Intelligence (CoAI) Group, Tsinghua University
• Method Name: VPO, Base Model: LLaMA3-8B-Instruct, Strategy: DPO

Mar 2025 — Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors
• Affiliation: The University of Hong Kong
• Method Name: Physics-based HOI Refinement, Base Model: , Strategy: Actor-Critic with Gaussian Policy

Mar 2025 — Judge Anything: MLLM as a Judge Across Any Modality
• Affiliation: Huazhong University of Science and Technology
• Benchmark Name: TASKANYTHING, Data Number: 1500, Evaluation Metric:
• Benchmark Name: JUDGE ANYTHING, Data Number: 9000, Evaluation Metric: Agreement, Pearson correlation, Spearman correlation, MAE, Accuracy

Mar 2025 — MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization
• Affiliation: Zhejiang University
• Method Name: MagicID, Base Model: , Strategy: DPO

Mar 2025 — Unified Reward Model for Multimodal Understanding and Generation
• Affiliation: Fudan University
• Method Name: UnifiedReward, Base Model: LLaVA-OneVision-7B, Strategy: DPO

Feb 2025 — Pre-Trained Video Generative Models as World Simulators
• Affiliation: Hong Kong University of Science and Technology
• Method Name: Dynamic World Simulation (DWS), Base Model: , Strategy: PPO

Feb 2025 — Harness Local Rewards for Global Benefits: Effective Text-to-Video Generation Alignment with Patch-level Reward Models
• Affiliation: Carnegie Mellon University
• Method Name: HALO, Base Model: , Strategy: DPO

Feb 2025 — IPO: Iterative Preference Optimization for Text-to-Video Generation
• Affiliation: Shanghai Academy of Artificial Intelligence for Science
• Method Name: Iterative Preference Optimization (IPO), Base Model: , Strategy: DPO

Feb 2025 — MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation
• Affiliation: UNC-Chapel Hill
• Benchmark Name: MJ-BENCH-VIDEO, Data Number: 5421, Evaluation Metric:
Feb 2025 — HuViDPO: Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment
• Affiliation: Zhejiang University
• Method Name: HuViDPO, Base Model: , Strategy: DPO

Feb 2025 — Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
• Affiliation: Guanghua School of Management, Peking University
• Method Name: Recursive Likelihood Ratio (RLR) optimizer, Base Model: , Strategy:

Jan 2025 — Improving Video Generation with Human Feedback
• Affiliation: The Chinese University of Hong Kong
• Method Name: Flow-DPO, Base Model: , Strategy: DPO
• Method Name: Flow-RWR, Base Model: , Strategy: RWR
• Method Name: Flow-NRG, Base Model: , Strategy: Reward Guidance
• Benchmark Name: VideoGen-RewardBench, Data Number: 26500, Evaluation Metric:
Dec 2024 — VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
• Affiliation: Tsinghua University
• Method Name: Multi-Objective Preference Optimization (MPO), Base Model: , Strategy: DPO

Dec 2024 — OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization
• Affiliation: The University of Hong Kong
• Method Name: OnlineVPO, Base Model: , Strategy: DPO

Dec 2024 — VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
• Affiliation: HKUST
• Method Name: VideoDPO, Base Model: , Strategy: DPO

Dec 2024 — FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks
• Affiliation: National University of Singapore
• Method Name: FLIP, Base Model: , Strategy: model-based planning

Dec 2024 — The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control
• Affiliation: Tongyi Lab
• Method Name: The Matrix, Base Model: , Strategy: Shift-Window Denoising Process Model (Swin-DPM)

Dec 2024 — Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback
• Affiliation: The University of Tokyo
• Method Name: RL-Finetuning for Text-to-Video Models, Base Model: , Strategy: RWR, DPO

Nov 2024 — Free$^2$Guide: Gradient-Free Path Integral Control for Enhancing Text-to-Video Generation with Large Vision-Language Models
• Affiliation: Kim Jaechul Graduate School of AI, KAIST
• Method Name: Free2Guide, Base Model: , Strategy: Path Integral Control

Nov 2024 — A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model
• Affiliation: SSE, The Chinese University of Hong Kong, Shenzhen
• Method Name: RL-based editing framework, Base Model: , Strategy: actor-critic

Oct 2024 — Video to Video Generative Adversarial Network for Few-shot Learning Based on Policy Gradient
• Affiliation: Northwestern University
• Method Name: RL-V2V-GAN, Base Model: , Strategy: Policy Gradient

Oct 2024 — WorldSimBench: Towards Video Generation Models as World Simulators
• Affiliation: The Chinese University of Hong Kong, Shenzhen
• Benchmark Name: WorldSimBench, Data Number: 35701, Evaluation Metric: Human Preference Evaluator
Oct 2024 — Animating the Past: Reconstruct Trilobite via Video Generation
• Affiliation: AI Lab, Yishi Inc.
• Method Name: Automatic T2V Prompt Learning Method, Base Model: , Strategy: KTO

Oct 2024 — VideoAgent: Self-Improving Video Generation
• Affiliation: University of Waterloo
• Method Name: VideoAgent, Base Model: , Strategy: self-improvement through online finetuning

Oct 2024 — E-Motion: Future Motion Simulation via Event Sequence Diffusion
• Affiliation: Xidian University
• Method Name: Event-Sequence Diffusion Network, Base Model: , Strategy: PPO

Oct 2024 — DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control
• Affiliation: ETH Zürich
• Method Name: DART, Base Model: , Strategy: PPO

Oct 2024 — SePPO: Semi-Policy Preference Optimization for Diffusion Alignment
• Affiliation: University of Rochester
• Method Name: SePPO, Base Model: , Strategy: DPO

Jul 2024 — Video Diffusion Alignment via Reward Gradients
• Affiliation: Carnegie Mellon University
• Method Name: VADER, Base Model: , Strategy: Reward Gradients

Dec 2023 — InstructVideo: Instructing Video Diffusion Models with Human Feedback
• Affiliation: Zhejiang University
• Method Name: InstructVideo, Base Model: , Strategy: reward fine-tuning

Nov 2023 — AdaDiff: Adaptive Step Selection for Fast Diffusion Models
• Affiliation: Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University
• Method Name: AdaDiff, Base Model: , Strategy: policy gradient
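Two optimization strategies recur most often across the entries above: DPO and GRPO. As a rough, framework-agnostic orientation (a minimal sketch of the general techniques, not the implementation of any specific paper listed here), their core computations can be written in plain Python:

```python
import math

def dpo_loss(logp_w_policy, logp_l_policy, logp_w_ref, logp_l_ref, beta=0.1):
    """Direct Preference Optimization: push the policy to prefer the
    chosen sample (w) over the rejected sample (l), measured relative
    to a frozen reference model. Inputs are summed log-probabilities
    of each sample under the policy and the reference model."""
    margin = beta * ((logp_w_policy - logp_w_ref) - (logp_l_policy - logp_l_ref))
    return math.log1p(math.exp(-margin))  # -log(sigmoid(margin))

def grpo_advantages(rewards):
    """Group Relative Policy Optimization: instead of a learned value
    baseline, normalize each reward against the mean and standard
    deviation of its own sampled group."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std or 1.0) for r in rewards]
```

Here `beta` controls how far the policy may drift from the reference model, and the group-relative advantages feed a clipped policy-gradient update over the sampled videos; individual papers differ substantially in how rewards are defined and which model components are updated.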

💪 How to Contribute

If you have a paper or know of relevant research that should be included, please contribute via pull requests, issues, email, or other suitable channels.
