Awesome-RL-for-Video-Generation

🤖 Introduction

Welcome to Awesome-RL-for-Video-Generation! This repository is a curated collection of research, resources, and tools on Reinforcement Learning (RL) for video generation. Our goal is to provide an up-to-date, comprehensive overview of RL techniques used in video generation, with a focus on the latest advancements, and to bridge the gap between RL theory and real-world video generation applications. We hope it serves as a solid foundation and a valuable resource for anyone exploring RL in this field.

🔥 News

  • [February 14, 2025] We have developed an agent that automatically collects and analyzes the latest papers in the field of RL-based video generation. It updates the Related Papers list daily at 1:00 AM (UTC+8).

🔍 Related Papers

We are committed to offering researchers the latest advancements in the field. By regularly reviewing and evaluating recent studies, we keep the list of papers up to date.

⚠️ The paper analysis may not be accurate and is for reference only!

Jul 2025 — EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation
• Affiliation: Terminal Technology Department, Alipay, Ant Group
• Method Name: EchoMimicV3, Base Model: Wan2.1-FUN-inp-480p-1.3B, Strategy: DPO
Jul 2025 — LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
• Affiliation: University of Science and Technology of China
• Method Name: LongAnimation, Base Model: CogVideoX-1.5-5B, Strategy: NGR

Jun 2025 — Video Perception Models for 3D Scene Synthesis
• Affiliation: Tsinghua University

Jun 2025 — RDPO: Real Data Preference Optimization for Physics Consistency Video Generation
• Affiliation: Fudan University
• Method Name: Real Data Preference Optimization (RDPO), Base Model: LTX-Video-2B, Strategy: DPO
Jun 2025 — VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning
• Affiliation: School of Electronic and Computer Engineering, Peking University
• Method Name: VQ-Insight, Base Model: Qwen-2.5-VL-7B-Instruct, Strategy: GRPO

Jun 2025 — Toward Rich Video Human-Motion2D Generation
• Affiliation: Tongji University
• Method Name: RVHM2D, Base Model: None, Strategy: Fine-tuning with an FID-based reward
• Benchmark Name: Motion2D-Video-150K, Data Number: 150000, Evaluation Metric: R-Precision, FID, MM Dist, Diversity

Jun 2025 — AlignHuman: Improving Motion and Fidelity via Timestep-Segment Preference Optimization for Audio-Driven Human Animation
• Affiliation: ByteDance
• Method Name: AlignHuman, Base Model: , Strategy: Timestep-Segment Preference Optimization (TPO)
Jun 2025 — Multimodal Large Language Models: A Survey
• Affiliation: School of Architecture, Technology and Engineering, University of Brighton, United Kingdom
• Method Name: Video Diffusion Alignment via Reward Gradients, Base Model: , Strategy: Reward Gradients
• Method Name: Diffusion Model Alignment Using Direct Preference Optimization, Base Model: , Strategy: Direct Preference Optimization (DPO)
• Method Name: VADER, Base Model: , Strategy: Backpropagating Reward Gradients
• Benchmark Name: MJ-VIDEO, Data Number: , Evaluation Metric: Fine-Grained Benchmarking and Rewarding Video Preferences
• Benchmark Name: VideoScore, Data Number: , Evaluation Metric: Simulating Fine-Grained Human Feedback for Video Generation
Jun 2025 — Seedance 1.0: Exploring the Boundaries of Video Generation Models
• Affiliation: ByteDance
• Method Name: Human Feedback Alignment (RLHF), Base Model: , Strategy: Reward feedback learning with multiple reward models (Foundational Reward Model, Motion Reward Model, Aesthetic Reward Model)

Jun 2025 — ContentV: Efficient Training of Video Generation Models with Limited Compute
• Affiliation: ByteDance Douyin Content Group
• Method Name: Reinforcement Learning from Human Feedback (RLHF), Base Model: Stable Diffusion 3.5 Large (SD3.5L), Strategy: Optimizing the conditional distribution pθ(x1|c) with a reward model r(c, x1) and KL-divergence regularization
May 2025 — Photography Perspective Composition: Towards Aesthetic Perspective Recommendation
• Affiliation: East China University of Science and Technology
• Method Name: Photography Perspective Composition (PPC), Base Model: , Strategy: DPO

May 2025 — Scaling Image and Video Generation via Test-Time Evolutionary Search
• Affiliation: Hong Kong University of Science and Technology
• Method Name: EvoSearch, Base Model: , Strategy:

May 2025 — InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO
• Affiliation: MAPLE Lab, Westlake University
• Method Name: InfLVG, Base Model: , Strategy: GRPO
• Benchmark Name: CsVBench, Data Number: 1000, Evaluation Metric: HPSv2, Aesthetic Score, CLIP-Flan, ViCLIP, ArcFace-42M, ArcFace-360K, QWen
May 2025 — AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection
• Affiliation: School of Electronic and Computer Engineering, Peking University
• Method Name: AvatarShield, Base Model: Qwen2.5-VL-7B, Strategy: GRPO
• Benchmark Name: FakeHumanVid, Data Number: 15000, Evaluation Metric: AUC

May 2025 — RLVR-World: Training World Models with Reinforcement Learning
• Affiliation: School of Software, BNRist, Tsinghua University
• Method Name: RLVR-World, Base Model: , Strategy: GRPO

May 2025 — Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models
• Affiliation: University of Michigan
• Benchmark Name: Temporally-Grounded Language Generation (TGLG), Data Number: 16487, Evaluation Metric: TRACE

May 2025 — Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models
• Affiliation: MMLab, CUHK, Hong Kong
• Method Name: Negative Preference Optimization (NPO), Base Model: , Strategy: Diffusion-NPO

May 2025 — DanceGRPO: Unleashing GRPO on Visual Generation
• Affiliation: ByteDance Seed
• Method Name: DanceGRPO, Base Model: , Strategy: GRPO

May 2025 — VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
• Affiliation: University of Maryland, College Park
• Benchmark Name: VideoHallu, Data Number: 3000, Evaluation Metric:
Apr 2025 — TesserAct: Learning 4D Embodied World Models
• Affiliation: UMass Amherst
• Method Name: TesserAct, Base Model: , Strategy:

Apr 2025 — Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
• Affiliation: Zhejiang University
• Method Name: Phys-AR, Base Model: Llama3.1-8B, Strategy: GRPO

Apr 2025 — SkyReels-V2: Infinite-length Film Generative Model
• Affiliation: Skywork AI
• Method Name: SkyReels-V2, Base Model: Qwen2-VL-7B, Strategy: DPO

Apr 2025 — FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos
• Affiliation: AMAP, Alibaba Group
• Method Name: FingER, Base Model: Qwen2.5-VL, Strategy: GRPO
• Benchmark Name: FingER-Instruct-60k, Data Number: 60000, Evaluation Metric:

Apr 2025 — Aligning Anime Video Generation with Human Feedback
• Affiliation: Fudan University
• Method Name: Gap-Aware Preference Optimization (GAPO), Base Model: , Strategy: Direct Preference Optimization (DPO)
• Benchmark Name: AnimeReward, Data Number: 30000, Evaluation Metric: multi-dimensional reward scores

Apr 2025 — Discriminator-Free Direct Preference Optimization for Video Diffusion
• Affiliation: Zhejiang University
• Method Name: Discriminator-Free Video Preference Optimization (DF-VPO), Base Model: , Strategy: DPO

Apr 2025 — Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments
• Affiliation: University of Trento, Italy
• Benchmark Name: Morpheus, Data Number: 80, Evaluation Metric: Dynamical Score

Apr 2025 — OmniCam: Unified Multimodal Video Generation via Camera Control
• Affiliation: Zhejiang University
• Method Name: OmniCam, Base Model: Llama3.1, Strategy: PPO
• Benchmark Name: OmniTr, Data Number: 1000 trajectories, 10,000 descriptions, 30,000 videos, Evaluation Metric: Mstarttime, Mendtime, Mspeed, Mrotate, Mdirection
Mar 2025 — VPO: Aligning Text-to-Video Generation Models with Prompt Optimization
• Affiliation: The Conversational Artificial Intelligence (CoAI) Group, Tsinghua University
• Method Name: VPO, Base Model: LLaMA3-8B-Instruct, Strategy: DPO

Mar 2025 — Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors
• Affiliation: The University of Hong Kong
• Method Name: Physics-based HOI Refinement, Base Model: , Strategy: Actor-Critic with Gaussian Policy

Mar 2025 — Judge Anything: MLLM as a Judge Across Any Modality
• Affiliation: Huazhong University of Science and Technology
• Benchmark Name: TASKANYTHING, Data Number: 1500, Evaluation Metric:
• Benchmark Name: JUDGE ANYTHING, Data Number: 9000, Evaluation Metric: Agreement, Pearson correlation, Spearman correlation, MAE, Accuracy

Mar 2025 — MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization
• Affiliation: Zhejiang University
• Method Name: MagicID, Base Model: , Strategy: DPO

Mar 2025 — Unified Reward Model for Multimodal Understanding and Generation
• Affiliation: Fudan University
• Method Name: UnifiedReward, Base Model: LLaVA-OneVision-7B, Strategy: DPO

Feb 2025 — Pre-Trained Video Generative Models as World Simulators
• Affiliation: Hong Kong University of Science and Technology
• Method Name: Dynamic World Simulation (DWS), Base Model: , Strategy: PPO

Feb 2025 — Harness Local Rewards for Global Benefits: Effective Text-to-Video Generation Alignment with Patch-level Reward Models
• Affiliation: Carnegie Mellon University
• Method Name: HALO, Base Model: , Strategy: DPO

Feb 2025 — IPO: Iterative Preference Optimization for Text-to-Video Generation
• Affiliation: Shanghai Academy of Artificial Intelligence for Science
• Method Name: Iterative Preference Optimization (IPO), Base Model: , Strategy: DPO

Feb 2025 — MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation
• Affiliation: UNC-Chapel Hill
• Benchmark Name: MJ-BENCH-VIDEO, Data Number: 5421, Evaluation Metric:
Feb 2025 — HuViDPO: Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment
• Affiliation: Zhejiang University
• Method Name: HuViDPO, Base Model: , Strategy: DPO

Feb 2025 — Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
• Affiliation: Guanghua School of Management, Peking University
• Method Name: Recursive Likelihood Ratio (RLR) optimizer, Base Model: , Strategy:

Jan 2025 — Improving Video Generation with Human Feedback
• Affiliation: The Chinese University of Hong Kong
• Method Name: Flow-DPO, Base Model: , Strategy: DPO
• Method Name: Flow-RWR, Base Model: , Strategy: RWR
• Method Name: Flow-NRG, Base Model: , Strategy: Reward Guidance
• Benchmark Name: VideoGen-RewardBench, Data Number: 26500, Evaluation Metric:
Dec 2024 — VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
• Affiliation: Tsinghua University
• Method Name: Multi-Objective Preference Optimization (MPO), Base Model: , Strategy: DPO

Dec 2024 — OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization
• Affiliation: The University of Hong Kong
• Method Name: OnlineVPO, Base Model: , Strategy: DPO

Dec 2024 — VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
• Affiliation: HKUST
• Method Name: VideoDPO, Base Model: , Strategy: DPO

Dec 2024 — FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks
• Affiliation: National University of Singapore
• Method Name: FLIP, Base Model: , Strategy: model-based planning

Dec 2024 — The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control
• Affiliation: Tongyi Lab
• Method Name: The Matrix, Base Model: , Strategy: Shift-Window Denoising Process Model (Swin-DPM)

Dec 2024 — Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback
• Affiliation: The University of Tokyo
• Method Name: RL-Finetuning for Text-to-Video Models, Base Model: , Strategy: RWR, DPO

Nov 2024 — Free$^2$Guide: Gradient-Free Path Integral Control for Enhancing Text-to-Video Generation with Large Vision-Language Models
• Affiliation: Kim Jaechul Graduate School of AI, KAIST
• Method Name: Free2Guide, Base Model: , Strategy: Path Integral Control

Nov 2024 — A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model
• Affiliation: SSE, The Chinese University of Hong Kong, Shenzhen
• Method Name: RL-based editing framework, Base Model: , Strategy: actor-critic

Oct 2024 — Video to Video Generative Adversarial Network for Few-shot Learning Based on Policy Gradient
• Affiliation: Northwestern University
• Method Name: RL-V2V-GAN, Base Model: , Strategy: Policy Gradient

Oct 2024 — WorldSimBench: Towards Video Generation Models as World Simulators
• Affiliation: The Chinese University of Hong Kong, Shenzhen
• Benchmark Name: WorldSimBench, Data Number: 35701, Evaluation Metric: Human Preference Evaluator
Oct 2024 — Animating the Past: Reconstruct Trilobite via Video Generation
• Affiliation: AI Lab, Yishi Inc.
• Method Name: Automatic T2V Prompt Learning Method, Base Model: , Strategy: KTO

Oct 2024 — VideoAgent: Self-Improving Video Generation
• Affiliation: University of Waterloo
• Method Name: VideoAgent, Base Model: , Strategy: self-improvement through online finetuning

Oct 2024 — E-Motion: Future Motion Simulation via Event Sequence Diffusion
• Affiliation: Xidian University
• Method Name: Event-Sequence Diffusion Network, Base Model: , Strategy: PPO

Oct 2024 — DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control
• Affiliation: ETH Zürich
• Method Name: DART, Base Model: , Strategy: PPO

Oct 2024 — SePPO: Semi-Policy Preference Optimization for Diffusion Alignment
• Affiliation: University of Rochester
• Method Name: SePPO, Base Model: , Strategy: DPO

Jul 2024 — Video Diffusion Alignment via Reward Gradients
• Affiliation: Carnegie Mellon University
• Method Name: VADER, Base Model: , Strategy: Reward Gradients

Dec 2023 — InstructVideo: Instructing Video Diffusion Models with Human Feedback
• Affiliation: Zhejiang University
• Method Name: InstructVideo, Base Model: , Strategy: reward fine-tuning

Nov 2023 — AdaDiff: Adaptive Step Selection for Fast Diffusion Models
• Affiliation: Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University
• Method Name: AdaDiff, Base Model: , Strategy: policy gradient
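Two optimization strategies recur most often across the entries above: DPO and GRPO. As a rough, framework-agnostic orientation (a minimal sketch of the general techniques, not the implementation of any specific paper listed here), their core computations can be written in plain Python:

```python
import math

def dpo_loss(logp_w_policy, logp_l_policy, logp_w_ref, logp_l_ref, beta=0.1):
    """Direct Preference Optimization: push the policy to prefer the
    chosen sample (w) over the rejected sample (l), measured relative
    to a frozen reference model. Inputs are summed log-probabilities
    of each sample under the policy and the reference model."""
    margin = beta * ((logp_w_policy - logp_w_ref) - (logp_l_policy - logp_l_ref))
    return math.log1p(math.exp(-margin))  # -log(sigmoid(margin))

def grpo_advantages(rewards):
    """Group Relative Policy Optimization: instead of a learned value
    baseline, normalize each reward against the mean and standard
    deviation of its own sampled group."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std or 1.0) for r in rewards]
```

Here `beta` controls how far the policy may drift from the reference model, and the group-relative advantages feed a clipped policy-gradient update over the sampled videos; individual papers differ substantially in how rewards are defined and which model components are updated.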

💪 How to Contribute

If you have a paper or know of relevant research that should be included, please contribute via pull requests, issues, email, or other suitable channels.
