This repository summarizes recent advances in the VLA + RL paradigm and provides a taxonomic classification of relevant works.

XiaoWei-i/Awesome-VLA-RL

Awesome-VLA-RL

With significant advances in Vision-Language-Action (VLA)🍔 models based on large-scale imitation learning, integrating VLA with Reinforcement Learning (RL)🥤 has emerged as a promising paradigm. This paradigm leverages the benefits of trial-and-error interactions with environments or pre-collected sub-optimal data.

This repository summarizes recent advances in the VLA🍔 + RL🥤 paradigm and classifies the relevant works into four groups: offline RL training (without an environment), online RL training (with an environment), test-time RL (during deployment), and RL alignment.

Contributions are welcome! Please feel free to submit an issue or reach out via email to add papers!

If you find this repository useful, please consider giving this list a star ⭐. Feel free to share it with others!

Offline RL

VLA models pre-trained with offline RL leverage both human demonstrations and autonomously collected data.

| Method | Title | Venue | Date | Code/Project | Key feature/finding |
| --- | --- | --- | --- | --- | --- |
| Q-Transformer | Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions | arXiv | 18/9/2023 | GitHub | Offline Q-learning with Transformer models: (1) autoregressive discrete Q-learning; (2) conservative Q-learning; (3) Monte Carlo and n-step returns |
| Perceiver-Actor-Critic | Offline Actor-Critic Reinforcement Learning Scales to Large Models | ICML 2024 | 8/2/2024 | Project | An offline actor-critic method that scales to large models of up to 1B parameters and learns a wide variety of 132 control and robotics tasks |
| GeRM | GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot | IROS 2024 | 20/3/2024 | GitHub | Mixture-of-Experts structure; quadruped robot learning |
| ReinboT | ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning | ICML 2025 | 12/5/2025 | | Max-return sequence modeling as in Reinformer; reward densification with heuristic methods |
| MoRE | MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models | ICRA 2025 | 11/3/2025 | | Integrates multiple low-rank adaptation modules as distinct experts within a dense multi-modal large language model (MLLM), forming a sparsely activated mixture-of-experts model |
| CO-RFT | CO-RFT: Efficient Fine-Tuning of Vision-Language-Action Models through Chunked Offline Reinforcement Learning | arXiv | 4/8/2025 | | Chunk-level offline RL fine-tuning; proposes chunked RL via n-step TD learning |
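Several entries in this section mention n-step returns or n-step TD learning. As a rough illustration only, the sketch below computes a generic n-step TD target from a short list of rewards and a bootstrap value. The function name `n_step_target`, its parameters, and the default discount are assumptions made for this example; it is not the implementation used by Q-Transformer or CO-RFT, and CO-RFT's chunk-level variant is not reproduced here.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Generic n-step TD target (hypothetical helper, not from any listed paper).

    Computes r_0 + gamma * r_1 + ... + gamma^(n-1) * r_(n-1) + gamma^n * V,
    where n = len(rewards) and V = bootstrap_value is the value estimate
    of the state reached after the n steps.
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


if __name__ == "__main__":
    # Three steps of reward 1.0, then bootstrap from a value estimate of 10.0.
    print(n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
```

With an empty reward list the function simply returns the bootstrap value, and with a bootstrap value of zero it reduces to a discounted Monte Carlo return over the given steps.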

Online RL

With trial-and-error interactions in online environments, VLA models can be further optimized to improve their performance.

In Simulation

| Method | Title | Venue | Date | Code/Project | Key feature/finding |
| --- | --- | --- | --- | --- | --- |
| FLaRe | FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning | ICRA 2025 (Best Paper Finalist) | 30/9/2024 | Code | For large-scale fine-tuning in simulation, it performs extensive domain randomization, extracts visual features with DINOv2, and uses KV caching during inference, along with a set of algorithmic choices that keep RL fine-tuning stable |
| PA-RL | Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone | arXiv | 9/12/2024 | Project | A single method that fine-tunes multiple policy classes with varying architectures and sizes. It enables sample-efficient improvement of diffusion and Transformer-based autoregressive policies. PA-RL sets a new state of the art for offline-to-online RL and makes it possible, for the first time, to improve OpenVLA |
| iRe-VLA | Improving Vision-Language-Action Model with Online Reinforcement Learning | RAL 2025 | 28/1/2025 | | Adopts a two-stage iterative optimization that alternates SFT and RL to stabilize training and manage the training burden |
| RIPT-VLA | Interactive Post-Training for Vision-Language-Action Models | arXiv | 22/5/2025 | GitHub | A critic-free optimization framework called Leave-One-Out Proximal Policy Optimization (LOOP); dynamic rollout sampling |
| VLA-RL | VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning | arXiv | 24/5/2025 | GitHub | A robotic process reward model and the VLA-RL system with (1) a curriculum selection strategy, (2) critic warm-up, (3) GPU-balanced vectorized environments, and (4) PPO infrastructure |
| RLVLA | What Can RL Bring to VLA Generalization? An Empirical Study | arXiv | 26/5/2025 | GitHub | PPO consistently outperforms GRPO and DPO; shared actor-critic backbone; VLA warm-up |
| RFTF | RFTF: Reinforcement Fine-tuning for Embodied Agents with Temporal Feedback | arXiv | 26/5/2025 | | To address the sparse-reward problem, RFTF uses a value model trained on temporal information to generate dense rewards |
| SimpleVLA-RL | | GitHub | 5/2025 | GitHub | |
| TGRPO | TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization | arXiv | 10/6/2025 | GitHub | From GRPO in LLMs to TGRPO in VLAs |
| OctoNav | OctoNav: Towards Generalist Embodied Navigation | arXiv | 11/6/2025 | Project | For navigation tasks, it proposes a VLA+RL hybrid training paradigm with SFT, Nav-GRPO, and online RL stages. The VLA model also acquires a thinking-before-action ability |
| RLRC | RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models | arXiv | 21/6/2025 | Project | An RL-based VLA compression paradigm. Through a three-stage pipeline (structured pruning, performance recovery based on SFT and RL, and 4-bit quantization), it significantly reduces model size and boosts inference speed while preserving, and in some cases surpassing, the original model's ability to execute robotic tasks |
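Two of the critic-free ideas named in this table, leave-one-out baselines and group-relative (GRPO-style) advantages, both turn a group of rollout returns for the same task into per-rollout advantages without a learned value function. The sketch below shows the generic form of each under simplifying assumptions: one scalar return per rollout, and a population standard deviation for the normalization. The function names are hypothetical, and the code does not reproduce the specific objectives of RIPT-VLA (LOOP) or TGRPO, which add further components such as PPO-style clipping or trajectory-level terms.

```python
from statistics import mean, pstdev


def leave_one_out_advantages(returns):
    """Advantage of each rollout relative to the mean return of the others.

    Hypothetical helper: A_i = R_i - mean(R_j for j != i).
    """
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two rollouts for a leave-one-out baseline")
    total = sum(returns)
    return [r - (total - r) / (n - 1) for r in returns]


def group_relative_advantages(returns, eps=1e-8):
    """GRPO-style normalization: A_i = (R_i - mean) / (std + eps).

    Assumption: population standard deviation; some implementations
    use the sample standard deviation instead.
    """
    mu = mean(returns)
    sigma = pstdev(returns)
    return [(r - mu) / (sigma + eps) for r in returns]


if __name__ == "__main__":
    # Binary success signals from four rollouts of the same task.
    group = [1.0, 0.0, 0.0, 1.0]
    print(leave_one_out_advantages(group))
    print(group_relative_advantages(group))
```

When every rollout in a group receives the same return, both functions give zero advantage, so such a group contributes no learning signal. This is one motivation for rollout-sampling strategies that filter uninformative groups.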

In the Real World

| Method | Title | Venue | Date | Code/Project | Key feature/finding |
| --- | --- | --- | --- | --- | --- |
| RLDG | RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning | RSS 2025 | 12/2024 | Project | Pre-trains task-specific RL policies with HIL-SERL; distills the RL policies into a VLA for knowledge transfer |
| PA-RL | Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone | arXiv | 9/12/2024 | Project | A single method that fine-tunes multiple policy classes with varying architectures and sizes. It enables sample-efficient improvement of diffusion and Transformer-based autoregressive policies. PA-RL sets a new state of the art for offline-to-online RL and makes it possible, for the first time, to improve OpenVLA |
| iRe-VLA | Improving Vision-Language-Action Model with Online Reinforcement Learning | RAL 2025 | 28/1/2025 | | Adopts a two-stage iterative optimization that alternates SFT and RL to stabilize training and manage the training burden |
| ConRFT | ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy | RSS 2025 | 14/4/2025 | GitHub | Offline fine-tuning (Cal-QL + PA-RL) and online fine-tuning (CPQL + HIL-SERL + PA-RL) |

Test-Time RL

Test-time RL methods leverage a value function pre-trained via offline RL.

| Method | Title | Venue | Date | Code/Project | Key feature/finding |
| --- | --- | --- | --- | --- | --- |
| Bellman-Guided Retrials | To Err is Robotic: Rapid Value-Based Trial-and-Error during Deployment | arXiv | 22/6/2024 | GitHub | Pre-trains a value function to estimate task completion; on failure, the robot recovers and samples a new strategy |
| V-GPS | Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance | CoRL 2024 | 17/10/2024 | Project | Re-ranks multiple action proposals from a generalist policy using a value function at test time |
| Hume | Hume: Introducing System-2 Thinking in Visual-Language-Action Model | arXiv | 2/6/2025 | GitHub | Pre-trains a value function and performs best-of-N selection of candidate action chunks using state-action value estimation |
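The re-ranking and best-of-N ideas listed above share a simple core: sample several candidate actions (or action chunks) from the policy, score each with a pre-trained state-action value function, and execute the highest-scoring one. A minimal sketch of that core follows. The names `select_best_action` and `toy_q_value` are hypothetical, the toy value function is a stand-in for a learned critic, and greedy argmax selection is an assumption; the listed methods may instead sample in proportion to the values or add further machinery.

```python
def select_best_action(state, candidates, q_value):
    """Return the candidate with the highest estimated value Q(state, action).

    Hypothetical helper: `candidates` are proposals sampled from a policy,
    and `q_value` is any callable mapping (state, action) to a float.
    """
    if not candidates:
        raise ValueError("need at least one candidate action")
    return max(candidates, key=lambda action: q_value(state, action))


def toy_q_value(state, action):
    """Toy stand-in for a learned critic: prefers actions close to the state."""
    return -(action - state) ** 2


if __name__ == "__main__":
    proposals = [0.0, 0.4, 0.9]  # pretend these were sampled from a VLA policy
    print(select_best_action(0.5, proposals, toy_q_value))
```

Because the policy itself is left untouched, this kind of selection can be applied at deployment without any gradient updates; its quality depends entirely on how well the value function ranks the candidates.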

RL Alignment

| Method | Title | Venue | Date | Code/Project | Key feature/finding |
| --- | --- | --- | --- | --- | --- |
| GRAPE | GRAPE: Generalizing Robot Policy via Preference Alignment | ICLR 2025 Workshop | 4/2/2025 | GitHub | Trajectory-wise preference optimization aligns VLA policies at the trajectory level |
| SafeVLA | SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning | arXiv | 31/5/2025 | Project | Constrains VLA policies via safe reinforcement learning |
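To make "preference optimization at the trajectory level" more concrete, the sketch below shows a DPO-style loss in which each trajectory's log-likelihood is the sum of its per-step action log-probabilities. This is a generic formulation offered as an assumption, not the confirmed objective of GRAPE; the function names, the `beta` default, and the example numbers are all hypothetical.

```python
import math


def trajectory_logprob(step_logprobs):
    """Log-likelihood of a trajectory as the sum of per-step action log-probs."""
    return sum(step_logprobs)


def preference_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss on one (chosen, rejected) trajectory pair.

    All four inputs are trajectory log-likelihoods. The loss is
    -log(sigmoid(margin)), where the margin compares how much the policy
    (relative to a frozen reference) favors the chosen trajectory.
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    chosen = trajectory_logprob([-0.2, -0.3, -0.5])    # preferred trajectory
    rejected = trajectory_logprob([-1.0, -1.0, -1.0])  # dispreferred trajectory
    # The reference model is assumed to assign both trajectories the same likelihood.
    print(preference_loss(chosen, rejected, ref_chosen=-2.0, ref_rejected=-2.0, beta=1.0))
```

When the policy and reference agree on both trajectories, the margin is zero and the loss equals log 2; the loss falls as the policy shifts probability toward the preferred trajectory.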

Unclassified

| Method | Title | Venue | Date | Code/Project | Key feature/finding |
| --- | --- | --- | --- | --- | --- |
| RPD | Refined Policy Distillation: From VLA Generalists to RL Experts | arXiv | 6/3/2025 | | Leverages a VLA model as a policy prior to improve the sample efficiency of RL, similar to Jump-Start RL |
