With significant advances in Vision-Language-Action (VLA) models based on large-scale imitation learning, integrating VLA with Reinforcement Learning (RL) has emerged as a promising paradigm: it leverages trial-and-error interaction with environments or pre-collected sub-optimal data.
This repository summarizes recent advances in the VLA + RL paradigm and classifies relevant works into offline RL training (without an environment), online RL training (with an environment), test-time RL (during deployment), and RL alignment.
Contributions are welcome! Please feel free to submit an issue or reach out via email to add papers.
If you find this repository useful, please give this list a star ⭐ and feel free to share it with others!
Offline-RL-trained VLA models leverage both human demonstrations and autonomously collected data.
Method | Title | Venue | Date | Code/Project | Key feature/finding |
---|---|---|---|---|---|
Q-Transformer | Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions | Arxiv | 18/9/2023 | Github | Offline Q-learning with Transformer models: (1) autoregressive discrete Q-learning; (2) conservative Q-learning; (3) Monte Carlo and n-step returns |
Perceiver-Actor-Critic | Offline Actor-Critic Reinforcement Learning Scales to Large Models | ICML2024 | 8/2/2024 | Project | An offline actor-critic method that scales to models of up to 1B parameters and learns a wide variety of 132 control and robotics tasks |
GeRM | GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot | IROS2024 | 20/3/2024 | Github | Mixture-of-Experts structure; quadruped robot learning |
ReinboT | ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning | ICML2025 | 12/5/2025 | - | Max-return sequence modeling as in Reinformer; reward densification with heuristic methods |
MoRE | MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models | ICRA2025 | 11/3/2025 | - | Integrates multiple low-rank adaptation modules as distinct experts within a dense multi-modal large language model (MLLM), forming a sparse-activated mixture-of-experts model |
CO-RFT | CO-RFT: Efficient Fine-Tuning of Vision-Language-Action Models through Chunked Offline Reinforcement Learning | Arxiv | 4/8/2025 | - | Chunk-level offline RL fine-tuning; proposes chunked RL via n-step TD learning |
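Two entries above (Q-Transformer's n-step returns, CO-RFT's chunked n-step TD learning) build on the n-step bootstrapped TD target; a minimal sketch in plain Python (the function name and arguments are illustrative, not from either paper):

```python
def n_step_td_target(rewards, bootstrap_value, gamma=0.99):
    """n-step TD target: sum_k gamma^k * r_k + gamma^n * V(s_{t+n}).

    rewards: the n rewards observed after the current state.
    bootstrap_value: the value estimate V(s_{t+n}) used to bootstrap.
    """
    target = 0.0
    for k, r in enumerate(rewards):
        target += (gamma ** k) * r
    return target + (gamma ** len(rewards)) * bootstrap_value
```

With chunked actions (as in CO-RFT), n is the chunk length, so the target bootstraps only at chunk boundaries rather than after every primitive action.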
With trial-and-error interactions in online environments, VLA models can be further optimized to improve their performance.
Method | Title | Venue | Date | Code/Project | Key feature/finding |
---|---|---|---|---|---|
FLaRe | FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning | ICRA 2025 Best Paper Finalist | 30/9/2024 | Code | For large-scale fine-tuning in simulation, it performs extensive domain randomization, extracts visual features through DINOv2, and uses the KV-cache technique during inference, along with a set of algorithmic choices that ensure stable RL fine-tuning |
PA-RL | Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone | Arxiv | 9/12/2024 | Project | A single method that fine-tunes multiple policy classes with varying architectures and sizes; enables sample-efficient improvement of diffusion and transformer-based autoregressive policies; sets a new state of the art for offline-to-online RL and, for the first time, improves OpenVLA |
iRe-VLA | Improving Vision-Language-Action Model with Online Reinforcement Learning | RAL2025 | 28/1/2025 | - | Adopts two-stage iterative SFT & RL optimization to stabilize the training process and manage the model-training burden |
RIPT-VLA | Interactive Post-Training for Vision-Language-Action Models | Arxiv | 22/5/2025 | Github | A critic-free optimization framework called Leave-One-Out Proximal Policy Optimization (LOOP); dynamic rollout sampling |
VLA-RL | VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning | Arxiv | 24/5/2025 | Github | Robotic process reward model and the VLA-RL system with (1) curriculum selection strategy, (2) critic warmup, (3) GPU-balanced vectorized environments, (4) PPO infrastructure |
RLVLA | What Can RL Bring to VLA Generalization? An Empirical Study | Arxiv | 26/5/2025 | Github | PPO consistently outperforms GRPO and DPO; shared actor-critic backbone; VLA warm-up |
RFTF | RFTF: Reinforcement Fine-tuning for Embodied Agents with Temporal Feedback | Arxiv | 26/5/2025 | - | For the sparse-reward problem, RFTF leverages a value model trained using temporal information to generate dense rewards |
SimpleVLA-RL | - | - | 5/2025 | Github | - |
TGRPO | TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization | Arxiv | 10/6/2025 | Github | Extends GRPO from LLMs to trajectory-wise group relative policy optimization for VLAs |
OctoNav | OctoNav: Towards Generalist Embodied Navigation | Arxiv | 11/6/2025 | Project | For navigation tasks, proposes a VLA+RL hybrid training paradigm with SFT, Nav-GRPO, and online RL stages; the VLA model also gains thinking-before-action ability |
RLRC | RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models | Arxiv | 21/6/2025 | Project | An RL-based VLA compression paradigm: a three-stage pipeline of structured pruning, performance recovery via SFT and RL, and 4-bit quantization significantly reduces model size and boosts inference speed while preserving, and in some cases surpassing, the original model's ability to execute robotic tasks |
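RIPT-VLA's LOOP and the GRPO-style methods above share one idea: replace a learned critic with a group baseline computed from K rollouts of the same task. A minimal sketch of the leave-one-out advantage estimate (the function name is illustrative; the papers add clipping and other PPO machinery on top):

```python
def leave_one_out_advantages(returns):
    """Advantage of each rollout = its return minus the mean return of the
    other K-1 rollouts of the same task (a critic-free baseline)."""
    k = len(returns)
    assert k >= 2, "need at least two rollouts per task"
    total = sum(returns)
    return [r - (total - r) / (k - 1) for r in returns]
```

Because the baseline excludes the rollout being scored, the estimate stays unbiased while still requiring no value network.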
Method | Title | Venue | Date | Code/Project | Key feature/finding |
---|---|---|---|---|---|
RLDG | RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning | RSS2025 | 12/2024 | Project | Pretrains task-specific RL policies with HIL-SERL; distills the RL policies into a VLA for knowledge transfer |
PA-RL | Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone | Arxiv | 9/12/2024 | Project | A single method that fine-tunes multiple policy classes with varying architectures and sizes; enables sample-efficient improvement of diffusion and transformer-based autoregressive policies; sets a new state of the art for offline-to-online RL and, for the first time, improves OpenVLA |
iRe-VLA | Improving Vision-Language-Action Model with Online Reinforcement Learning | RAL2025 | 28/1/2025 | - | Adopts two-stage iterative SFT & RL optimization to stabilize the training process and manage the model-training burden |
ConRFT | ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy | RSS2025 | 14/4/2025 | Github | Offline fine-tuning (Cal-QL + PA-RL) and online fine-tuning (CPQL + HIL-SERL + PA-RL) |
Test-time RL methods leverage a value function pre-trained via offline RL during deployment.
Method | Title | Venue | Date | Code/Project | Key feature/finding |
---|---|---|---|---|---|
Bellman-Guided Retrials | To Err is Robotic: Rapid Value-Based Trial-and-Error during Deployment | Arxiv | 22/6/2024 | Github | Pre-trains a value function to estimate task completion; on failure, recovers the robot and samples a new strategy |
V-GPS | Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance | CoRL2024 | 17/10/2024 | Project | Re-ranks multiple action proposals from a generalist policy using a value function at test time |
Hume | Hume: Introducing System-2 Thinking in Visual-Language-Action Model | Arxiv | 2/6/2025 | Github | Pre-trains a value function and performs best-of-N selection over candidate action chunks using state-action value estimates |
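V-GPS and Hume both reduce to the same test-time recipe: score N candidate actions (or action chunks) sampled from the policy with a learned value function and execute the best one. A minimal sketch, where `value_fn` stands in for the learned Q-function (all names are illustrative):

```python
def best_of_n_select(state, candidates, value_fn):
    """Return the candidate action chunk with the highest estimated value
    Q(state, chunk), as in value-guided re-ranking / best-of-N selection."""
    return max(candidates, key=lambda chunk: value_fn(state, chunk))
```

For example, with a toy Q-function `lambda s, a: -abs(a - s)` and candidates `[0, 3, 6]` at state `4`, the selector returns `3`. The policy itself is never updated; only its outputs are filtered at deployment time.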
Method | Title | Venue | Date | Code/Project | Key feature/finding |
---|---|---|---|---|---|
GRAPE | GRAPE: Generalizing Robot Policy via Preference Alignment | ICLR2025 workshop | 4/2/2025 | Github | Trajectory-wise Preference Optimization aligns VLA policies at the trajectory level |
SafeVLA | SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning | Arxiv | 31/5/2025 | Project | Constrains VLA policies via safe reinforcement learning |
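GRAPE's Trajectory-wise Preference Optimization applies a DPO-style objective at the trajectory level: the policy's log-probability ratio against a reference policy for a preferred trajectory is pushed above that of a dispreferred one. A minimal sketch of such a loss on trajectory log-probabilities (the paper's exact formulation may differ; names are illustrative):

```python
import math

def trajectory_preference_loss(logp_win, logp_lose,
                               ref_logp_win, ref_logp_lose, beta=0.1):
    """-log sigmoid(beta * margin), where the margin compares the
    policy-vs-reference log-ratios of the preferred (win) and
    dispreferred (lose) trajectories."""
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the two log-ratios are equal the loss is log 2 (a coin-flip preference); increasing the preferred trajectory's ratio drives the loss toward zero.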
Method | Title | Venue | Date | Code/Project | Key feature/finding |
---|---|---|---|---|---|
RPD | Refined Policy Distillation: From VLA Generalists to RL Experts | Arxiv | 6/3/2025 | - | Leverages the VLA model as a policy prior to improve the sample efficiency of RL, in the spirit of Jump-Start RL |