worldbench/awesome-vla-for-ad

# 😎 Awesome VLA for Autonomous Driving

This survey reviews vision-action (VA) and vision-language-action (VLA) models for autonomous driving.

## Table of Contents

1. Vision-Action Models
2. Vision-Language-Action Models
3. Datasets & Benchmarks
4. Applications
5. Other Resources

## 1. Vision-Action Models

### 1️⃣ Action-Only Models

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|---|---|---|---|---|
| LBC | Learning by Cheating (arXiv) | CoRL 2020 | - | GitHub |
| Latent-DRL | End-to-End Model-Free Reinforcement Learning for Urban Driving using Implicit Affordances (arXiv) | CVPR 2020 | - | - |
| NEAT | NEAT: Neural Attention Fields for End-to-End Autonomous Driving (arXiv) | ICCV 2021 | - | GitHub |
| Roach | End-to-End Urban Driving by Imitating a Reinforcement Learning Coach (arXiv) | ICCV 2021 | Website | GitHub |
| WoR | Learning to Drive from A World on Rails (arXiv) | ICCV 2021 | Website | GitHub |
| TCP | Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline (arXiv) | NeurIPS 2022 | - | GitHub |
| Urban-Driver | Urban Driver: Learning to Drive from Real-world Demonstrations Using Policy Gradients (arXiv) | CoRL 2022 | Website | GitHub |
| LAV | Learning from All Vehicles (arXiv) | CVPR 2022 | Website | GitHub |
| TransFuser | TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving (arXiv) | TPAMI 2023 | - | GitHub |
| GRI | GRI: General Reinforced Imitation and its Application to Vision-Based Autonomous Driving (arXiv) | Robotics 2023 | - | - |
| BEVPlanner | Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving? (arXiv) | CVPR 2024 | - | GitHub |
| Raw2Drive | Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2) (arXiv) | arXiv 2025 | - | - |
| GoalFlow | GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving (arXiv) | CVPR 2025 | Website | GitHub |
| RAD | RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning (arXiv) | arXiv 2025 | Website | - |

### 2️⃣ Perception-Action Models

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|---|---|---|---|---|
| ST-P3 | ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning (arXiv) | ECCV 2022 | - | GitHub |
| UniAD | Planning-oriented Autonomous Driving (arXiv) | CVPR 2023 | - | GitHub |
| VAD | VAD: Vectorized Scene Representation for Efficient Autonomous Driving (arXiv) | ICCV 2023 | - | GitHub |
| OccNet | Scene as Occupancy (arXiv) | ICCV 2023 | - | GitHub |
| GenAD | GenAD: Generative End-to-End Autonomous Driving (arXiv) | ECCV 2024 | - | GitHub |
| PARA-Drive | PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving (CVPR) | CVPR 2024 | Website | - |
| SparseAD | SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving (arXiv) | arXiv 2024 | - | - |
| GaussianAD | GaussianAD: Gaussian-Centric End-to-End Autonomous Driving (arXiv) | arXiv 2024 | - | - |
| DiFSD | DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving (arXiv) | arXiv 2024 | - | GitHub |
| DriveTransformer | DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving (arXiv) | ICLR 2025 | - | GitHub |
| SparseDrive | SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation (arXiv) | ICRA 2025 | - | GitHub |
| DiffusionDrive | DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving (arXiv) | CVPR 2025 | - | GitHub |

### 3️⃣ Image-based World Models

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|---|---|---|---|---|
| DriveDreamer | DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving (arXiv) | ECCV 2024 | Website | GitHub |
| GenAD | GenAD: Generalized Predictive Model for Autonomous Driving (arXiv) | CVPR 2024 | - | GitHub |
| Drive-WM | Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving (arXiv) | CVPR 2024 | Website | GitHub |
| DrivingWorld | DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT (arXiv) | arXiv 2024 | Website | GitHub |
| Imagine-2-Drive | Imagine-2-Drive: Leveraging High-Fidelity World Models via Multi-Modal Diffusion Policies (arXiv) | IROS 2025 | Website | - |
| DrivingGPT | DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers (arXiv) | arXiv 2024 | Website | - |
| Epona | Epona: Autoregressive Diffusion World Model for Autonomous Driving (arXiv) | ICCV 2025 | Website | GitHub |

### 4️⃣ Occupancy-based World Models

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|---|---|---|---|---|
| OccWorld | OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving (arXiv) | ECCV 2024 | Website | GitHub |
| NeMo | Neural Volumetric World Models for Autonomous Driving (ECCV) | ECCV 2024 | - | - |
| OccVAR | OCCVAR: Scalable 4D Occupancy Prediction via Next-Scale Prediction (OpenReview) | OpenReview 2024 | - | - |
| RenderWorld | RenderWorld: World Model with Self-Supervised 3D Label (arXiv) | arXiv 2024 | - | - |
| DFIT-OccWorld | An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training (arXiv) | arXiv 2024 | - | - |
| Drive-OccWorld | Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving (arXiv) | AAAI 2025 | Website | GitHub |
| T³Former | Temporal Triplane Transformers as Occupancy World Models (arXiv) | arXiv 2025 | - | - |

### 5️⃣ Latent-based World Models

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|---|---|---|---|---|
| Covariate-Shift | Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models (arXiv) | arXiv 2024 | - | - |
| World4Drive | World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model (arXiv) | ICCV 2025 | - | - |
| WoTE | End-to-End Driving with Online Trajectory Evaluation via BEV World Model (arXiv) | ICCV 2025 | - | GitHub |
| LAW | Enhancing End-to-End Autonomous Driving with Latent World Model (arXiv) | ICLR 2025 | - | GitHub |
| SSR | Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving (arXiv) | ICLR 2025 | - | GitHub |
| Echo-Planning | Echo Planning for Autonomous Driving: From Current Observations to Future Trajectories and Back (arXiv) | arXiv 2025 | - | - |
| SeerDrive | Future-Aware End-to-End Driving: Bidirectional Modeling of Trajectory Planning and Scene Evolution (arXiv) | NeurIPS 2025 | - | GitHub |

## 2. Vision-Language-Action Models

### 1️⃣ Textual Action Generator

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|---|---|---|---|---|
| DriveMLM | DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving (arXiv) | arXiv 2023 | - | GitHub |
| DriveLikeAHuman | Drive Like a Human: Rethinking Autonomous Driving with Large Language Models (arXiv) | arXiv 2023 | Website | GitHub |
| GPT-Driver | GPT-Driver: Learning to Drive with GPT (arXiv) | NeurIPSW 2023 | Website | GitHub |
| RAG-Driver | RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model (arXiv) | RSS 2024 | Website | GitHub |
| RDA-Driver | Making Large Language Models Better Planners with Reasoning-Decision Alignment (arXiv) | ECCV 2024 | - | - |
| DriveLM | DriveLM: Driving with Graph Visual Question Answering (arXiv) | ECCV 2024 | Website | GitHub |
| DriveGPT4 | DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model (arXiv) | RA-L 2024 | Website | - |
| DriVLMe | DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experience (arXiv) | IROS 2024 | Website | GitHub |
| LLaDA | Driving Everywhere with Large Language Model Policy Adaptation (arXiv) | CVPR 2024 | Website | GitHub |
| VLAAD | VLAAD: Vision and Language Assistant for Autonomous Driving (WACVW) | WACVW 2024 | - | GitHub |
| OccLLaMA | OccLLaMA: A Unified Occupancy-Language-Action World Model for Understanding and Generation Tasks in Autonomous Driving (arXiv) | arXiv 2024 | Website | - |
| Doe-1 | Doe-1: Closed-Loop Autonomous Driving with Large World Model (arXiv) | arXiv 2024 | Website | GitHub |
| SafeAuto | SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models (arXiv) | ICML 2025 | - | GitHub |
| OpenEMMA | OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving (arXiv) | arXiv 2024 | - | GitHub |
| ReasonPlan | ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving (arXiv) | CoRL 2025 | - | GitHub |
| FutureSightDrive | FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving (arXiv) | NeurIPS 2025 | Website | GitHub |
| WKER | World Knowledge-Enhanced Reasoning Using Instruction-Guided Interactor in Autonomous Driving (arXiv) | AAAI 2025 | - | - |
| OmniDrive | OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning (arXiv) | CVPR 2025 | - | GitHub |
| EMMA | EMMA: End-to-End Multimodal Model for Autonomous Driving (arXiv) | arXiv 2024 | Website | - |
| AlphaDrive | AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning (arXiv) | arXiv 2025 | - | GitHub |
| Occ-LLM | Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models (arXiv) | arXiv 2025 | - | - |
| DriveAgent-R1 | DriveAgent-R1: Advancing VLM-based Autonomous Driving with Hybrid Thinking and Active Perception (arXiv) | arXiv 2025 | - | - |
| Sce2DriveX | Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning (arXiv) | arXiv 2025 | - | - |
| Drive-R1 | Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning (arXiv) | arXiv 2025 | - | - |
| FastDriveVLA | FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning (arXiv) | arXiv 2025 | - | - |
| WiseAD | WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model (arXiv) | arXiv 2024 | Website | GitHub |
| ImpromptuVLA | Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models (arXiv) | arXiv 2025 | Website | GitHub |
| LightEMMA | LightEMMA: Lightweight End-to-End Multimodal Model for Autonomous Driving (arXiv) | arXiv 2025 | - | GitHub |
| AutoDrive-R² | AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| OmniReason | OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving (arXiv) | arXiv 2025 | - | - |

### 2️⃣ Numerical Action Generator

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|---|---|---|---|---|
| LMDrive | LMDrive: Closed-Loop End-to-End Driving with Large Language Models (arXiv) | CVPR 2024 | Website | GitHub |
| LINGO-2 | LINGO-2: Driving with Natural Language | 2024 | Website | - |
| CoVLA-Agent | CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving (arXiv) | WACV 2025 | Website | - |
| ORION | ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation (arXiv) | ICCV 2025 | Website | GitHub |
| AutoVLA | AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning (arXiv) | NeurIPS 2025 | Website | GitHub |
| SimLingo | SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment (arXiv) | CVPR 2025 | Website | GitHub |
| DriveGPT4-V2 | DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving (CVPR) | CVPR 2025 | - | - |
| S4-Driver | S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation (arXiv) | CVPR 2025 | Website | - |
| VaViM | VaViM and VaVAM: Autonomous Driving through Video Generative Modeling (arXiv) | arXiv 2025 | Website | GitHub |
| BEVDriver | BEVDriver: Leveraging BEV Maps in LLMs for Robust Closed-Loop Driving (arXiv) | arXiv 2025 | - | - |
| DriveMoE | DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving (arXiv) | arXiv 2025 | Website | GitHub |
| DSDrive | DSDrive: Distilling Large Language Model for Lightweight End-to-End Autonomous Driving with Unified Reasoning and Planning (arXiv) | arXiv 2025 | - | - |
| OpenDriveVLA | OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model (arXiv) | arXiv 2025 | Website | GitHub |
| PLA | A Unified Perception-Language-Action Framework for Adaptive Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| OccVLA | OccVLA: Vision-Language-Action Model with Implicit 3D Occupancy Supervision (arXiv) | arXiv 2025 | - | - |

### 3️⃣ Explicit Action Guidance

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|---|---|---|---|---|
| DriveVLM | DriveVLM: The convergence of autonomous driving and large vision-language models (arXiv) | CoRL 2024 | Website | - |
| DualAD | DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving (arXiv) | IROS 2025 | Website | GitHub |
| FasionAD | FASIONAD: FAst and Slow FusION Thinking Systems for Human-Like Autonomous Driving with Adaptive Feedback (arXiv) | arXiv 2024 | - | - |
| Senna | Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving (arXiv) | arXiv 2024 | - | GitHub |
| LeapAD | Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving (arXiv) | NeurIPS 2024 | Website | GitHub |
| DME-Driver | DME-Driver: Integrating human decision logic and 3D scene perception in autonomous driving (arXiv) | AAAI 2025 | - | - |
| SOLVE | SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving (arXiv) | CVPR 2025 | - | - |
| FasionAD++ | FASIONAD++: Integrating High-Level Instruction and Information Bottleneck in FAt-Slow fusION Systems for Enhanced Safety in Autonomous Driving with Adaptive Feedback (arXiv) | arXiv 2025 | - | - |
| LeapVAD | LeapVAD: A Leap in Autonomous Driving via Cognitive Perception and Dual-Process Thinking (arXiv) | arXiv 2025 | - | - |
| DiffVLA | DiffVLA: Vision-language guided diffusion planning for autonomous driving (arXiv) | arXiv 2025 | - | - |
| ReAL-AD | ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving (arXiv) | arXiv 2025 | Website | - |

### 4️⃣ Implicit Representations Transfer

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|---|---|---|---|---|
| VLP | VLP: Vision Language Planning for Autonomous Driving (arXiv) | CVPR 2024 | - | - |
| VLM-AD | VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision (arXiv) | CoRL 2025 | - | - |
| DiMA | Distilling Multi-modal Large Language Models for Autonomous Driving (arXiv) | CVPR 2025 | - | - |
| ALN-P3 | ALN-P3: Unified Language Alignment for Perception, Prediction, and Planning in Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| VERDI | VERDI: VLM-Embedded Reasoning for Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| VLM-E2E | VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion (arXiv) | arXiv 2025 | - | - |
| ReCogDrive | ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving (arXiv) | arXiv 2025 | Website | GitHub |
| InsightDrive | InsightDrive: Insight Scene Representation for End-to-End Autonomous Driving (arXiv) | arXiv 2025 | - | GitHub |
| NetRoller | NetRoller: Interfacing General and Specialized Models for End-to-End Autonomous Driving (arXiv) | arXiv 2025 | - | GitHub |
| ETA | ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models (arXiv) | arXiv 2025 | - | GitHub |
| ViLaD | ViLaD: A Large Vision Language Diffusion Framework for End-to-End Autonomous Driving (arXiv) | arXiv 2025 | - | - |

## 3. Datasets & Benchmarks

### 1️⃣ Vision-Action Datasets

⏲️ In chronological order, from the earliest to the latest.

| Dataset | Paper | Venue | Website | GitHub |
|---|---|---|---|---|
| BDD100K | BDD100K: A diverse driving dataset for heterogeneous multitask learning (arXiv) | CVPR 2020 | Website | GitHub |
| nuScenes | nuScenes: A multimodal dataset for autonomous driving (arXiv) | CVPR 2020 | Website | - |
| Waymo | Scalability in perception for autonomous driving: Waymo open dataset (arXiv) | CVPR 2020 | Website | GitHub |
| nuPlan | nuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles (arXiv) | arXiv 2021 | Website | GitHub |
| Argoverse 2 | Argoverse 2: Next generation datasets for self-driving perception and forecasting (arXiv) | NeurIPS 2021 | Website | GitHub |
| Bench2Drive | Bench2Drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving (arXiv) | NeurIPS 2024 | - | GitHub |
| RoboBEV | Benchmarking and improving bird's eye view perception robustness in autonomous driving (arXiv) | TPAMI 2025 | - | GitHub |

### 2️⃣ Vision-Language-Action Datasets

| Dataset | Paper | Venue | Website | GitHub |
|---|---|---|---|---|
| BDD-X | Textual explanations for self-driving vehicles (arXiv) | ECCV 2018 | - | GitHub |
| Talk2Car | Talk2Car: Predicting physical trajectories for natural language commands (IEEE) | IEEE Access 2022 | - | GitHub |
| SDN | DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents (arXiv) | EMNLP 2022 | - | GitHub |
| DriveMLM | DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving (arXiv) | arXiv 2023 | - | GitHub |
| LMDrive | LMDrive: Closed-Loop End-to-End Driving with Large Language Models (arXiv) | CVPR 2024 | Website | GitHub |
| DriveLM-nuScenes | DriveLM: Driving with graph visual question answering (arXiv) | ECCV 2024 | Website | GitHub |
| DriveLM-CARLA | DriveLM: Driving with graph visual question answering (arXiv) | ECCV 2024 | Website | GitHub |
| HBD | DME-Driver: Integrating human decision logic and 3D scene perception in autonomous driving (arXiv) | AAAI 2025 | - | - |
| VLAAD | VLAAD: Vision and Language Assistant for Autonomous Driving (WACVW) | WACVW 2024 | - | GitHub |
| SUP-AD | DriveVLM: The convergence of autonomous driving and large vision-language models (arXiv) | CoRL 2024 | Website | - |
| NuInstruct | Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models (arXiv) | CVPR 2024 | - | GitHub |
| WOMD-Reasoning | WOMD-Reasoning: A large-scale dataset for interaction reasoning in driving (arXiv) | ICML 2025 | Website | GitHub |
| DriveCoT | DriveCoT: Integrating chain-of-thought reasoning with end-to-end driving (arXiv) | arXiv 2024 | Website | - |
| Reason2Drive | Reason2Drive: Towards interpretable and chain-based reasoning for autonomous driving (arXiv) | ECCV 2024 | - | GitHub |
| DriveBench | Are VLMs ready for autonomous driving? An empirical study from the reliability, data, and metric perspectives (arXiv) | ICCV 2025 | Website | GitHub |
| MetaAD | AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning (arXiv) | arXiv 2025 | Website | GitHub |
| OmniDrive | OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning (arXiv) | CVPR 2025 | - | GitHub |
| NuInteract | Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| DriveAction | DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models (arXiv) | arXiv 2025 | - | - |
| ImpromptuVLA | Impromptu VLA: Open weights and open data for driving vision-language-action models (arXiv) | arXiv 2025 | Website | GitHub |
| CoVLA | CoVLA: Comprehensive vision-language-action dataset for autonomous driving (arXiv) | WACV 2025 | Website | - |
| OmniReason-nuScenes | OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| OmniReason-B2D | OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving (arXiv) | arXiv 2025 | - | - |

## 4. Applications

## 5. Other Resources
