This is a collection of research papers on Embodied Multimodal Large Language Models and Vision-Language-Action (VLA) models.
If you would like to include your paper or update any details (e.g., code URLs, conference information), please feel free to submit a pull request. Any advice is also welcome!
🔥🔥🔥 RT-2: Robotics Transformer 2 — End-to-End Vision-Language-Action Model
Integrates vision-language models trained on internet-scale data directly into robotic control pipelines. ✨
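
RT-2's core idea is to express robot actions in the same token space as text so a pretrained VLM can be co-fine-tuned to emit actions as strings. Below is a minimal sketch of that action-as-text discretization; the bin count, action range, and layout are illustrative assumptions, not RT-2's exact tokenizer.

```python
import numpy as np

# Illustrative action-as-text discretization in the spirit of RT-2
# (bin count and normalized action range are assumptions, not the exact RT-2 setup).
NUM_BINS = 256
ACTION_LOW, ACTION_HIGH = -1.0, 1.0  # assumed normalized action range

def action_to_tokens(action: np.ndarray) -> str:
    """Map a continuous action vector to a space-separated string of bin indices."""
    clipped = np.clip(action, ACTION_LOW, ACTION_HIGH)
    bins = np.round(
        (clipped - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW) * (NUM_BINS - 1)
    ).astype(int)
    return " ".join(str(b) for b in bins)

def tokens_to_action(token_str: str) -> np.ndarray:
    """Invert the discretization back to a continuous action vector."""
    bins = np.array([int(t) for t in token_str.split()], dtype=np.float32)
    return bins / (NUM_BINS - 1) * (ACTION_HIGH - ACTION_LOW) + ACTION_LOW

# Example: 6-DoF end-effector delta plus gripper command (hypothetical values)
action = np.array([0.1, -0.05, 0.2, 0.0, 0.0, 0.1, 1.0])
print(action_to_tokens(action))  # -> "140 121 153 128 128 140 255"
```
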
🔥🔥🔥 Helix: Generalist VLA Model for Full Upper-Body Humanoid Control
First VLA model to achieve full upper-body humanoid control, including fingers, wrists, torso, and head. ✨
🔥🔥🔥 π0 (Pi-Zero): Generalist VLA Across Diverse Robots
Generalist control across diverse robot embodiments, combining large-scale pretraining with flow-matching action generation. ✨
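
π0 generates continuous action chunks with flow matching rather than autoregressive token decoding. The sketch below shows the generic conditional flow-matching recipe (linear interpolation path, velocity-regression loss, Euler integration at inference); the tiny network, dimensions, and step count are placeholders, not π0's actual architecture.

```python
import torch
import torch.nn as nn

# Generic conditional flow matching for action generation (illustrative only;
# the MLP and dimensions below are placeholders, not pi0's architecture).
OBS_DIM, ACT_DIM = 32, 7

class VelocityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM + 1, 256), nn.ReLU(),
            nn.Linear(256, ACT_DIM),
        )

    def forward(self, obs, noisy_action, t):
        return self.net(torch.cat([obs, noisy_action, t], dim=-1))

def flow_matching_loss(model, obs, action):
    """Regress the velocity of a straight-line path from noise to the action."""
    noise = torch.randn_like(action)
    t = torch.rand(action.shape[0], 1)
    interp = (1 - t) * noise + t * action   # point on the path at time t
    target_velocity = action - noise        # constant velocity of the path
    pred = model(obs, interp, t)
    return ((pred - target_velocity) ** 2).mean()

@torch.no_grad()
def sample_action(model, obs, steps=10):
    """Integrate the learned velocity field from noise to an action (Euler)."""
    a = torch.randn(obs.shape[0], ACT_DIM)
    for i in range(steps):
        t = torch.full((obs.shape[0], 1), i / steps)
        a = a + model(obs, a, t) / steps
    return a
```
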
🔥🔥🔥 OpenVLA: Open-Source Large-Scale Vision-Language-Action Model
📖 Paper | 🌟 Project | 🤖 Hugging Face
Pretrained on 970k+ robotic episodes, setting a new benchmark for generalist robotic policies. ✨
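
Since the OpenVLA checkpoints are released on Hugging Face, inference can be run through the `transformers` remote-code path. The following is a minimal sketch following the usage shown in the OpenVLA repository; `predict_action` and the `unnorm_key` value come from the model's remote code, so treat them as assumptions if the release changes.

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

# Minimal OpenVLA inference sketch, following the usage documented in the
# OpenVLA repository; `predict_action` is provided by the model's remote code.
processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda:0")

image = Image.open("observation.png")  # current camera observation (placeholder path)
prompt = "In: What action should the robot take to pick up the red block?\nOut:"

inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(action)  # continuous end-effector delta plus gripper command
```
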
🔥🔥🔥 Gemini Robotics: Multimodal Generalization to Physical Action
Built on Gemini 2.0, enabling complex real-world manipulation without task-specific training. ✨
Title | Introduction | Date | Code |
---|---|---|---|
PaLM-E: An Embodied Multimodal Language Model | Integrates perception, language, and action for embodied AI. | 2023-03-06 | - |
EmbodiedGPT | Vision-language models with embodied CoT reasoning. | 2023-05-24 | Github |
Co-LLM-Agents | Cooperative embodied agents via modular LLMs. | 2023-07-05 | Github |
RT-2 | Transfers VLM internet knowledge to robotic control. | 2023-07-28 | - |
LLM as Policies | LLMs as generalizable policies for embodied tasks (Apple). | 2023-10-26 | Github
Embodied Generalist Agent 3D | Generalist agent in 3D worlds. | 2023-11-18 | Github |
LL3DA | Omni-3D understanding via instruction tuning. | 2023-11-30 | Github |
NaviLLM | Generalist navigation models. | 2023-12-04 | Github |
MP5 | Open-ended embodied agent in Minecraft. | 2023-12-12 | Github |
ManipLLM | Object-centric robotic manipulation via LLMs. | 2023-12-24 | Github |
MultiPLY | Multisensory 3D embodied LLMs. | 2024-01-16 | Github |
NaVid | Video-based VLM for next-step planning in vision-and-language navigation. | 2024-02-24 | -
ShapeLLM | 3D object understanding for embodied agents. | 2024-02-27 | Github |
3D-VLA | Generative 3D world model for VLA learning. | 2024-03-14 | Github |
RoboMP² | Multimodal robotic perception-planning. | 2024-04-07 | - |
Embodied CoT Distillation | Distilling embodied CoT into agents. | 2024-05-02 | -
A3VLM | Actionable articulation-aware VLMs. | 2024-06-11 | Github
OpenVLA | Open-sourced 7B vision-language-action model. | 2024-06-13 | Github
TinyVLA | Compact and efficient VLA models. | 2024-09 | Paper
Helix | Full upper-body humanoid control model. | 2025-02 | Project
Gemini Robotics | Real-world manipulation built on Gemini 2.0. | 2025-03 | Project
VLA Expert Collaboration | Improves VLA via expert actions. | 2025-03 | Paper
Title | Introduction | Date | Code |
---|---|---|---|
Holodeck | LLMs generate interactive 3D simulation environments. | 2023-12-14 | Github |
PhyScene | Physically interactive 3D scenes for embodied training. | 2024-04-15 | Github |
Title | Introduction | Date | Code |
---|---|---|---|
OpenEQA | Embodied question answering benchmark for real-world scenes. | 2024-06-17 | Github
EQA-REAL | Real-world EmbodiedQA for indoor settings. | 2024-04 | Github |
TEACh | Human-human embodied task dialogues. | 2021-10 (updated 2023) | Github
Title | Introduction | Date | Code |
---|---|---|---|
EmbodiedScan | Real-world RGB-D + language 3D scans. | 2023-12-26 | Github |
Title | Introduction | Date | Code |
---|---|---|---|
PCA-EVAL | Embodied decision-making benchmark evaluating MLLMs on perception, cognition, and action. | 2023-10-03 | Github
UniSim | Interactive real-world simulator learning. | 2023-10-09 | - |
Title | Introduction | Date | Code |
---|---|---|---|
BEHAVIOR-1K | 1,000 household activity programs and scenes. | 2023-07-11 | Project |