RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation

If our project helps you, please give us a star ⭐ on GitHub to support us. 🙏🙏

📃 Tech Report    |    📑 Blog    |   🤗 Hugging Face   |   🤖 ModelScope
🖥️ Demo Video (Youtube) |   🖥️ Demo Video (Bilibili)


📰 News

  • [2025.09.18] 🔥🔥 We release our technical report.
  • [2025.08.08] 🔥🔥 We release our pretrained models and training code.

🌟 Introduction

RynnVLA-001 is a vision-language-action (VLA) model built on a pretrained video generation model. The key insight is to implicitly transfer manipulation skills learned from human demonstrations in ego-centric videos to robot arm manipulation.

We finetune the baselines on the same dataset to evaluate performance. The comparison results are shown in the figure below.

🛠️ Requirements and Installation

Install required packages:

pip install torch==2.2.0 torchvision==0.17.0 --index-url https://download.pytorch.org/whl/cu121

pip install -r requirements.txt

pip install flash-attn==2.5.8
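
After installation, a quick sanity check can confirm that PyTorch sees the GPU and that flash-attn imports cleanly. This is only a minimal sketch, not part of the repo:

# check_env.py -- quick environment sanity check (illustrative, not part of the repo)
import torch

print("torch version:", torch.__version__)            # expected: 2.2.0
print("CUDA available:", torch.cuda.is_available())   # should print True on a GPU machine

try:
    import flash_attn
    print("flash-attn version:", flash_attn.__version__)  # expected: 2.5.8
except ImportError as e:
    print("flash-attn failed to import:", e)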

🌍 Model Zoo

Model                      | Stage                                     | Initial Weights      | Model Links
RynnVLA-001-7B-Base        | Ego-Centric Video Generative Pretraining  | Chameleon-7B         | HuggingFace / ModelScope
RynnVLA-001-7B-Trajectory  | Trajectory-Aware Video Modeling           | RynnVLA-001-7B-Base  | HuggingFace / ModelScope

Either model can be used as the pretrained model for VLA finetuning (Stage 3). Our full model is initialized from RynnVLA-001-7B-Trajectory.

🗝️ Training

The training pipeline is shown in the figure below.

Here we provide instructions on how to finetune the model with your own LeRobot data (ActionVAE and Stage 3). We will release instructions on how to train models from scratch later. Stay tuned!

Step 1: Prepare Pretrained Models

Download the Chameleon model and the pretrained RynnVLA-001-7B-Trajectory model, and put them under pretrained_models. The folder pretrained_models should have the following structure:

pretrained_models
├── Chameleon
│   ├── original_tokenizers
│   │   ├── text_tokenizer.json
│   │   ├── vqgan.ckpt
│   │   └── vqgan.yaml
│   ├── config.json
│   └── ...
└── RynnVLA-001-7B-Trajectory
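
One convenient way to fetch the weights is huggingface_hub's snapshot_download. The sketch below is illustrative only; the repo IDs are placeholders, so replace them with the actual IDs from the Model Zoo links above:

# download_models.py -- illustrative download sketch (repo IDs are placeholders)
from huggingface_hub import snapshot_download

# Replace <org>/<repo> with the actual Hugging Face repo IDs linked in the Model Zoo.
snapshot_download(
    repo_id="<org>/RynnVLA-001-7B-Trajectory",
    local_dir="pretrained_models/RynnVLA-001-7B-Trajectory",
)
snapshot_download(
    repo_id="<org>/Chameleon",
    local_dir="pretrained_models/Chameleon",
)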

Step 2: Prepare Training Data

If you have your own LeRobot data, please convert it into HDF5 format; we provide the conversion script for this. To execute the conversion successfully, we recommend installing a separate environment as suggested in the LeRobot repo.

python misc/lerobot_data_convert.py --dataset_dir path-to-raw-lerobot-data --task_name dataset-name --save_dir path-to-save-hdf5-files

After the data conversion, you need to save the statistics and paths of all your data into a JSON file. You can use the following script to generate it. Before you run it, please change the data paths in misc/merge_data/config.yaml.

cd misc/merge_data

python misc/data_process_with_configs.py -c misc/merge_data/config.yaml
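
To sanity-check the converted data, you can print the layout of one HDF5 episode with h5py. This is a minimal sketch; the file name and the dataset keys it prints depend on your conversion, so take the printed names as authoritative rather than anything assumed here:

# inspect_hdf5.py -- print the structure of one converted episode (illustrative)
import h5py

# Example path; point this at one file produced by lerobot_data_convert.py.
with h5py.File("path-to-save-hdf5-files/episode_0.hdf5", "r") as f:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
    f.visititems(show)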

Step 3: Prepare Training Scripts

Before you start training, please change the paths in ./configs/actionvae/actionvae_lerobot.yml and ./configs/lerobot/lerobot_exp.yml to the corresponding local paths.

# ActionVAE
# Empirically, we train the ActionVAE on our dataset for 300,000 iterations with a batch size of 16 per GPU on 8 GPUs.
# You may visualize the reconstructed trajectory to check the quality.
bash scripts/actionvae/action_vae.sh

# Stage 3
bash scripts/lerobot/lerobot.sh
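
For the reconstruction check mentioned in the comments above, a simple overlay plot of ground-truth versus reconstructed actions is usually enough. The sketch below assumes you have already exported both trajectories as NumPy arrays of shape [T, D]; exporting them is up to you and is not done by the training script:

# plot_reconstruction.py -- overlay ground-truth vs. reconstructed actions (illustrative)
import numpy as np
import matplotlib.pyplot as plt

gt = np.load("gt_actions.npy")    # [T, D] ground-truth action trajectory (exported by you)
rec = np.load("rec_actions.npy")  # [T, D] ActionVAE reconstruction (exported by you)

num_dims = gt.shape[1]
fig, axes = plt.subplots(num_dims, 1, figsize=(8, 2 * num_dims), sharex=True)
axes = np.atleast_1d(axes)
for d in range(num_dims):
    axes[d].plot(gt[:, d], label="ground truth")
    axes[d].plot(rec[:, d], "--", label="reconstruction")
    axes[d].set_ylabel(f"dim {d}")
axes[0].legend()
axes[-1].set_xlabel("timestep")
plt.tight_layout()
plt.savefig("actionvae_reconstruction.png")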

🤖 Inference

Here, we provide example code for inference on LeRobot. You can adapt it to interact with your robot arm for input and output.

Please refer to inference_lerobot.py for details.

If any error occurs, you may need to upgrade transformers to 4.46.3.
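
At a high level, inference follows the usual observe-predict-act loop. The skeleton below is only a hedged outline with hypothetical helper names (get_observation, predict_actions, execute_action); the real model loading, observation format, and action decoding are defined in inference_lerobot.py:

# inference_sketch.py -- hedged outline of the control loop (helper names are hypothetical)
def run_episode(model, robot, instruction, max_steps=500):
    for _ in range(max_steps):
        obs = robot.get_observation()                      # camera frame(s) + robot state
        actions = model.predict_actions(obs, instruction)  # chunk of low-level actions
        for action in actions:
            robot.execute_action(action)                   # send one action to the arm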

📑 Citation

@article{jiang2025rynnvla001,
  title={RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation},
  author={Jiang, Yuming and Huang, Siteng and Xue, Shengke and Zhao, Yaxi and Cen, Jun and Leng, Sicong and Li, Kehan and Guo, Jiayan and Wang, Kexiang and Chen, Mingxiu and Wang, Fan and Zhao, Deli and Li, Xin},
  journal={arXiv preprint arXiv:2509.15212},
  year={2025}
}

👍 Acknowledgement

The codebase of our RynnVLA-001 is refactored from Lumina-mGPT and Chameleon. If your work is used in RynnVLA-001 but not mentioned in either this repo or the technical report, feel free to let us know ❤️.

💡 Other featured projects from our RynnBot family ✨.

RynnEC: Bringing MLLMs into Embodied World
Ronghao Dang*, Yuqian Yuan*, Yunxuan Mao*, Kehan Li*, Jiangpin Liu, Zhikai Wang, Fan Wang, Deli Zhao, Xin Li
GitHub | arXiv

RynnRCP: Open Robotics Context Protocol and RobotMotion
RynnBot Team
GitHub

WorldVLA: Towards Autoregressive Action World Model
Jun Cen, Chaohui Yu, Hangjie Yuan, Yuming Jiang, Siteng Huang, Jiayan Guo, Xin Li, Yibing Song, Hao Luo, Fan Wang, Deli Zhao, Hao Chen
GitHub | arXiv

🔒 License

This project is released under the Apache 2.0 license as found in the LICENSE file. The service is a research preview intended for non-commercial use ONLY. Please get in touch with us if you find any potential violations.
