📃 Tech Blog | 🤗 Hugging Face | 🤖 ModelScope
🖥️ Demo Video (Youtube) | 🖥️ Demo Video (Bilibili)
- [2025.08.08] 🔥🔥 Released our pretrained models and training code.
RynnVLA-001 is a vision-language-action (VLA) model built on a pretrained video generation model. The key insight is to implicitly transfer manipulation skills learned from human demonstrations in ego-centric videos to robot-arm manipulation.
We finetune the baseline on the same dataset to evaluate performance. The comparison results are shown in the following figure.
Install required packages:
pip install torch==2.2.0 torchvision==0.17.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install flash-attn==2.5.8
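After installation, you can optionally sanity-check the environment. The snippet below is a minimal sketch (not part of the official setup); it only verifies that PyTorch sees a CUDA device and that flash-attn imports cleanly.

# sanity_check.py -- optional environment check (illustrative only)
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

try:
    import flash_attn  # noqa: F401
    print("flash-attn imported successfully")
except ImportError as e:
    print("flash-attn not importable:", e)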
The training pipeline is organized into stages. Here we provide instructions on how to finetune the model with your own LeRobot data (Stage 2 and Stage 3). Instructions on how to train models from scratch will be released later. Stay tuned!
Download the Chameleon model and the pretrained RynnVLA-001-7B-Base model, and put them under pretrained_models. The structure of the pretrained_models folder should be:
pretrained_models
├── Chameleon
│ ├── original_tokenizers
│ │ ├── text_tokenizer.json
│ │ ├── vqgan.ckpt
│ │ └── vqgan.yaml
│ ├── config.json
│ └── ...
└── RynnVLA-001-7B-Base
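If you prefer to fetch the weights programmatically, the snippet below is a minimal sketch using huggingface_hub. The repository IDs shown are placeholders, not confirmed names; use the model pages linked at the top of this README.

# download_weights.py -- illustrative only; the repo IDs below are placeholders.
from huggingface_hub import snapshot_download

# Replace these with the actual repository IDs from the Hugging Face links above.
snapshot_download(repo_id="<org>/RynnVLA-001-7B-Base",
                  local_dir="pretrained_models/RynnVLA-001-7B-Base")
snapshot_download(repo_id="<org>/Chameleon",
                  local_dir="pretrained_models/Chameleon")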
If you have your own LeRobot data, please convert it into the HDF5 format. Here, we provide the conversion script. To run the conversion successfully, we recommend installing a separate environment, as suggested in the LeRobot repo.

python misc/lerobot_data_convert.py --dataset_dir path-to-raw-lerobot-data --task_name dataset-name --save_dir path-to-save-hdf5-files

After the data conversion, you need to save the statistics and paths of all your data into a JSON file. You can use the following scripts to generate it. Before you run them, please change the data path in misc/merge_data/config.yaml.
cd misc/merge_data
python misc/data_process_with_configs.py -c misc/merge_data/config.yaml

Before you start training, please change the paths in ./configs/actionvae/actionvae_lerobot.yml and ./configs/lerobot/lerobot_exp.yml to the corresponding local paths.
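For intuition, the generated JSON records dataset paths together with action statistics. The snippet below is only a rough sketch of how such per-dimension statistics could be computed from the converted HDF5 files; the "action" key and the output schema are assumptions, and the authoritative logic lives in misc/data_process_with_configs.py.

# stats_sketch.py -- illustrative only; the real schema is produced by
# misc/data_process_with_configs.py. The "action" dataset key is an assumption.
import glob, json
import h5py
import numpy as np

actions = []
for path in glob.glob("path-to-save-hdf5-files/*.hdf5"):
    with h5py.File(path, "r") as f:
        actions.append(np.asarray(f["action"]))  # hypothetical key
actions = np.concatenate(actions, axis=0)

stats = {
    "mean": actions.mean(axis=0).tolist(),
    "std": actions.std(axis=0).tolist(),
    "min": actions.min(axis=0).tolist(),
    "max": actions.max(axis=0).tolist(),
}
with open("action_stats_sketch.json", "w") as f:
    json.dump(stats, f, indent=2)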
# Stage 2
# Empirically, we train action_vae on our dataset for 30,000 iterations with a batch size of 16 per GPU across 8 GPUs.
# You may visualize the reconstructed trajectory to check its quality (see the plotting sketch after these commands).
bash scripts/actionvae/action_vae.sh
# Stage 3
bash scripts/lerobot/lerobot.sh
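As mentioned in the Stage 2 comments, it can be useful to plot a reconstructed trajectory against the ground truth. The snippet below is a minimal, self-contained sketch; gt_actions and recon_actions are stand-in (T, D) arrays that you would replace with real trajectories from your data and the trained action VAE.

# plot_reconstruction.py -- illustrative sketch; replace the stand-in arrays
# with real ground-truth and reconstructed trajectories from the action VAE.
import numpy as np
import matplotlib.pyplot as plt

T, D = 100, 7  # e.g., a 7-dim action over 100 steps (assumed shape)
gt_actions = np.random.randn(T, D).cumsum(axis=0)           # stand-in data
recon_actions = gt_actions + 0.05 * np.random.randn(T, D)   # stand-in reconstruction

fig, axes = plt.subplots(D, 1, figsize=(8, 2 * D), sharex=True)
for d, ax in enumerate(axes):
    ax.plot(gt_actions[:, d], label="ground truth")
    ax.plot(recon_actions[:, d], label="reconstruction", linestyle="--")
    ax.set_ylabel(f"dim {d}")
axes[0].legend()
axes[-1].set_xlabel("timestep")
plt.tight_layout()
plt.savefig("reconstruction_check.png")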
Here, we provide example code for inference on LeRobot. You can adapt it to interact with your robot arm for input and output. Please refer to inference_lerobot.py for details.
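To illustrate the overall control loop that inference_lerobot.py is typically adapted into, here is a highly simplified sketch. Every helper below is a stub of our own naming, not the repo's API; replace them with your camera and robot drivers and with the real model calls from inference_lerobot.py.

# control_loop_sketch.py -- illustrative only; all helpers are stubs to replace
# with your hardware drivers and the model interface from inference_lerobot.py.
import time
import numpy as np

def get_camera_frame():
    # Stub: replace with your camera driver.
    return np.zeros((480, 640, 3), dtype=np.uint8)

def get_robot_state():
    # Stub: replace with a read of your arm's joint / end-effector state.
    return np.zeros(7, dtype=np.float32)

def predict_action(image, state, instruction):
    # Stub: replace with the model call from inference_lerobot.py.
    return np.zeros(7, dtype=np.float32)

def send_to_robot(action):
    # Stub: replace with your robot-arm command interface.
    print("action:", action)

def control_loop(instruction, hz=10, steps=100):
    # Run a fixed-rate perceive-predict-act loop.
    period = 1.0 / hz
    for _ in range(steps):
        start = time.time()
        image = get_camera_frame()
        state = get_robot_state()
        action = predict_action(image, state, instruction)
        send_to_robot(action)
        time.sleep(max(0.0, period - (time.time() - start)))

if __name__ == "__main__":
    control_loop(instruction="pick up the red block")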
You may need to upgrade transformers to 4.46.3 if you run into errors.
The codebase of RynnVLA-001 is refactored from Lumina-mGPT and Chameleon. If your work is used in RynnVLA-001 but is not mentioned in this repo or the technical report, feel free to let us know ❤️.
This project is released under the Apache 2.0 license as found in the LICENSE file. The service is a research preview intended for non-commercial use ONLY. Please get in touch with us if you find any potential violations.