[Project Page] [Paper] [Processed Datasets]
Fanqi Lin1,2,3,5*, Ruiqian Nai1,2,3,5*, Yingdong Hu1,2,3*, Jiacheng You1,2,3, Junming Zhao1,4, Yang Gao1,2,3,5
1Tsinghua University, 2Shanghai Qi Zhi Institute, 3Shanghai AI Lab, 4Fudan University, 5Spirit AI
* indicates equal contributions
We manage Python dependencies with uv. If you haven't installed uv, please follow the uv installation instructions to set it up.
Run the following to set up the environment:

```bash
GIT_LFS_SKIP_SMUDGE=1 uv sync
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e .
```
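After syncing, uv creates the project environment in `.venv`; you can activate it directly or run commands through uv (standard uv behavior, not specific to this repository):

```bash
# Activate the environment created by `uv sync`...
source .venv/bin/activate
# ...or run a command inside it without activating:
uv run python --version
```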
NOTE: `GIT_LFS_SKIP_SMUDGE=1` is needed to pull LeRobot as a dependency.
For more details, refer to the original openpi repository.
Download the datasets and place them under `$LEROBOT_HOME/umi/`.
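As a minimal sketch, the expected layout looks like the following (`$LEROBOT_HOME` defaults to LeRobot's cache directory; the dataset folder names below are illustrative):

```bash
# LeRobot falls back to ~/.cache/huggingface/lerobot when LEROBOT_HOME is unset.
export LEROBOT_HOME=~/.cache/huggingface/lerobot

# Illustrative layout -- the actual dataset folder names may differ:
# $LEROBOT_HOME/umi/
# |-- cocktail/
# `-- visual_grounding/
```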
To train a OneTwoVLA model, run:
```bash
bash train_scripts/train_<task_name>.sh
```
Available tasks are:
```
train_scripts
|-- train_onetwovla_cocktail.sh
|-- train_onetwovla_visual_grounding.sh
|-- train_pi0_cocktail.sh
|-- train_pi0_visual_grounding.sh
```
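For example, to train OneTwoVLA on the cocktail task:

```bash
bash train_scripts/train_onetwovla_cocktail.sh
```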
We run inference using a policy server and a hardware client. Instructions for running the policy server can be found in examples/umi/README.md, and we provide the UMI hardware client code in this repository.
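As a rough sketch of the client side, the policy server can be queried over a websocket using openpi's client package; the observation keys and prompt below are illustrative placeholders, and the exact UMI observation format is documented in examples/umi/README.md:

```python
# Minimal policy-client sketch (assumes openpi's client package; observation
# keys and the prompt are illustrative, not the exact UMI format).
import numpy as np
from openpi_client import websocket_client_policy

policy = websocket_client_policy.WebsocketClientPolicy(host="localhost", port=8000)
observation = {
    "observation/image": np.zeros((224, 224, 3), dtype=np.uint8),  # camera frame
    "prompt": "make a cocktail",
}
result = policy.infer(observation)  # one round trip to the policy server
actions = result["actions"]         # predicted action chunk for the hardware client
```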
We provide access to the following datasets:
- **Robot Datasets**: datasets for the cocktail and open-world visual grounding tasks.
- **Vision-Language Datasets**: datasets containing synthetic images and annotated reasoning for the open-world visual grounding task.
All datasets are hosted on Hugging Face. You can find them here.
We provide code for converting UMI data format to LeRobot data format here.
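For orientation, UMI demonstrations are typically stored as a zarr replay buffer; a minimal reading sketch looks like the following (the array key names are assumptions based on the UMI data format, so refer to the linked conversion code for the authoritative version):

```python
# Sketch: iterate over episodes in a UMI replay buffer (key names assumed).
import zarr

root = zarr.open("replay_buffer.zarr", mode="r")
episode_ends = root["meta/episode_ends"][:]  # cumulative end index per episode
images = root["data/camera0_rgb"]            # (N, H, W, 3) uint8 camera frames
eef_pos = root["data/robot0_eef_pos"]        # (N, 3) end-effector positions

start = 0
for end in episode_ends:
    episode_images = images[start:end]       # frames for one demonstration
    episode_eef = eef_pos[start:end]
    # ...convert this episode to the LeRobot data format here...
    start = end
```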
To make the synthetic images more closely resemble real robot observations, we randomly apply several augmentations, including random fisheye distortion and compositing a robot gripper with adaptive brightness adjustments. The implementation is available in scripts/augment_vl_data/augment.py.
Here we show an example. From left to right: the original image, the image with fisheye distortion, the image with a composited robot gripper (with adaptive brightness adjustment), and the image with both augmentations applied.
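A self-contained sketch of the two augmentations (not the repository implementation; see scripts/augment_vl_data/augment.py for that) might look like:

```python
# Illustrative augmentation sketch: radial (fisheye-like) distortion and
# gripper compositing with adaptive brightness. Not the repo implementation.
import cv2
import numpy as np

def random_fisheye(img: np.ndarray, strength: float = 0.3) -> np.ndarray:
    """Barrel-style distortion via radial remapping of pixel coordinates."""
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    x, y = (xs - cx) / cx, (ys - cy) / cy        # normalized coords in [-1, 1]
    r2 = x**2 + y**2
    factor = 1.0 + strength * r2                 # grows with distance from center
    map_x = (x * factor * cx + cx).astype(np.float32)
    map_y = (y * factor * cy + cy).astype(np.float32)
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)

def composite_gripper(img: np.ndarray, gripper_rgba: np.ndarray) -> np.ndarray:
    """Alpha-composite a same-sized RGBA gripper cutout onto the image,
    scaling its brightness toward the scene's mean luminance."""
    rgb = gripper_rgba[..., :3].astype(np.float32)
    alpha = gripper_rgba[..., 3:].astype(np.float32) / 255.0
    scene_lum = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).mean()
    grip_lum = max(float(rgb.mean()), 1e-6)
    rgb = np.clip(rgb * (scene_lum / grip_lum), 0, 255)  # adaptive brightness
    out = img.astype(np.float32) * (1.0 - alpha) + rgb * alpha
    return out.astype(np.uint8)
```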
We express our sincere gratitude to the developers of openpi for open-sourcing their code.