
About This is the official repository for "SAFE: Multitask Failure Detection for Vision-Language-Action Models"



SAFE: Multitask Failure Detection for Vision-Language-Action Models

Preprint

Project Page | Paper | ArXiv

Qiao Gu, Yuanliang Ju, Shengxiang Sun, Igor Gilitschenski, Haruki Nishimura, Masha Itkina, Florian Shkurti

Splash Figure

We introduce the multitask failure detection problem for VLA models, and propose SAFE, a failure detector that can detect failures for unseen tasks zero-shot and achieve state-of-the-art performance. This repo contains the implementation of SAFE.

Generate rollouts from VLA models

The following repos contain adapted code that runs VLA models in simulated environments and generates rollouts for failure detection. Detailed instructions can be found in the README file of each repo.

  • openvla for OpenVLA model on the LIBERO benchmark.
  • openpi for pi0 and pi0-FAST models on the LIBERO benchmark.
  • open-pi-zero for pi0* models on the SimplerEnv benchmark.

After generating the rollouts, copy setup_envs.bash.template to setup_envs.bash and edit the environment variables in it so that they point to the locations of the generated rollouts.

cp setup_envs.bash.template setup_envs.bash

# TODO: Please edit the setup_envs.bash file to set the environment variables
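
Before training, it can help to confirm that the variables from setup_envs.bash are visible to Python and point to existing directories. The following is a minimal sketch, not part of this codebase: it assumes setup_envs.bash has already been sourced in the current shell, and the variable names in `ROLLOUT_VARS` are hypothetical placeholders that should be replaced with the names actually defined in setup_envs.bash.template.

```python
import os
from pathlib import Path

# Hypothetical variable names; replace them with the ones defined in
# setup_envs.bash.template.
ROLLOUT_VARS = ["OPENVLA_ROLLOUT_DIR", "OPENPI_ROLLOUT_DIR"]


def find_bad_rollout_vars(names, env=None):
    """Return {name: reason} for variables that are unset or not a directory."""
    env = os.environ if env is None else env
    problems = {}
    for name in names:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not Path(value).is_dir():
            problems[name] = f"not a directory: {value}"
    return problems


if __name__ == "__main__":
    # Print one line per problematic variable; no output means all checks passed.
    for name, reason in find_bad_rollout_vars(ROLLOUT_VARS).items():
        print(f"{name}: {reason}")
```

An empty result means every listed variable is set and resolves to a directory.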

Train and evaluate SAFE and baseline failure detectors

Setup

git clone git@github.com:vla-safe/SAFE.git

# Create a new conda environment (or other virtual environment management tool)
conda create -n vla-safe python=3.10
conda activate vla-safe

# Install pytorch (the newest version should be fine)
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

# Install other required packages
pip install pandas scipy pyyaml tqdm "imageio[ffmpeg]" hydra-core omegaconf scikit-learn opencv_python einops wandb plotly matplotlib natsort flask

# Log in to your wandb account
wandb login

# Install this codebase as a package
# cd to the root directory of this repo
pip install -e .
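
After installation, a quick way to check the environment is to test whether each dependency can be found by Python. The sketch below is not part of this codebase. It uses the import names of the packages listed above, some of which differ from their pip names (`yaml` for pyyaml, `hydra` for hydra-core, `sklearn` for scikit-learn, `cv2` for opencv_python).

```python
import importlib.util

# Import names for the packages installed above.
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio",
    "pandas", "scipy", "yaml", "tqdm", "imageio", "hydra", "omegaconf",
    "sklearn", "cv2", "einops", "wandb", "plotly", "matplotlib",
    "natsort", "flask",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("All packages found." if not missing else f"Missing: {missing}")
```

`find_spec` only locates the modules and does not import them, so the check is fast and does not initialize CUDA or WandB.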

Training and evaluation

Please see the following file for the training and evaluation scripts for the SAFE failure detector and all baselines.

Aggregate and plot metrics

The script scripts/get_wandb_metrics.py pulls the evaluation metrics from WandB, aggregates them, and saves them to CSV files, which should reproduce the results in Table 1 of the paper. You can run the script as follows:

python scripts/get_wandb_metrics.py
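
For readers who want to post-process metrics themselves, the aggregation step can be sketched as follows. This is an illustration only and does not reproduce scripts/get_wandb_metrics.py: it assumes the per-run metrics have already been collected into a list of dictionaries, and the field names (`method`, `seed`, `roc_auc`) as well as the example values are hypothetical placeholders, not results from the paper.

```python
import csv
from collections import defaultdict
from statistics import mean


def aggregate_runs(records, group_keys, metric):
    """Average `metric` over all records that share the same `group_keys` values."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in group_keys)].append(rec[metric])
    return [
        {**dict(zip(group_keys, key)), metric: mean(values), "n_runs": len(values)}
        for key, values in sorted(groups.items())
    ]


def write_csv(rows, path):
    """Write a non-empty list of dicts with identical keys to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    # Made-up placeholder records (hypothetical field names and values).
    records = [
        {"method": "A", "seed": 0, "roc_auc": 0.25},
        {"method": "A", "seed": 1, "roc_auc": 0.75},
        {"method": "B", "seed": 0, "roc_auc": 0.5},
    ]
    for row in aggregate_runs(records, ["method"], "roc_auc"):
        print(row)
```

Grouping by a tuple of keys keeps the function usable for finer breakdowns, for example per method and per benchmark, by passing more entries in `group_keys`.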

Other useful scripts are as follows:

# To generate plots as shown in Figure 1 and Figure 7
python scripts/visualize_features.py

# To generate plots as shown in Figure 8
python scripts/eval_conformal_figure.py

Acknowledgements

The SAFE project and this codebase are inspired by and built on the following repos:

Reference

Please cite our work if you find it useful:

@article{gu2025safe,
  title={SAFE: Multitask Failure Detection for Vision-Language-Action Models},
  author={Gu, Qiao and Ju, Yuanliang and Sun, Shengxiang and Gilitschenski, Igor and Nishimura, Haruki and Itkina, Masha and Shkurti, Florian},
  journal={arXiv preprint arXiv:2506.09937},
  year={2025}
}
