English | 简体中文
Are VLMs Ready for Autonomous Driving?
An Empirical Study from the Reliability, Data, and Metric Perspectives
Shaoyuan Xie1
Lingdong Kong2,3
Yuhao Dong2,4
Chonghao Sima2,5
Wenwei Zhang2
Qi Alfred Chen1
Ziwei Liu4
Liang Pan2
1UC Irvine
2Shanghai AI Laboratory
3NUS
4NTU
5HKU
- This work introduces 🚙 DriveBench, a benchmark dataset designed to evaluate VLM reliability across 17 settings (clean, corrupted, and text-only inputs), encompassing 19,200 frames, 20,498 question-answer pairs, three question types, four mainstream driving tasks, and a total of 12 popular VLMs.
- Our findings reveal that VLMs often generate plausible responses derived from general knowledge or textual cues rather than true visual grounding, especially under degraded or missing visual inputs. This behavior, concealed by dataset imbalances and insufficient evaluation metrics, poses significant risks in safety-critical scenarios like autonomous driving.
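The three input settings above (clean, corrupted, and text-only) can be sketched with a minimal evaluation-input builder. This is an illustrative sketch, not the DriveBench codebase: the function names, the choice of Gaussian noise as the corruption, and the severity scale are all assumptions for demonstration.

```python
import numpy as np

def corrupt_image(image: np.ndarray, severity: int = 3, seed: int = 0) -> np.ndarray:
    """Apply additive Gaussian noise (one of many possible corruptions)
    at a given severity (1-5), clipped back to valid pixel range."""
    rng = np.random.default_rng(seed)
    sigma = 10.0 * severity  # heavier noise at higher severity
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def build_inputs(image: np.ndarray, question: str, setting: str):
    """Assemble the (image, prompt) pair for one evaluation setting.
    'text-only' drops the visual input entirely, probing whether a VLM
    still produces a plausible answer without any visual grounding."""
    if setting == "clean":
        return image, question
    if setting == "corrupted":
        return corrupt_image(image), question
    if setting == "text-only":
        return None, question
    raise ValueError(f"unknown setting: {setting}")

frame = np.full((224, 224, 3), 128, dtype=np.uint8)  # dummy camera frame
question = "Is it safe to change into the left lane?"
for setting in ("clean", "corrupted", "text-only"):
    img, prompt = build_inputs(frame, question, setting)
    print(setting, "has image" if img is not None else "no image")
```

Comparing a model's answers across these three settings is what exposes the failure mode described above: if accuracy barely drops when the image is corrupted or removed, the answers were likely driven by textual priors rather than visual evidence.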
If you find this work helpful for your research, please kindly consider citing our paper:
@InProceedings{xie2025drivebench,
author = {Xie, Shaoyuan and Kong, Lingdong and Dong, Yuhao and Sima, Chonghao and Zhang, Wenwei and Chen, Qi Alfred and Liu, Ziwei and Pan, Liang},
title = {Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2025},
pages = {6585-6597}
}
- [2025.07] - The DriveBench dataset has been extended to Track 1: Driving with Language of the RoboSense Challenge at IROS 2025. See the track homepage and GitHub repo for more details.
- [2025.06] - Our paper has been accepted to ICCV 2025. See you in Honolulu! 🌸
- [2025.04] - We are hosting the 2025 RoboSense Challenge! Visit the competition homepage for details and participation. 🏁
- [2025.01] - The evaluation data can be accessed at our HuggingFace Dataset Card. 🤗
- [2025.01] - Introducing the 🚙 DriveBench project! For more details, kindly refer to our Project Page and Preprint. 🚀
- Benchmark Comparison
- Installation
- Data Preparation
- Getting Started
- Benchmark Results
- License
- Acknowledgments
For details on installation and environment setup, kindly refer to INSTALL.md.
Kindly refer to DATA_PREPARE.md for details on preparing the datasets.
To learn more about using this codebase, kindly refer to GET_STARTED.md.
Commercial VLMs
Open-Source VLMs
Specialist VLMs
This work is released under the Apache License, Version 2.0, while some specific implementations in this codebase may carry other licenses. If you are using our code for commercial purposes, kindly refer to LICENSE.md for a careful check.
To be updated.


