Drive4C is a capability-driven, closed-loop benchmark designed to evaluate multimodal large language models (MLLMs) in the context of language-guided autonomous driving. It decomposes the evaluation process into core understanding capabilities to identify specific model limitations and areas for targeted improvement.
Language-guided autonomous driving has emerged as a promising paradigm in autonomous systems development, leveraging the open-context description, reasoning, and interpretation capabilities of MLLMs. However, existing benchmarks provide only overall scores and fail to assess the core capabilities required for language-guided driving, so they cannot reveal why models struggle with autonomous navigation, which limits targeted improvements.
We present Drive4C, a novel closed-loop benchmark for systematically evaluating MLLMs on four core capabilities derived from human driver requirements: semantic, spatial, temporal, and physical understanding. Drive4C separates the evaluation into scenario description, scenario anticipation, and language-guided motion, allowing for fine-grained capability assessment. The two-step evaluation process of question answering and instruction-based driving tasks enables a modular, capability-specific performance analysis.
Experimental results show that state-of-the-art models perform well in semantic understanding and scenario anticipation but struggle with spatial, temporal, and physical understanding, revealing clear opportunities for targeted model improvement.
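
To illustrate the capability-specific side of the two-step evaluation, below is a minimal sketch of how per-capability scores could be aggregated from question-answering results tagged by capability. The record format, field names, and equal per-question weighting are assumptions for illustration only, not the released benchmark code (the actual evaluation additionally includes the closed-loop, instruction-based driving tasks).

```python
from collections import defaultdict

# Illustrative QA records; in practice each question from the benchmark's
# question-answering step would be tagged with the capability it probes
# and assigned a correctness score in [0, 1]. (Assumed format.)
qa_results = [
    {"capability": "semantic", "score": 1.0},
    {"capability": "spatial",  "score": 0.0},
    {"capability": "temporal", "score": 0.5},
    {"capability": "physical", "score": 0.0},
]

def capability_scores(results):
    """Average per-question scores within each capability."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in results:
        totals[record["capability"]] += record["score"]
        counts[record["capability"]] += 1
    return {cap: totals[cap] / counts[cap] for cap in totals}

print(capability_scores(qa_results))
# e.g. {'semantic': 1.0, 'spatial': 0.0, 'temporal': 0.5, 'physical': 0.0}
```
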
- [04/25] Drive4C accepted at CVPR WDFM-AD 2025
- [TBD] Benchmark code to be released soon
Model | SEM (Semantic) | SPA (Spatial) | TEM (Temporal) | PHY (Physical) | ANT (Anticipation) | LGM (Lang.-Guided Motion) | Score |
---|---|---|---|---|---|---|---|
Dolphins | 0.4241 | 0.0587 | 0.2182 | 0.0162 | 0.4720 | 0.0448 | 0.1413 |
Llama-3.2-11B-Vision | 0.4820 | 0.1461 | 0.1994 | 0.0802 | 0.5769 | 0.0268 | 0.1619 |
Phi-4-Multimodal | 0.7482 | 0.1959 | 0.2217 | 0.0367 | 0.4428 | 0.0388 | 0.1839 |
SmolVLM | 0.7256 | 0.3153 | 0.2223 | 0.0813 | 0.5772 | 0.0186 | 0.2015 |
DriveMM | 0.8059 | 0.2776 | 0.2937 | 0.0367 | 0.4376 | 0.0970 | 0.2337 |
Gemma 3-27B-it | 0.8445 | 0.3076 | 0.2542 | 0.1726 | 0.6540 | 0.1049 | 0.2757 |
GPT-4o | 0.8422 | 0.3587 | 0.3498 | 0.1703 | 0.6421 | 0.1298 | 0.3012 |
If you find our work useful, please consider citing us!
@InProceedings{Sohn_2025_CVPR,
author = {Sohn, Tin Stribor and Dillitzer, Maximilian and Bach, Johannes and Corso, Jason J. and Br\"uhl, Tim and Schwager, Robin and Eberhardt, Tim Dieter and Sax, Eric},
title = {Drive4C: A Closed-Loop Benchmark on What Foundation Models Really Need to Be Capable of for Language-Guided Autonomous Driving},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
month = {June},
year = {2025},
pages = {3859-3869}
}
@misc{sohn2025frameworkcapabilitydrivenevaluationscenario,
  title = {A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving},
  author = {Tin Stribor Sohn and Philipp Reis and Maximilian Dillitzer and Johannes Bach and Jason J. Corso and Eric Sax},
  year = {2025},
  eprint = {2503.11400},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/2503.11400},
}