HERMES: A Unified Self-Driving World Model for Simultaneous
3D Scene Understanding and Generation

Xin Zhou1*, Dingkang Liang1*†, Sifan Tu1, Xiwu Chen3, Yikang Ding2†, Dingyuan Zhang1, Feiyang Tan3,
Hengshuang Zhao4, Xiang Bai1

1 Huazhong University of Science & Technology, 2 MEGVII Technology,
3 Mach Drive, 4 The University of Hong Kong

(*) Equal contribution. (†) Project leader.

arXiv Huggingface Project Code License

Check out our Awesome World Models list for the latest world model research! 🌟

📣 News

  • [2025.07.14] Code, pretrained weights, and processed data are now open-sourced. 🔥
  • [2025.06.26] HERMES is accepted to ICCV 2025! 🥳
  • [2025.01.24] Released the demo. Check it out and give it a star 🌟!
  • [2025.01.24] Released the paper. 🔥

Abstract

Driving World Models (DWMs) have become essential for autonomous driving by enabling future scene prediction. However, existing DWMs are limited to scene generation and fail to incorporate scene understanding, which involves interpreting and reasoning about the driving environment. In this paper, we present a unified Driving World Model named HERMES. Through a unified framework, we seamlessly integrate scene understanding and future scene evolution (generation) in driving scenarios. Specifically, HERMES leverages a Bird's-Eye View (BEV) representation to consolidate multi-view spatial information while preserving geometric relationships and interactions. Additionally, we introduce world queries, which incorporate world knowledge into BEV features via causal attention in the Large Language Model (LLM), enabling contextual enrichment for both understanding and generation tasks. We conduct comprehensive studies on nuScenes and OmniDrive-nuScenes datasets to validate the effectiveness of our method. HERMES achieves state-of-the-art performance, reducing generation error by 32.4% and improving understanding metrics such as CIDEr by 8.0%.
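The world-query idea above can be illustrated with a minimal sketch. This is not the HERMES implementation: the token counts, feature dimension, and the single-head, weight-free attention below are illustrative assumptions. The point is only the ordering trick: placing learnable world queries *after* the BEV tokens in a causally attended sequence lets the queries read the whole scene while the BEV tokens cannot attend to them.

```python
import numpy as np

def causal_self_attention(x):
    """Single-head causal self-attention (projection weights omitted
    for brevity): token i may only attend to positions 0..i."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)  # future positions
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

rng = np.random.default_rng(0)
num_bev, num_world, dim = 6, 2, 8
bev_tokens = rng.normal(size=(num_bev, dim))       # flattened BEV features
world_queries = rng.normal(size=(num_world, dim))  # learnable queries (random here)

# World queries go AFTER the BEV tokens, so under the causal mask
# they can attend to the full scene representation.
seq = np.concatenate([bev_tokens, world_queries], axis=0)
out, attn = causal_self_attention(seq)

enriched_world = out[num_bev:]  # world queries now carry scene context
```

In the paper this happens inside the LLM, so the enriched queries also absorb the model's world knowledge before being used for generation; the sketch only shows the masking/ordering mechanics.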


Overview


Getting Started

We provide detailed guides to help you quickly set up, train, and evaluate HERMES. Please follow them for a smooth experience.


Demo

Example 1
Example 2
Example 3

Main Results


To Do

  • Release demo.
  • Release checkpoints.
  • Release training code.
  • Release processed datasets.
  • Release DeepSpeed support.

Acknowledgement

This project is based on BEVFormer v2 (paper, code), InternVL (paper, code), UniPAD (paper, code), OmniDrive (paper, code), and DriveMonkey (paper, code). Thanks for their wonderful work.

Citation

If you find this repository useful in your research, please consider giving a star ⭐ and a citation.

@inproceedings{zhou2025hermes,
  title={HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation},
  author={Zhou, Xin and Liang, Dingkang and Tu, Sifan and Chen, Xiwu and Ding, Yikang and Zhang, Dingyuan and Tan, Feiyang and Zhao, Hengshuang and Bai, Xiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
