
VisionXLab/SpaCE-10


SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence

Ziyang Gong¹*, Wenhao Li²*, Oliver Ma³, Songyuan Li⁴, Jiayi Ji⁵, Xue Yang¹, Gen Luo³, Junchi Yan¹, Rongrong Ji²

¹ Shanghai Jiao Tong University, ² Xiamen University,
³ Shanghai AI Lab, ⁴ Sun Yat-sen University, ⁵ National University of Singapore

* Equal contribution


🧠 What is SpaCE-10?

SpaCE-10 is a compositional spatial intelligence benchmark for evaluating Multimodal Large Language Models (MLLMs) in indoor environments. Our contributions are as follows:

  • 🧬 We define an Atomic Capability Pool, proposing 10 atomic spatial capabilities.
  • 🔗 Based on the composition of different atomic capabilities, we design 8 compositional QA types.
  • 📈 The SpaCE-10 benchmark contains 5,000+ QA pairs.
  • 🏠 All QA pairs come from 811 indoor scenes (ScanNet++, ScanNet, 3RScan, ARKitScenes).
  • 🌍 SpaCE-10 spans both 2D and 3D MLLM evaluations and can be seamlessly adapted to MLLMs that accept 3D scan input.

🔥🔥🔥 News

  • [2025/07/12] Adjusted some QA pairs in SpaCE-10 and added the performance of RemyxAI models to the leaderboard.
  • [2025/06/11] Scans for 3D MLLMs and our manually collected 3D snapshots will be released soon.
  • [2025/06/10] The evaluation code has been released; see the Evaluation section.
  • [2025/06/09] We have released the benchmark for 2D MLLMs on Hugging Face.
  • [2025/06/09] The SpaCE-10 paper is released on arXiv!

Performance Leaderboard - Single-Choice

🎉 LLaVA-OneVision-72B ranks first among all tested models.

🎉 GPT-4o achieves the best score among the tested closed-source models.

A large gap still exists between humans and models in compositional spatial intelligence.


Single-Choice vs. Double-Choice


Capability Score Ranking - Single-Choice


Environment

The SpaCE-10 evaluation is built on lmms-eval, so we follow the lmms-eval environment settings.

git clone https://github.com/Cuzyoung/SpaCE-10.git
cd SpaCE-10
uv venv dev --python=3.10
source dev/bin/activate
uv pip install -e .

Evaluation

Take InternVL2.5-8B as an example:

cd lmms-eval/run_bash
bash internvl2.5-8b.sh

Note that each new model you test requires its own environment (the dependencies for that model) to be installed first.
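If you want to evaluate several models in one pass, a small loop around the per-model scripts may help. This is a minimal sketch, not part of the official toolkit: it assumes every model has a script named after it in lmms-eval/run_bash (only internvl2.5-8b.sh is confirmed above, so any other name is hypothetical), and it assumes each model's environment is already installed in the active shell. The function name run_models is likewise hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with SpaCE-10: run several per-model
# evaluation scripts in sequence.
# Assumption: each model has "<model>.sh" in the given directory, as with
# lmms-eval/run_bash/internvl2.5-8b.sh; other script names are hypothetical.
# Assumption: the environment each model needs is already installed.

run_models() {
  local dir="$1"
  shift
  local failed=0
  for model in "$@"; do
    local script="$dir/$model.sh"
    # Count a missing script as a failure instead of aborting the whole run.
    if [[ ! -f "$script" ]]; then
      echo "skip: $script not found" >&2
      failed=$((failed + 1))
      continue
    fi
    echo "running: $model"
    # Run from inside the script directory in a subshell, mirroring
    # "cd lmms-eval/run_bash && bash internvl2.5-8b.sh" without changing
    # the caller's working directory.
    if ! (cd "$dir" && bash "$model.sh"); then
      echo "failed: $model" >&2
      failed=$((failed + 1))
    fi
  done
  # Exit status is the number of skipped or failed models (0 means all passed).
  return "$failed"
}
```

Usage, from the repository root: run_models lmms-eval/run_bash internvl2.5-8b. Running each script in a subshell keeps relative paths inside the scripts working while leaving your own shell where it was, and a nonzero return status tells you that at least one model needs attention.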

Citation

@article{gong2025space10,
  title={SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence},
  author={Ziyang Gong and Wenhao Li and Oliver Ma and Songyuan Li and Jiayi Ji and Xue Yang and Gen Luo and Junchi Yan and Rongrong Ji},
  journal={arXiv preprint arXiv:2506.07966},
  year={2025}
}
