SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence
Ziyang Gong1*, Wenhao Li2*, Oliver Ma3, Songyuan Li4, Jiayi Ji5, Xue Yang1, Gen Luo3, Junchi Yan1, Rongrong Ji2
1 Shanghai Jiao Tong University,
2 Xiamen University,
3 Shanghai AI Lab,
4 Sun Yat-sen University,
5 National University of Singapore
* Equal contribution
SpaCE-10 is a compositional spatial intelligence benchmark for evaluating Multimodal Large Language Models (MLLMs) in indoor environments. Our contributions are as follows:
- 🧬 We define an Atomic Capability Pool, proposing 10 atomic spatial capabilities.
- 🔗 Based on the composition of different atomic capabilities, we design 8 compositional QA types.
- 📈 The SpaCE-10 benchmark contains 5,000+ QA pairs.
- 🏠 All QA pairs come from 811 indoor scenes (ScanNet++, ScanNet, 3RScan, ARKitScenes).
- 🌍 SpaCE-10 spans both 2D and 3D MLLM evaluations and can be seamlessly adapted to MLLMs that accept 3D scan input.
- [2025/07/12] Adjusted some QAs of SpaCE-10 and added RemyxAI models' performance to the leaderboard.
- [2025/06/11] Scans for 3D MLLMs and our manually collected 3D snapshots will be coming soon.
- [2025/06/10] Evaluation code is released; see the instructions below.
- [2025/06/09] We have released the benchmark for 2D MLLMs at Hugging Face.
- [2025/06/09] The SpaCE-10 paper is released on arXiv!
🎉 LLaVA-OneVision-72B ranks first among all tested models.
🎉 GPT-4o achieves the best score among the tested closed-source models.
A large gap still exists between humans and models in compositional spatial intelligence.
SpaCE-10 evaluation is built on lmms-eval, so the environment setup follows that of lmms-eval.
git clone https://github.com/Cuzyoung/SpaCE-10.git
cd SpaCE-10
uv venv dev --python=3.10
source dev/bin/activate
uv pip install -e .
Take InternVL2.5-8B as an example:
cd lmms-eval/run_bash
bash internvl2.5-8b.sh
Note that each new model requires its own environment to be installed before it can be tested.
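One way to keep per-model environments separate is to name each virtual environment after the model's run script. The following is a minimal sketch under stated assumptions, not part of the SpaCE-10 repository: the `env-` prefix and the helper names `env_name_for` and `run_model` are hypothetical, and the model-specific dependencies are left as a placeholder because they differ per model. It is meant to be sourced from the repository root.

```shell
# Sketch only: helper names and the "env-" prefix are illustrative assumptions.

# Derive an environment name from a run script,
# e.g. internvl2.5-8b.sh -> env-internvl2.5-8b
env_name_for() {
  local script="$1"
  local base
  base="$(basename "$script" .sh)"
  printf 'env-%s\n' "$base"
}

# Create the environment if it does not exist yet, activate it,
# install the repository, and run the model's evaluation script.
run_model() {
  local script="$1"
  local env
  env="$(env_name_for "$script")"
  [ -d "$env" ] || uv venv "$env" --python=3.10
  source "$env/bin/activate"
  uv pip install -e .
  # Install any model-specific dependencies here (these vary per model).
  (cd lmms-eval/run_bash && bash "$script")
  deactivate
}
```

With these helpers sourced, `run_model internvl2.5-8b.sh` would run the InternVL2.5-8B example in an environment named `env-internvl2.5-8b`, leaving environments for other models untouched.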
@article{gong2025space10,
  title={SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence},
  author={Ziyang Gong and Wenhao Li and Oliver Ma and Songyuan Li and Jiayi Ji and Xue Yang and Gen Luo and Junchi Yan and Rongrong Ji},
  journal={arXiv preprint arXiv:2506.07966},
  year={2025}
}