This is the official repo for VCBench, a comprehensive benchmark designed for assessing multimodal mathematical reasoning tasks with explicit visual dependencies.
- [2025-04-10]: 🚀 Paper, code, and data of VCBench are now online. Check out this link for details.
The VCBench dataset consists of 1,720 question-answer pairs.
The question-answer pairs and corresponding images can be found here.
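Below is a minimal sketch of loading the question-answer pairs, assuming the annotations are distributed as a JSONL file (one JSON object per line); the file name and field layout here are assumptions for illustration, so please check the released data for the actual schema.

```python
import json

# Hypothetical file name; substitute the actual annotation file from the dataset release.
ANNOTATION_FILE = "vcbench_qa.jsonl"

def load_qa_pairs(path: str) -> list[dict]:
    """Load question-answer pairs from a JSONL file (one JSON object per line)."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

if __name__ == "__main__":
    qa_pairs = load_qa_pairs(ANNOTATION_FILE)
    print(f"Loaded {len(qa_pairs)} question-answer pairs")  # expected: 1720
    # Inspect one record to see the actual field names before writing evaluation code.
    print(qa_pairs[0])
```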
We assessed 24 state-of-the-art LVLMs across 17 distinct task categories within VCBench, evaluating five distinct model competencies: temporal reasoning, geometric reasoning, logical reasoning, spatial reasoning, and pattern recognition.
Although humans achieve near-perfect accuracy on these elementary-level tasks, the best-performing visual models were unable to exceed 50% accuracy. This underscores the significant challenges that remain in integrating visual and mathematical reasoning at the elementary level and highlights the need for further research on models that can handle the complexities of multimodal, visually dependent reasoning tasks.
For model evaluation, please refer to evaluation.
The VCBench leaderboard is continuously updated, and we welcome contributions of your LVLMs!
Please note that to thoroughly evaluate your own LVLM, you are required to provide us with a jsonl file containing the question-id and your model's final response for each question. We have provided a submission format in the submit.jsonl file. After completing these steps, please contact us at gasolsun36@gmail.com to submit your results and update the leaderboard.
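As a rough sketch, generating the submission file might look like the snippet below; the exact field names (`question_id`, `response`) are assumptions, so please match the keys used in the provided submit.jsonl.

```python
import json

# Hypothetical structure: each entry pairs a question-id with the model's final response.
# Check submit.jsonl for the exact field names expected by the leaderboard.
predictions = [
    {"question_id": "example-0001", "response": "A"},
    {"question_id": "example-0002", "response": "C"},
]

# Write one JSON object per line (JSONL).
with open("submit.jsonl", "w", encoding="utf-8") as f:
    for entry in predictions:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```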
If you find our work helpful for your research, please consider giving this repo a star and citing our paper.
@misc{wang2025vcbench,
author = {Zhikai Wang and Jiashuo Sun and Wenqi Zhang and Zhiqiang Hu and Xin Li and Fan Wang and Deli Zhao},
title = {Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency},
year = {2025},
eprint = {2504.18589},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
url = {https://arxiv.org/abs/2504.18589}
}