OpenStellarTeam/ChineseSimpleVQA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

2 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Overview

๐ŸŒ Website โ€ข ๐Ÿค— Hugging Face โ€ข โฌ Data โ€ข ๐Ÿ“ƒ Paper
ไธญๆ–‡ | English

Chinese SimpleVQA is the first factuality-based visual question-answering benchmark in Chinese, aimed at assessing the visual factuality of LVLMs across 8 major topics and 56 subtopics. The key features of this benchmark include a focus on the Chinese language, diverse knowledge types, multi-hop question construction, high-quality data, static consistency, and easy evaluation through short answers.

Please visit our website or check our paper for more details.

This is the evaluation repository for Chinese SimpleVQA, and it follows the MIT License.

💫 Introduction

  • To comprehensively assess the factual knowledge of LVLMs, we present ChineseSimpleVQA, a benchmark of 2,200 high-quality questions spanning 56 subtopics, from the humanities to science and engineering. The key distinguishing features of ChineseSimpleVQA are as follows:
    • Multi-hop: Visual factuality inquiries are decomposed into two steps: object recognition and knowledge assessment. This multi-hop strategy allows us to analyze the capability boundaries and execution mechanisms of LVLMs.
    • ๐Ÿ€Diverse: ChineseSimpleVQA emphasizes the Chinese language and covers 8 major topics (i.e., Nature, Sciences, Engineering, Humanities & Society, modern Architecture, Ancient Architecture, Geography Meteorological and Life Culture & Art). These topics encompass 56 fine-grained subtopics.
    • ⚡ High-quality: We implement a rigorous pipeline for benchmark construction, including automatic verification, difficulty filtering, and human verification.
    • 💡 Static: To maintain the enduring quality of ChineseSimpleVQA, all reference answers will remain unchanged over time.
    • ๐Ÿ—‚๏ธEasy-to-evaluate: All of the questions and answers are in a short format for quick evaluation.
  • Based on Chinese SimpleVQA, we conduct a comprehensive evaluation of the factual capabilities of 34 existing LVLMs, and we maintain a leaderboard of the results.

📊 Leaderboard

๐Ÿ› ๏ธ Setup

For the OpenAI API:

pip install openai

For the dataset, we provide two versions: you can either use the version with image URLs in /data/ChineseSimpleVQA.jsonl or download the full dataset.
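The JSONL file stores one JSON record per line. A minimal loading sketch (the field names follow the input format shown in the Evals section below; the counting example is illustrative):

```python
import json
from collections import Counter

def load_jsonl(path):
    """Read one JSON object per line, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

# Example: count questions per topic.
# records = load_jsonl("data/ChineseSimpleVQA.jsonl")
# topic_counts = Counter(r["Topic"] for r in records)
```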

โš–๏ธ Evals

We provide a simple evaluation script written from scratch. Run it as follows:

  • Step 1: set your OpenAI key and base URL in judge/closedsource_eval.py or judge/oepnsource_eval.py:

    os.environ["OPENAI_API_KEY"] = "replace your key here"
    os.environ["OPENAI_BASE_URL"] = "replace your base URL here"
    
  • Step 2: run the eval script:

    (1) For closed-source LVLMs:

    python judge/closedsource_eval.py <model_name>
    

    (2) For open-source LVLMs:

    python judge/oepnsource_eval.py <model_name>
    

    The input data should be transformed into the following format, with the model outputs attached:

    {
      "ID": "...",
      "image_url": "...",
      "recognition_question": "...",
      "recognition_answer": "...",
      "final_question": "...",
      "final_answer": "...",
      "Topic": "...",
      "model_output1": "...",
      "model_output2": "..."
    }
    
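For example, an eval-ready record can be assembled from a raw benchmark record by attaching the model's two answers. A minimal sketch (the helper name and the dummy record values are hypothetical; only the field names come from the format above, and we assume model_output1 answers the recognition question and model_output2 the final question):

```python
import json

def attach_model_outputs(record, recognition_answer, final_answer):
    """Return an eval-ready record: the raw benchmark fields plus the
    model's answers to the recognition and final questions."""
    out = dict(record)
    out["model_output1"] = recognition_answer  # answer to recognition_question
    out["model_output2"] = final_answer        # answer to final_question
    return out

# Example usage with a dummy record:
record = {
    "ID": "0001",
    "image_url": "https://example.com/img.jpg",
    "recognition_question": "图中是什么动物?",
    "recognition_answer": "大熊猫",
    "final_question": "它主要分布在哪个国家?",
    "final_answer": "中国",
    "Topic": "Nature",
}
eval_record = attach_model_outputs(record, "大熊猫", "中国")
line = json.dumps(eval_record, ensure_ascii=False)  # one line of the eval input file
```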

Citation

Please cite our paper if you use our dataset.

@article{gu2025see,
  title={"See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models},
  author={Gu, Jihao and Wang, Yingyao and Bu, Pi and Wang, Chen and Wang, Ziming and Song, Tengtao and Wei, Donglai and Yuan, Jiale and Zhao, Yingxiu and He, Yancheng and others},
  journal={arXiv preprint arXiv:2502.11718},
  year={2025}
}
