OpenStellarTeam/ChineseSimpleVQA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

2 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Overview

๐ŸŒ Website โ€ข ๐Ÿค— Hugging Face โ€ข โฌ Data โ€ข ๐Ÿ“ƒ Paper
ไธญๆ–‡ | English

Chinese SimpleVQA is the first factuality-based visual question-answering benchmark in Chinese, aimed at assessing the visual factuality of LVLMs across 8 major topics and 56 subtopics. The key features of this benchmark include a focus on the Chinese language, diverse knowledge types, multi-hop question construction, high-quality data, static consistency, and easy evaluation through short answers.

Please visit our website or check our paper for more details.

This is the evaluation repository for Chinese SimpleVQA, and it follows the MIT License.

💫 Introduction

  • To comprehensively assess the factual knowledge of LVLMs, we present ChineseSimpleVQA, a benchmark of 2,200 high-quality questions spanning 56 subtopics, from the humanities to science and engineering. The key distinguishing features of ChineseSimpleVQA are as follows:
    • Multi-hop: Visual factuality inquiries are decomposed into two steps: object recognition and knowledge assessment. This multi-hop strategy allows us to analyze the capability boundaries and execution mechanisms of LVLMs.
    • ๐Ÿ€Diverse: ChineseSimpleVQA emphasizes the Chinese language and covers 8 major topics (i.e., Nature, Sciences, Engineering, Humanities & Society, modern Architecture, Ancient Architecture, Geography Meteorological and Life Culture & Art). These topics encompass 56 fine-grained subtopics.
    • ⚡ High-quality: We implement a rigorous pipeline for benchmark construction, including automatic verification, difficulty filtering, and human verification.
    • 💡 Static: To maintain the enduring quality of ChineseSimpleVQA, all reference answers will remain unchanged over time.
    • ๐Ÿ—‚๏ธEasy-to-evaluate: All of the questions and answers are in a short format for quick evaluation.
  • Based on Chinese SimpleVQA, we conduct a comprehensive evaluation of the factual capabilities of 34 existing LVLMs, and we maintain a leaderboard of the results.

📊 Leaderboard

๐Ÿ› ๏ธ Setup

For the OpenAI API:

pip install openai

For the dataset, we provide two versions: you can either use the version with image URLs in /data/ChineseSimpleVQA.jsonl or download the full dataset.
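The JSONL file stores one JSON record per line. A minimal loading sketch (the field names follow the input format shown in the Evals section below; the counting example is illustrative):

```python
import json
from collections import Counter

def load_jsonl(path):
    """Read one JSON object per line, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

# Example: count questions per topic.
# records = load_jsonl("data/ChineseSimpleVQA.jsonl")
# topic_counts = Counter(r["Topic"] for r in records)
```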

โš–๏ธ Evals

We provide a simple evaluation script written from scratch. Run it as follows:

  • Step 1: set your OpenAI key and base URL in judge/closedsource_eval.py or judge/oepnsource_eval.py:

    os.environ["OPENAI_API_KEY"] = "replace your key here"
    os.environ["OPENAI_BASE_URL"] = "replace your base URL here"
    
  • Step 2: run the eval script:

    (1) For closed-source LVLMs:

    python judge/closedsource_eval.py <model_name>
    

    (2) For open-source LVLMs:

    python judge/oepnsource_eval.py <model_name>
    

    The input data should be transformed into the following format, with the model outputs attached:

    {
      "ID": "...",
      "image_url": "...",
      "recognition_question": "...",
      "recognition_answer": "...",
      "final_question": "...",
      "final_answer": "...",
      "Topic": "...",
      "model_output1": "...",
      "model_output2": "..."
    }
    
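For example, an eval-ready record can be assembled from a raw benchmark record by attaching the model's two answers. A minimal sketch (the helper name and the dummy record values are hypothetical; only the field names come from the format above, and we assume model_output1 answers the recognition question and model_output2 the final question):

```python
import json

def attach_model_outputs(record, recognition_answer, final_answer):
    """Return an eval-ready record: the raw benchmark fields plus the
    model's answers to the recognition and final questions."""
    out = dict(record)
    out["model_output1"] = recognition_answer  # answer to recognition_question
    out["model_output2"] = final_answer        # answer to final_question
    return out

# Example usage with a dummy record:
record = {
    "ID": "0001",
    "image_url": "https://example.com/img.jpg",
    "recognition_question": "图中是什么动物?",
    "recognition_answer": "大熊猫",
    "final_question": "它主要分布在哪个国家?",
    "final_answer": "中国",
    "Topic": "Nature",
}
eval_record = attach_model_outputs(record, "大熊猫", "中国")
line = json.dumps(eval_record, ensure_ascii=False)  # one line of the eval input file
```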

Citation

Please cite our paper if you use our dataset.

@article{gu2025see,
  title={"See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models},
  author={Gu, Jihao and Wang, Yingyao and Bu, Pi and Wang, Chen and Wang, Ziming and Song, Tengtao and Wei, Donglai and Yuan, Jiale and Zhao, Yingxiu and He, Yancheng and others},
  journal={arXiv preprint arXiv:2502.11718},
  year={2025}
}
