- 2025.05.20: 🌟 Our paper has been accepted to ACL 2025!
- 2024.09.23: 🌟 We provide the code, model, and data for evaluation!
- 2024.06.14: 🌟 We released AlignMMBench, a comprehensive alignment benchmark for vision-language models!
AlignMMBench is a multimodal alignment benchmark that encompasses both single-turn and multi-turn dialogue scenarios. It covers three categories and thirteen capability tasks, comprising a total of 4,978 question-answer pairs.
- High-Quality Annotations: A reliable benchmark built with meticulous human annotation and multi-stage quality control.
- Self-Critic: To improve the controllability of alignment evaluation, we introduce CritiqueVLM, a rule-calibrated and carefully finetuned evaluator based on ChatGLM3-6B. Its consistency with human judgements surpasses that of GPT-4 (see the loading sketch after this list).
- Diverse Data: Three categories and thirteen capability tasks, covering both single-turn and multi-turn dialogue scenarios.
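
Since CritiqueVLM is based on ChatGLM3-6B, it can presumably be loaded through Hugging Face `transformers` like any ChatGLM3 checkpoint. The sketch below is a minimal illustration only: the judging prompt and the way scores are extracted are assumptions, and the official scoring logic lives in `evaluate.py` (see Step 3).

```python
# Minimal sketch: load a ChatGLM3-6B-based evaluator with transformers.
# The prompt below is illustrative; the real template is in evaluate.py.
from transformers import AutoModel, AutoTokenizer

critic_path = "<critiqueVLM_path>"  # checkpoint downloaded in Step 0
tokenizer = AutoTokenizer.from_pretrained(critic_path, trust_remote_code=True)
model = AutoModel.from_pretrained(critic_path, trust_remote_code=True).half().cuda().eval()

# ChatGLM3 checkpoints expose a chat() helper via trust_remote_code.
prompt = (
    "Question: ...\nReference answer: ...\nModel response: ...\n"
    "Rate the response and explain your reasoning."  # hypothetical judging prompt
)
response, _ = model.chat(tokenizer, prompt, history=[])
print(response)
```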
Step 0: Download the AlignMMBench data from here, and the CritiqueVLM model file from here.
Step 1: Run inference with your model on AlignMMBench and save its responses in `.jsonl` format like this:

```jsonl
{"question_id": "00000000-0", "predict": "..."}
{"question_id": "00000000-1", "predict": "..."}
{"question_id": "00000000-2", "predict": "..."}
```
Step 2: Clone this repository and install the requirements.

```bash
git clone https://github.com/wuyuhang05/AlignMMBench.git && cd AlignMMBench
pip install -r requirements.txt
```
Step 3: Run the CritiqueVLM evaluator in `evaluate.py`:

```bash
python evaluate.py \
    --critic_model_path <critiqueVLM_path> \
    --response_file <your_model_responses_path> \
    --metadata_file <metadata_path> \
    --save_path <path_to_save_detailed_evaluation_results>
```
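
The detailed results can then be aggregated into a single benchmark score. The snippet below is a sketch that assumes the saved file is `.jsonl` with a numeric `score` field per line; check the actual output schema of `evaluate.py` before relying on it.

```python
# Sketch: average per-question scores from the saved evaluation file.
# Assumes one JSON object per line with a numeric "score" field (unverified).
import json

scores = []
with open("<path_to_save_detailed_evaluation_results>") as f:
    for line in f:
        scores.append(float(json.loads(line)["score"]))

print(f"{len(scores)} questions, mean score = {sum(scores) / len(scores):.2f}")
```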
The use of the dataset and the original images is governed by the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license, as detailed in the LICENSE file.
If you believe that any content in this dataset infringes on your rights, please contact us at wuyuhang2022@gmail.com or wenmeng.yu@aminer.cn to request its removal.
If you find our work helpful for your research, please consider citing it:
```bibtex
@misc{wu2024alignmmbench,
      title={AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models},
      author={Yuhang Wu and Wenmeng Yu and Yean Cheng and Yan Wang and Xiaohan Zhang and Jiazheng Xu and Ming Ding and Yuxiao Dong},
      year={2024},
      eprint={2406.09295},
      archivePrefix={arXiv}
}
```