- 2025.05.20: 🌟 Our paper has been accepted to ACL 2025!
- 2024.09.23: 🌟 We provide the code, model, and data for evaluation!
- 2024.06.14: 🌟 We released AlignMMBench, a comprehensive alignment benchmark for vision-language models!
AlignMMBench is a multimodal alignment benchmark that encompasses both single-turn and multi-turn dialogue scenarios. It covers three categories and thirteen capability tasks, comprising a total of 4,978 question-answer pairs.
- High-Quality Annotations: A reliable benchmark built with meticulous human annotation and multi-stage quality control.
- Self-Critic: To improve the controllability of alignment evaluation, we introduce CritiqueVLM, a rule-calibrated and carefully finetuned evaluator based on ChatGLM3-6B. Its consistency with human judgements surpasses that of GPT-4 (see the loading sketch after this list).
- Diverse Data: Three categories and thirteen capability tasks, covering both single-turn and multi-turn dialogue scenarios.
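
Since CritiqueVLM is based on ChatGLM3-6B, it can presumably be loaded through Hugging Face `transformers` like any ChatGLM3 checkpoint. The sketch below is a minimal illustration only: the judging prompt and the way scores are extracted are assumptions, and the official scoring logic lives in `evaluate.py` (see Step 3).

```python
# Minimal sketch: load a ChatGLM3-6B-based evaluator with transformers.
# The prompt below is illustrative; the real template is in evaluate.py.
from transformers import AutoModel, AutoTokenizer

critic_path = "<critiqueVLM_path>"  # checkpoint downloaded in Step 0
tokenizer = AutoTokenizer.from_pretrained(critic_path, trust_remote_code=True)
model = AutoModel.from_pretrained(critic_path, trust_remote_code=True).half().cuda().eval()

# ChatGLM3 checkpoints expose a chat() helper via trust_remote_code.
prompt = (
    "Question: ...\nReference answer: ...\nModel response: ...\n"
    "Rate the response and explain your reasoning."  # hypothetical judging prompt
)
response, _ = model.chat(tokenizer, prompt, history=[])
print(response)
```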
Step 0: Download the AlignMMBench data from here, and the CritiqueVLM model file from here.
Step 1: Run inference with your model on AlignMMBench and save its responses in `.jsonl` format like this:

```jsonl
{"question_id": "00000000-0", "predict": "..."}
{"question_id": "00000000-1", "predict": "..."}
{"question_id": "00000000-2", "predict": "..."}
```
Step 2: Clone this repository and install the requirements.

```bash
git clone https://github.com/wuyuhang05/AlignMMBench.git && cd AlignMMBench
pip install -r requirements.txt
```
Step 3: Run the CritiqueVLM evaluator in `evaluate.py`:

```bash
python evaluate.py \
    --critic_model_path <critiqueVLM_path> \
    --response_file <your_model_responses_path> \
    --metadata_file <metadata_path> \
    --save_path <path_to_save_detailed_evaluation_results>
```
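
The detailed results can then be aggregated into a single benchmark score. The snippet below is a sketch that assumes the saved file is `.jsonl` with a numeric `score` field per line; check the actual output schema of `evaluate.py` before relying on it.

```python
# Sketch: average per-question scores from the saved evaluation file.
# Assumes one JSON object per line with a numeric "score" field (unverified).
import json

scores = []
with open("<path_to_save_detailed_evaluation_results>") as f:
    for line in f:
        scores.append(float(json.loads(line)["score"]))

print(f"{len(scores)} questions, mean score = {sum(scores) / len(scores):.2f}")
```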
The use of the dataset and the original images is governed by the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license, as detailed in the LICENSE file.
If you believe that any content in this dataset infringes on your rights, please contact us at wuyuhang2022@gmail.com or wenmeng.yu@aminer.cn to request its removal.
If you find our work helpful for your research, please consider citing it:
```bibtex
@misc{wu2024alignmmbench,
      title={AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models},
      author={Yuhang Wu and Wenmeng Yu and Yean Cheng and Yan Wang and Xiaohan Zhang and Jiazheng Xu and Ming Ding and Yuxiao Dong},
      year={2024},
      eprint={2406.09295},
      archivePrefix={arXiv}
}
```