This repository contains the evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive" (ECCV 2024): https://arxiv.org/abs/2404.12390
Topics: benchmark natural-language-processing ai computer-vision perception multimodal-learning multimodal vision-and-language 3d-understanding multimodal-large-language-models perception-evaluation
Updated Jul 3, 2024 · Language: Python