# evaluation-llms

Here are 2 public repositories matching this topic...


[NeurIPS'25] MLLM-CompBench evaluates the comparative reasoning of MLLMs with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. CompBench covers diverse visual domains, including animals, fashion, sports, and scenes.

  • Updated Apr 21, 2025
  • Jupyter Notebook
