Dataset from the paper: Understanding the World's Museums through Vision-Language Reasoning
MUSEUM-65 is a multi-modal dataset of 65M images paired with 200M question-answer pairs in multiple languages, collected to reflect the cultural diversity of museum collections.
The dataset covers 50M objects with questions in English and 15M objects with questions in other languages (French, Spanish, German, etc.).
The dataset is available on HuggingFace as Museum-65; a minimal loading sketch is given below the batch list.
The dataset contains:
- the first 52 batches = the 1M subset used in the experiments
- the first 473 batches = the 10M subset used in the experiments
- all 1721 batches = every image with English-language information
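The batches can be streamed with the standard `datasets` library. The sketch below is illustrative only: the repository id and record fields are assumptions, so check the dataset page for the exact values.

```python
# Minimal loading sketch, assuming the standard Hugging Face `datasets` layout.
# The repository id below is a placeholder -- use the exact id from the dataset page.
from datasets import load_dataset

museum65 = load_dataset(
    "INSAIT-Institute/Museum-65",  # placeholder repo id (assumption)
    split="train",
    streaming=True,  # avoids downloading all 1721 batches up front
)

# Inspect one record; field names are illustrative, not guaranteed.
first = next(iter(museum65))
print(first.keys())
```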
License: CC BY-NC 4.0 (cc-by-nc-4.0)
We introduce a comprehensive benchmark for MUSEUM-65 that evaluates both general and task-specific capabilities across several metrics. The benchmark provides a standardized framework for consistent comparison of methods on this dataset, with the aim of guiding future research towards effective models and identifying areas for improvement. It covers the following tasks (an illustrative scoring sketch follows the list):
- General VQA
- Category-wise VQA
- Multiple Angles
- Visually Unanswerable Questions
- Multiple Languages
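As an illustration only, the snippet below shows two common VQA measures (exact match and token-level F1) that can serve as a stand-in scorer; the benchmark's actual metrics are defined in the paper.

```python
# Illustrative VQA scoring helpers -- not the benchmark's official metrics,
# which are defined in the paper. Shown here: exact match and token-level F1.
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized prediction equals the reference, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```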
In our experiments we use two models well known for VQA tasks, LLaVA and BLIP, fine-tuning them on our dataset and following their fine-tuning protocols where possible. For details on how to fine-tune these models, please refer to their respective GitHub repositories; a minimal fine-tuning sketch is given below.
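For orientation, here is a minimal fine-tuning sketch for BLIP on question-answer pairs using the Hugging Face `transformers` implementation. The checkpoint name, single-example step, and learning rate are illustrative assumptions, not the exact protocol used in the paper.

```python
# Minimal BLIP VQA fine-tuning sketch using Hugging Face `transformers`.
# Checkpoint, learning rate, and the single-example step are illustrative only.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()


def training_step(image: Image.Image, question: str, answer: str) -> float:
    """Run one gradient step on a single (image, question, answer) triple."""
    inputs = processor(images=image, text=question, return_tensors="pt")
    labels = processor(text=answer, return_tensors="pt").input_ids
    outputs = model(**inputs, labels=labels)  # returns a language-modeling loss
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

A real training run would batch examples, mask padding tokens in the labels, and schedule the learning rate; see the official LLaVA and BLIP repositories for the full protocols.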
Project realised with INSAIT - Institute for Computer Science, Artificial Intelligence and Technology, Sofia University St. Kliment Ohridski, Sofia, Bulgaria.
Please cite accordingly:
@misc{balauca2024understandingworldsmuseumsvisionlanguage,
      title={Understanding the World's Museums through Vision-Language Reasoning},
      author={Ada-Astrid Balauca and Sanjana Garai and Stefan Balauca and Rasesh Udayakumar Shetty and Naitik Agrawal and Dhwanil Subhashbhai Shah and Yuqian Fu and Xi Wang and Kristina Toutanova and Danda Pani Paudel and Luc Van Gool},
      year={2024},
      eprint={2412.01370},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.01370},
}