Skip to content

insait-institute/Museum-65

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Museum-65

Paper

Dataset from the paper: Understanding the World's Museums through Vision-Language Reasoning

Dataset

Dataset Image MUSEUM-65 is a multi-modal dataset containing 65M images with 200M question-answer pairs in multiple languages, ensuring cultural diversity. The dataset covers 50M objects with questions in English and 15M with questions in other languages (French, Spanish, German, etc). The dataset is available on HuggingFace: Museum-65 The dataset contains:

  • first 52 batches = 1MN dataset used in exeperiments
  • first 473 batches = 10MN dataset used in experiments
  • up to 1721 batches with all images with the english information

license: cc-by-nc-4.0

Benchmark

Tasks We introduce a comprehensive benchmark for MUSEUM-65, that evaluates general and specific tasks across different metrics. This benchmark provides a standardized frame work, allowing for consistent comparison of various methods on this dataset, aiming to guide future research towards effective models and identifying areas for improvement:

  • General VQA, Category-wise VQA, Multiple Angles, Visually Unanswerable Questions, Multiple Languages

Models and finetuning

Flow In our experiments we use two models known for VQA tasks, LLaVA and BLIP, following their finetuning protocols when possible, using our dataset.

For a better understanding for how to finetune these models please access their GitHub pages:

Contact

Project realised with: INSAIT-Institute for Computer Science, Artificial Intelligence and Technology, University St. Kliment Ohridski Sofia, Bulgaria

Please cite accordingly:

@misc{balauca2024understandingworldsmuseumsvisionlanguage,
      title={Understanding the World's Museums through Vision-Language Reasoning}, 
      author={Ada-Astrid Balauca and Sanjana Garai and Stefan Balauca and Rasesh Udayakumar Shetty and Naitik Agrawal and Dhwanil Subhashbhai Shah and Yuqian Fu and Xi Wang and Kristina Toutanova and Danda Pani Paudel and Luc Van Gool},
      year={2024},
      eprint={2412.01370},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.01370}, 
}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published