EOC-Bench : Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?

arXiv preprint · Dataset · Project Page · Leaderboard · Blog

🔍 Overview

We introduce EOC-Bench, an innovative benchmark designed to systematically evaluate object-centric embodied cognition in dynamic egocentric scenarios. Specifically, EOC-Bench features 3,277 meticulously annotated QA pairs organized into three temporal categories: Past, Present, and Future, covering 11 fine-grained evaluation dimensions and 3 visual object referencing types. To ensure thorough assessment, we develop a mixed-format human-in-the-loop annotation framework with four question types and design a novel multi-scale temporal accuracy metric for open-ended temporal evaluation.

📚 Tasks Definition

EOC-Bench structures questions into three temporally grounded categories: Past, Present, and Future, spanning 11 fine-grained sub-categories in total.

🌟 Run Your Own Evaluation

🤗 Benchmark

Our benchmark is hosted on HuggingFace.

For all videos, we append the frame with the box prompt to the end of the original video.
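The convention above means the last frame of each benchmark video is the prompt frame with the target object's bounding box drawn on it. A conceptual sketch of splitting it off (this helper is illustrative, not the repository's actual loader; frame representation is left abstract):

```python
# Conceptual sketch: EOC-Bench videos carry the box-prompt frame as their
# final frame, so an evaluator can separate it from the original clip.
# This is NOT the repo's actual loading code, just an illustration.

def split_prompt_frame(frames):
    """Return (original_frames, box_prompt_frame) from a decoded video."""
    if len(frames) < 2:
        raise ValueError("expected at least one original frame plus the prompt frame")
    return frames[:-1], frames[-1]
```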

🛠️ Installation

Set up your environment with ease:

```shell
conda create --name eocbench python=3.10
conda activate eocbench

git clone git@github.com:alibaba-damo-academy/EOCBench.git
cd EOCBench

pip install -r requirements.txt
```

📈 Evaluation

Our codebase supports a variety of models for evaluation, including gpt-4o, gemini, qwen2.5-vl, internvl2_5, llava_video, llava_onevision, video_llava, longva, videollama2, videollama3, nvila, videorefer, vip-llava, osprey, and sphinx-v. Adjust the model settings in eval.sh, then run the script to begin your evaluation.

To test your own models, add a class in the models directory that implements the generate_outputs function.
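A minimal sketch of such a wrapper class (the class name, constructor, and argument names below are assumptions for illustration; only the `generate_outputs` entry point comes from this README — match the signature used by the existing classes in the models directory):

```python
# Hypothetical model wrapper for EOC-Bench evaluation.
# Only `generate_outputs` is named by the README; everything else
# (class name, constructor, argument names) is an assumption.

class MyModel:
    """Minimal custom-model wrapper; adapt to your model's actual API."""

    def __init__(self, model_path: str):
        # Load your model and processor/tokenizer here (placeholder).
        self.model_path = model_path

    def generate_outputs(self, video_frames, question: str) -> str:
        # Run inference on the sampled video frames plus the question
        # and return the model's answer as a string.
        return "A"  # placeholder answer
```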

For further instructions and detailed guidance, please refer to our evaluation documentation.

🏆 Leaderboard

The leaderboard for EOC-Bench is continuously updated, and contributions from your LVLMs are welcome!

To submit your results to the leaderboard, please email your result JSON files to circleradon@gmail.com.

📑 Citation

```bibtex
@article{yuan2025eocbench,
  author  = {Yuqian Yuan and Ronghao Dang and Long Li and Wentong Li and Dian Jiao and Xin Li and Deli Zhao and Fan Wang and Wenqiao Zhang and Jun Xiao and Yueting Zhuang},
  title   = {EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?},
  journal = {arXiv preprint arXiv:2506.05287},
  year    = {2025},
  url     = {https://arxiv.org/abs/2506.05287}
}
```
