EOC-Bench : Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?

arXiv preprint · Dataset · Project Page · Leaderboard · Blog

🔍 Overview

We introduce EOC-Bench, an innovative benchmark designed to systematically evaluate object-centric embodied cognition in dynamic egocentric scenarios. Specifically, EOC-Bench features 3,277 meticulously annotated QA pairs organized into three temporal categories: Past, Present, and Future, covering 11 fine-grained evaluation dimensions and 3 visual object referencing types. To ensure thorough assessment, we develop a mixed-format human-in-the-loop annotation framework with four question types and design a novel multi-scale temporal accuracy metric for open-ended temporal evaluation.

📚 Tasks Definition

EOC-Bench structures questions into three temporally grounded categories: Past, Present, and Future, spanning 11 fine-grained sub-categories in total.

🌟 Run Your Own Evaluation

🤗 Benchmark

Our benchmark is hosted on HuggingFace.

For all videos, we append the frame with the box prompt to the end of the original video.
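The convention above means the last frame of each benchmark video is the prompt frame with the target object's bounding box drawn on it. A conceptual sketch of splitting it off (this helper is illustrative, not the repository's actual loader; frame representation is left abstract):

```python
# Conceptual sketch: EOC-Bench videos carry the box-prompt frame as their
# final frame, so an evaluator can separate it from the original clip.
# This is NOT the repo's actual loading code, just an illustration.

def split_prompt_frame(frames):
    """Return (original_frames, box_prompt_frame) from a decoded video."""
    if len(frames) < 2:
        raise ValueError("expected at least one original frame plus the prompt frame")
    return frames[:-1], frames[-1]
```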

🛠️ Installation

Set up your environment with ease:

```shell
conda create --name eocbench python=3.10
conda activate eocbench

git clone git@github.com:alibaba-damo-academy/EOCBench.git
cd EOCBench

pip install -r requirements.txt
```

📈 Evaluation

Our codebase supports a variety of models for evaluation, including gpt-4o, gemini, qwen2.5-vl, internvl2_5, llava_video, llava_onevision, video_llava, longva, videollama2, videollama3, nvila, videorefer, vip-llava, osprey, and sphinx-v. Adjust the model settings in eval.sh, then run the script to begin your evaluation.

To test your own models, add a class in the models directory that implements the generate_outputs function.
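A minimal sketch of such a wrapper class (the class name, constructor, and argument names below are assumptions for illustration; only the `generate_outputs` entry point comes from this README — match the signature used by the existing classes in the models directory):

```python
# Hypothetical model wrapper for EOC-Bench evaluation.
# Only `generate_outputs` is named by the README; everything else
# (class name, constructor, argument names) is an assumption.

class MyModel:
    """Minimal custom-model wrapper; adapt to your model's actual API."""

    def __init__(self, model_path: str):
        # Load your model and processor/tokenizer here (placeholder).
        self.model_path = model_path

    def generate_outputs(self, video_frames, question: str) -> str:
        # Run inference on the sampled video frames plus the question
        # and return the model's answer as a string.
        return "A"  # placeholder answer
```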

For further instructions and detailed guidance, please refer to our evaluation documentation.

🏆 Leaderboard

The leaderboard for EOC-Bench is continuously updated, and contributions from your LVLMs are welcome!

To submit your results to the leaderboard, please email your result JSON files to circleradon@gmail.com.

📑 Citation

```bibtex
@article{yuan2025eocbench,
  author  = {Yuqian Yuan and Ronghao Dang and Long Li and Wentong Li and Dian Jiao and Xin Li and Deli Zhao and Fan Wang and Wenqiao Zhang and Jun Xiao and Yueting Zhuang},
  title   = {EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?},
  journal = {arXiv preprint arXiv:2506.05287},
  year    = {2025},
  url     = {https://arxiv.org/abs/2506.05287}
}
```
