VideoChat-Online


📝 Highlights

🚀 Introducing OVBench

OVBench is a benchmark tailored for real-time video understanding:

  • Memory, Perception, and Prediction of Temporal Contexts: Questions are framed around the present state of entities, requiring models to memorize past, perceive present, and predict future temporal contexts.
  • Dynamic Spatio-temporal Interaction: The benchmark demands precise real-time interactions with video content, where actions, objects, and events must be understood in the context of their spatial and temporal relationships.
  • Contextual Awareness at Specific Moments: Real-time questions are contextual, changing based on the specific timestamp they are asked, requiring a deep understanding of how temporal context evolves.

🏗️ Pyramid Memory Bank

To tackle the challenges of infinite video streams, we propose a multi-layered Pyramid Memory Bank that balances spatial and temporal information:

  1. Spatial Anchors: The lower layers retain high-resolution features to preserve fine-grained spatial cues, capturing keyframes as "spatial anchors" with a lower sampling rate.
  2. Progressive Abstraction: As the layers progress, spatial resolution decays while the temporal sampling rate grows proportionally, forming an abstract representation of fine-grained long-short-term patterns.
  3. Dynamic Eviction: A dynamic eviction mechanism detects temporal redundancy via similarity, combined with pooling for spatial compression, improving storage efficiency.
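The three ideas above can be illustrated with a toy sketch. It is not the repository's implementation: the class name, the capacities, strides, and pooling factor are all hypothetical, frames are flat lists of floats rather than real feature maps, and 1-D average pooling stands in for spatial pooling. Each layer is filled independently here, which is a simplifying assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def avg_pool(frame, factor):
    """Average-pool a flat feature vector by `factor` (stand-in for spatial pooling)."""
    return [
        sum(frame[i:i + factor]) / len(frame[i:i + factor])
        for i in range(0, len(frame), factor)
    ]


class PyramidMemoryBank:
    """Toy pyramid: layer k keeps every strides[k]-th frame, pooled by pool**k."""

    def __init__(self, capacities=(2, 4, 8), strides=(4, 2, 1), pool=2):
        self.capacities = capacities  # max frames per layer (assumed values)
        self.strides = strides        # larger stride = lower temporal sampling rate
        self.pool = pool
        self.layers = [[] for _ in capacities]
        self.t = 0  # number of frames seen so far

    def add(self, frame):
        for k, layer in enumerate(self.layers):
            if self.t % self.strides[k] == 0:
                # Lower layers keep full resolution; higher layers are pooled more.
                layer.append(avg_pool(frame, self.pool ** k))
                if len(layer) > self.capacities[k]:
                    self._evict(layer)
        self.t += 1

    @staticmethod
    def _evict(layer):
        """Drop the later frame of the most similar adjacent pair."""
        i = max(range(len(layer) - 1), key=lambda j: cosine(layer[j], layer[j + 1]))
        del layer[i + 1]


bank = PyramidMemoryBank()
for step in range(16):
    bank.add([float(step + c) for c in range(4)])  # 4 hypothetical features per frame
sizes = [len(layer) for layer in bank.layers]
```

After 16 frames each layer sits at its capacity, and the stored vectors shrink from 4 values in the lowest layer to 1 in the highest, mirroring the trade-off between spatial detail and temporal coverage.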

🎯 Offline-to-Online Learning Paradigm

A novel training strategy designed for online video streams:

  • Interleaved Dialogue Tuning: Combines offline video data with online instruction tuning in a dialogue format.
  • Progressive Learning: Bridges offline and online video understanding, enhancing real-time adaptability.

To-Do

  • Upload the model checkpoint
  • A more interactive demo

🏆 OVBench Leaderboard

See our leaderboard here

Evaluate your model

Evaluation of Existing Models on OVBench Using lmms_eval.

Preparatory Steps

  • Environment Setup: Ensure that all dependencies required by lmms_eval are properly installed.

  • Path Configuration: Search the lmms-eval-ovbench directory for the placeholder /path_to_your and replace every occurrence with the corresponding path on your local system.
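One way to locate every placeholder before editing is a small standard-library script such as the sketch below. The function name find_placeholder is hypothetical and not part of the repository; the script only reports matches and leaves the replacement to you.

```python
from pathlib import Path


def find_placeholder(root, placeholder="/path_to_your"):
    """Return (file, line number, line text) for every line containing the placeholder."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if placeholder in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Run from the repository root so the relative directory resolves.
    for file, lineno, line in find_placeholder("lmms-eval-ovbench"):
        print(f"{file}:{lineno}: {line}")
```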

Predefined Model Evaluation

  • Execute the script lmms-eval-ovbench/scripts/eval_models/eval_internvl2-8B.sh to initiate the benchmark evaluation.

Custom Model Evaluation

  • Because the video data in this benchmark includes both image sequences and video clips, use lmms-eval-ovbench/llava/video_utils.py to read it correctly.

  • You may refer to the implementation of the load_video function in lmms-eval-ovbench/lmms_eval/models/internvl2.py as a guideline. Integrate this function into your custom model as needed to enable compatibility with the lmms_eval evaluation framework.
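To make the distinction between the two input types concrete, here is a hedged sketch of how a custom loader might branch on them. It does not reproduce video_utils.py or the load_video function in internvl2.py; the names resolve_frames and uniform_indices, the accepted image suffixes, and the uniform sampling strategy are all assumptions for illustration.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed frame formats


def resolve_frames(source):
    """Return ("frames", sorted image paths) for a directory, or ("clip", [path]) for a file."""
    path = Path(source)
    if path.is_dir():
        frames = sorted(p for p in path.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
        return "frames", [str(p) for p in frames]
    return "clip", [str(path)]


def uniform_indices(num_frames, num_samples):
    """Pick `num_samples` evenly spaced frame indices out of `num_frames`."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, num_frames)
    step = num_frames / num_samples
    # Take the midpoint of each equal-width segment.
    return [int(step * i + step / 2) for i in range(num_samples)]
```

A real integration would hand the selected frames or the clip path to the decoding routine in video_utils.py rather than decoding them itself.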

Submit the results

Email xinhaoli00@outlook.com with your result.json or open an issue in this repo.


🎥 Demo

To launch the demo, use the following script:

bash gradio_demo.sh

🛠️ Installation

To install the necessary dependencies, use the following commands:

conda create -n your_env python=3.9
conda activate your_env
pip install -r requirements.txt

📦 Offline Data Preparation

The anno_data file provides the paths for different types of datasets:

"coin_sl_train": {
    "annotation": "Path to the annotations json file.",
    "data_root": "your data path",
},
...

Annotation JSON files may follow either the LLaVA format or the VideoChat2-IT format; the data reader supports both.
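Before training, it can help to check that every dataset entry carries both fields shown in the example entry. The sketch below assumes the registry is a plain dictionary of dictionaries; the helper missing_keys is hypothetical, and the paths are the same placeholders as in the example.

```python
REQUIRED_KEYS = ("annotation", "data_root")

# Hypothetical registry mirroring the example entry; the paths are placeholders.
anno_data = {
    "coin_sl_train": {
        "annotation": "Path to the annotations json file.",
        "data_root": "your data path",
    },
}


def missing_keys(registry):
    """Map each dataset name to the required keys its entry lacks."""
    problems = {}
    for name, entry in registry.items():
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            problems[name] = missing
    return problems
```

An empty result means every entry is structurally complete; it does not confirm that the paths exist on disk.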


🔄 Online SFT Data Download

For the format of the online data, refer to VideoChatOnline-IT.

📈 Evaluation Results of VideoChatOnline-4B on Long Video Benchmarks

Benchmark        Result
OVBench          54.9
VideoMME         Short: 65.8, Medium: 50.2, Long: 47.1, Avg: 54.4
MVBench          65.2
EgoSchema        54.7
MLVU             60.8
LongVideoBench   54.1
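
The VideoMME average is consistent with an unweighted mean of the three duration splits. That reading is an assumption, since the source does not state how the average was computed, but the arithmetic can be checked directly:

```python
# VideoMME sub-scores reported in the results above.
videomme = {"Short": 65.8, "Medium": 50.2, "Long": 47.1}

# Assumption: the reported Avg is the unweighted mean of the three splits.
average = sum(videomme.values()) / len(videomme)
print(f"{average:.1f}")  # prints 54.4
```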

🚀 Training

To run the training, execute the following bash commands for different stages:

# Offline SFT
bash shell/online_4b/videochat_online_4b_stage1_ft.sh
# Online & Offline Joint SFT
bash shell/online_4b/videochat_online_4b_stage2_ft.sh

📊 Evaluation on OVBench

# Sliding Window Setting
bash shell/eval/online_bench_sliding_window.sh
# Streaming Setting
bash shell/eval/online_bench_stream.sh

About

[CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online
