🚀 Introducing OVBench
OVBench is a benchmark tailored for real-time video understanding:
- Memory, Perception, and Prediction of Temporal Contexts: Questions are framed around the present state of entities, requiring models to memorize past context, perceive the present, and predict future states as the stream unfolds.
- Dynamic Spatio-temporal Interaction: The benchmark demands precise real-time interactions with video content, where actions, objects, and events must be understood in the context of their spatial and temporal relationships.
- Contextual Awareness at Specific Moments: Real-time questions are contextual, changing depending on the timestamp at which they are asked, and therefore require a deep understanding of how the temporal context evolves.
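To make this timestamp dependence concrete, here is a minimal, hypothetical illustration; the field names below are our own and not the actual OVBench schema. The same question asked at two different moments of a stream must be answered from the state at that moment.

```python
# Hypothetical illustration only: these field names are NOT the OVBench schema.
# The same question, asked at different timestamps of a streaming video,
# must be answered from the temporal context available at that moment.
example_queries = [
    {
        "video": "kitchen_stream.mp4",
        "query_time": 12.0,  # seconds into the stream
        "question": "What is the person currently holding?",
        "answer": "a knife",
    },
    {
        "video": "kitchen_stream.mp4",
        "query_time": 45.0,
        "question": "What is the person currently holding?",
        "answer": "a frying pan",  # the state has changed by t = 45 s
    },
]
```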
🏗️ Pyramid Memory Bank
To tackle the challenges of infinite video streams, we propose a multi-layered Pyramid Memory Bank that balances spatial and temporal information:
- Spatial Anchors: The lower layers retain high-resolution features to preserve fine-grained spatial cues, capturing keyframes as "spatial anchors" with a lower sampling rate.
- Progressive Abstraction: As the layers progress, spatial resolution decays while the temporal sampling rate grows proportionally, yielding increasingly abstract representations that capture both short- and long-term patterns.
- Dynamic Eviction: A dynamic eviction mechanism detects temporally redundant frames via feature similarity and applies pooling for spatial compression, improving storage efficiency.
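Below is a minimal conceptual sketch of such a pyramid memory bank, not the released implementation: each level trades spatial resolution for temporal density, and once a level is full, the more temporally redundant of two adjacent frames (measured here by cosine similarity) is evicted. The class name, level sizes, strides, and exact eviction rule are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

class PyramidMemoryBank:
    """Conceptual sketch of a multi-level memory bank: lower levels keep a few
    high-resolution "spatial anchors"; higher levels keep many low-resolution,
    temporally dense features. Not the released implementation."""

    def __init__(self, capacities=(4, 16, 64), resolutions=(16, 8, 4), strides=(16, 4, 1)):
        # capacities: max stored frames per level (temporal density grows per level)
        # resolutions: spatial size per level (resolution decays per level)
        # strides: temporal sampling stride per level (lower levels sample less often)
        self.levels = [
            {"capacity": c, "resolution": r, "stride": s, "frames": []}
            for c, r, s in zip(capacities, resolutions, strides)
        ]
        self.frame_idx = 0

    def _evict(self, frames):
        # Dynamic eviction: drop the later of the two most similar adjacent
        # frames, i.e. the most temporally redundant one.
        sims = [
            F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
            for a, b in zip(frames[:-1], frames[1:])
        ]
        idx = int(torch.stack(sims).argmax())
        frames.pop(idx + 1)

    def insert(self, feat):
        # feat: (C, H, W) feature map of the newest frame.
        for level in self.levels:
            if self.frame_idx % level["stride"] != 0:
                continue
            # Spatial compression via average pooling to the level's resolution.
            compressed = F.adaptive_avg_pool2d(feat, level["resolution"])
            level["frames"].append(compressed)
            if len(level["frames"]) > level["capacity"]:
                self._evict(level["frames"])
        self.frame_idx += 1
```

With these made-up settings, the bottom level stores only every 16th frame at 16x16 resolution, while the top level stores every frame at 4x4, so total memory stays bounded regardless of stream length.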
🎯 Offline-to-Online Learning Paradigm
A novel training strategy designed for online video streams:
- Interleaved Dialogue Tuning: Combines offline video data with online instruction tuning in a dialogue format.
- Progressive Learning: Bridges offline and online video understanding, enhancing real-time adaptability.
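One way to read "interleaved dialogue tuning" is as a data pipeline that mixes offline video QA samples with online, streaming-style dialogue turns in a single training stream. The sketch below only illustrates that idea; the function name and mixing ratio are assumptions, not the paper's actual recipe.

```python
import itertools
import random

def interleave_offline_online(offline_samples, online_samples, online_ratio=0.5, seed=0):
    # Illustrative sketch: yield a mixed stream of offline video QA samples and
    # online, streaming-style dialogue samples. The ratio is an assumption.
    rng = random.Random(seed)
    offline_it = itertools.cycle(offline_samples)
    online_it = itertools.cycle(online_samples)
    while True:
        yield next(online_it) if rng.random() < online_ratio else next(offline_it)

# Example usage: draw one mixed training batch.
# stream = interleave_offline_online(offline_data, online_data)
# batch = [next(stream) for _ in range(8)]
```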
- Model checkpoint upload
- A more interactive demo
See our leaderboard here
Evaluation of Existing Models on OVBench Using lmms_eval
- Environment Setup: Ensure that all dependencies required by lmms_eval are properly installed.
- Perform a global search for the placeholder `/path_to_your` in the `lmms-eval-ovbench` directory and replace it with the corresponding paths on your local system (see the sketch after this list).
- Execute the script `lmms-eval-ovbench/scripts/eval_models/eval_internvl2-8B.sh` to initiate the benchmark evaluation.
- Since the video data used in this benchmark consists of both image sequences and video clips, use `lmms-eval-ovbench/llava/video_utils.py` to read the video data correctly.
- You may refer to the implementation of the `load_video` function in `lmms-eval-ovbench/lmms_eval/models/internvl2.py` as a guideline; integrate this function into your custom model as needed to enable compatibility with the lmms_eval evaluation framework.
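The placeholder replacement in the second step can be scripted. The snippet below is a minimal sketch (run it from the repository root and keep a backup); NEW_ROOT is only an example path that you should change.

```python
from pathlib import Path

PLACEHOLDER = "/path_to_your"   # placeholder used throughout lmms-eval-ovbench
NEW_ROOT = "/data/ovbench"      # example only: set this to your local path

for path in Path("lmms-eval-ovbench").rglob("*"):
    # Only rewrite text-like files; skip directories and binaries.
    if not path.is_file() or path.suffix not in {".py", ".sh", ".json", ".yaml", ".yml"}:
        continue
    text = path.read_text(encoding="utf-8")
    if PLACEHOLDER in text:
        path.write_text(text.replace(PLACEHOLDER, NEW_ROOT), encoding="utf-8")
        print(f"updated {path}")
```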
Email xinhaoli00@outlook.com with your result.json or open an issue in this repo.
To launch the demo, use the following script:
bash gradio_demo.sh
To install the necessary dependencies, use the following commands:
conda create -n your_env python=3.9
conda activate your_env
pip install -r requirements.txt
The anno_data file provides the paths for different types of datasets:
"coin_sl_train": {
"annotation": "Path to the annotations json file.",
"data_root": "your data path",
},
...
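As a quick sanity check after filling in these placeholders, you can verify that every annotation file and data root actually exists. The snippet below is a sketch and assumes the registry is a single JSON file named anno_data.json, which may not match the repo's actual file name.

```python
import json
from pathlib import Path

# Assumption: the dataset registry is a single JSON file shaped like the snippet above.
with open("anno_data.json", encoding="utf-8") as f:
    datasets = json.load(f)

for name, cfg in datasets.items():
    anno_ok = Path(cfg["annotation"]).is_file()
    root_ok = Path(cfg["data_root"]).is_dir()
    print(f"{name}: annotation {'ok' if anno_ok else 'MISSING'}, "
          f"data_root {'ok' if root_ok else 'MISSING'}")
```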
We support the LLaVA and VideoChat2-IT data reading formats; refer to those projects for the specific data JSON formats.
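For reference, a LLaVA-style entry commonly looks roughly like the following; the exact keys this repo expects may differ, so treat it as an illustrative sketch rather than the authoritative schema.

```python
# Illustrative LLaVA-style conversation entry (keys may differ in this repo).
llava_style_entry = {
    "id": "0001",
    "video": "videos/coin/clip_0001.mp4",
    "conversations": [
        {"from": "human", "value": "<video>\nWhat is the person doing right now?"},
        {"from": "gpt", "value": "They are tightening the front wheel of a bicycle."},
    ],
}
```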
🔄 Online SFT Data Download
For the construction format of the online data, please refer to VideoChatOnline-IT.
| Benchmark | Result |
| --- | --- |
| OVBench | 54.9 |
| VideoMME | Short: 65.8, Medium: 50.2, Long: 47.1, Avg: 54.4 |
| MVBench | 65.2 |
| EgoSchema | 54.7 |
| MLVU | 60.8 |
| LongVideoBench | 54.1 |
To run the training, execute the following bash commands for different stages:
#Offline SFT:
bash shell/online_4b/videochat_online_4b_stage1_ft.sh
#Online & Offline Joint SFT:
bash shell/online_4b/videochat_online_4b_stage2_ft.sh
📊 Evaluation on OVBench
#Sliding Window Setting:
bash shell/eval/online_bench_sliding_window.sh
#Streaming Setting:
bash shell/eval/online_bench_stream.sh