Yifan Bian, Chuanbo Tang, Li Li, Dong Liu
Our Spatially Embedded Video Codec (SEVC) significantly advances the performance of Neural Video Codecs (NVCs). Furthermore, SEVC possesses enhanced robustness on challenging video sequences while offering additional functionality.
- Large Motions: SEVC can better handle sequences with large motions through progressive motion augmentation.
- Emerging Objects: Equipped with spatial references, SEVC can better handle sequences with emerging objects in low-delay scenes.
- Fast Decoding: SEVC provides a fast decoding mode to reconstruct a low-resolution video.
- [2025/04/05]: Our paper has been selected as a CVPR 2025 highlight paper (top 13.5%).
Results comparison (BD-rate and RD curves) for PSNR. The intra period is -1 with 96 frames, and the anchor is VTM-13.2 under the low-delay B (LDB) configuration. Negative BD-rate numbers indicate bitrate savings over the anchor.
| BD-Rate (%) | HEVC_B | MCL-JCV | UVG | USTC-TD |
| --- | --- | --- | --- | --- |
| DCVC-HEM | 10.0 | 4.9 | 1.2 | 27.2 |
| DCVC-DC | -10.8 | -13.0 | -21.2 | 11.9 |
| DCVC-FM | -11.7 | -12.5 | -24.3 | 23.9 |
| SEVC (ours) | -17.5 | -27.7 | -33.2 | -12.5 |

- Our SEVC reconstructs better motion vectors (MVs) on the decoder side for large-motion sequences. Here, we use RAFT optical flow as the pseudo motion label.
- Spatial references augment the context for frame coding. For emerging objects that do not appear in previous frames, SEVC produces a better description in its deep contexts.
This implementation of SEVC is based on DCVC-DC and CompressAI. Please refer to them for more information.
1. Install the dependencies
```bash
conda create -n $YOUR_PY38_ENV_NAME python=3.8
conda activate $YOUR_PY38_ENV_NAME
conda install pytorch==1.10.0 torchvision==0.11.0 cudatoolkit=11.3 -c pytorch
pip install pytorch_ssim scipy matplotlib tqdm bd-metric pillow pybind11
```
2. Prepare test datasets
For testing the RGB sequences, we use FFmpeg to convert the original YUV 420 data to RGB data.
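A minimal helper for this conversion is sketched below; it just shells out to FFmpeg, and the sequence name and resolution are placeholders, so running the equivalent ffmpeg command directly works just as well.

```python
# Illustrative sketch: convert a raw YUV420 sequence into the per-frame
# PNGs expected by the test scripts. Path and resolution are placeholders.
import subprocess
from pathlib import Path

def yuv420_to_png(yuv_path: str, width: int, height: int, out_dir: str) -> None:
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-f", "rawvideo",
            "-pix_fmt", "yuv420p",
            "-s", f"{width}x{height}",
            "-i", yuv_path,
            str(Path(out_dir) / "im%05d.png"),  # im00001.png, im00002.png, ...
        ],
        check=True,
    )

yuv420_to_png("BQTerrace_1920x1080_60.yuv", 1920, 1080,
              "test_datasets/HEVC_B/BQTerrace_1920x1080_60")
```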
A recommended structure of the test dataset is like:
```
test_datasets/
├── HEVC_B/
│   ├── BQTerrace_1920x1080_60/
│   │   ├── im00001.png
│   │   ├── im00002.png
│   │   ├── im00003.png
│   │   └── ...
│   ├── BasketballDrive_1920x1080_50/
│   │   ├── im00001.png
│   │   ├── im00002.png
│   │   ├── im00003.png
│   │   └── ...
│   └── ...
├── HEVC_C/
│   └── ... (like HEVC_B)
└── HEVC_D/
    └── ... (like HEVC_C)
```
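After conversion, a small script like this (illustrative, assuming the layout above) can verify that every sequence directory contains the expected frames:

```python
# Illustrative sanity check: list every <class>/<sequence> directory under
# the dataset root and count its im*.png frames.
from pathlib import Path

def check_dataset(root: str = "test_datasets") -> None:
    for dataset in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for seq in sorted(p for p in dataset.iterdir() if p.is_dir()):
            num_frames = len(list(seq.glob("im*.png")))
            print(f"{dataset.name}/{seq.name}: {num_frames} frames")

check_dataset()
```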
3. Compile the arithmetic coder
If you need real bitstream writing, please compile the arithmetic coder using the following commands.
On Windows:

```bash
cd src
mkdir build
cd build
conda activate $YOUR_PY38_ENV_NAME
cmake ../cpp -G "Visual Studio 16 2019" -A x64
cmake --build . --config Release
```
On Linux:

```bash
sudo apt-get install cmake g++
cd src
mkdir build
cd build
conda activate $YOUR_PY38_ENV_NAME
cmake ../cpp -DCMAKE_BUILD_TYPE=Release
make -j
```
1. Evaluation
Run the following command to evaluate the model and generate a JSON file that contains test results.
```bash
python test.py --rate_num 4 --test_config ./config_F96-IP-1.json --cuda 1 --worker 1 --output_path output.json --i_frame_model_path ./ckpt/cvpr2023_i_frame.pth.tar --p_frame_model_path ./ckpt/cvpr2025_p_frame.pth.tar
```
- We use the same intra model as DCVC-DC. `cvpr2023_i_frame.pth.tar` can be downloaded from DCVC-DC.
- Our `cvpr2025_p_frame.pth.tar` can be downloaded from CVPR2025-SEVC. `cvpr2023_i_frame.pth.tar` is also available here.

Put the model weights into the `./ckpt` directory and run the above command.

Our model supports variable bitrate: set different `i_frame_q_indexes` and `p_frame_q_indexes` to evaluate different bitrates.
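For background, variable-bitrate codecs in the DCVC line usually derive quantization scales from the integer quality index by interpolating between trained endpoints. The sketch below only illustrates that idea; the function name and endpoint values are made up and are not this repo's actual code.

```python
# Conceptual sketch (not the repo's code): map q_index in [0, q_num - 1]
# to a quantization scale by log-linear interpolation between trained
# endpoint scales; scale_coarse and scale_fine are placeholder values.
import math

def q_index_to_scale(q_index: int, scale_coarse: float = 8.0,
                     scale_fine: float = 0.5, q_num: int = 64) -> float:
    t = q_index / (q_num - 1)
    # q_index = 0 yields the coarsest scale (lowest quality), matching the
    # convention that a smaller index means lower quality.
    return math.exp((1 - t) * math.log(scale_coarse) + t * math.log(scale_fine))

print(q_index_to_scale(0), q_index_to_scale(63))  # 8.0 ... 0.5
```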
2. Real Encoding/Decoding
If you want real encoding/decoding, please use the encoder/decoder script as follows:
Encoding
```bash
python encoder.py -i $video_path -q $q_index --height $video_height --width $video_width --frames $frame_to_encode --ip -1 --fast $fast_mode -b $bin_path --i_frame_model_path ./ckpt/cvpr2023_i_frame.pth.tar --p_frame_model_path ./ckpt/cvpr2025_p_frame.pth.tar
```
- `$video_path`: input video path. For PNG files, it should be a directory.
- `$q_index`: 0-63. A smaller value indicates lower quality.
- `$frames`: number of frames to encode. The default is -1 (all frames).
- `$fast`: 0/1. 1 enables the fast encoding mode; if `--fast 1` is used, only a 4x downsampled video will be encoded (see the sketch below).
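To make the fast mode concrete: only a 4x spatially downsampled version of the input is encoded. A rough sketch of that preprocessing (illustrative; the codec's actual resampling filter may differ):

```python
# Illustrative sketch of what fast mode encodes: each (N, 3, H, W) frame
# batch reduced to a quarter of the resolution in each dimension.
import torch
import torch.nn.functional as F

def downsample_4x(frames: torch.Tensor) -> torch.Tensor:
    """(N, 3, H, W) in [0, 1] -> (N, 3, H/4, W/4)."""
    return F.interpolate(frames, scale_factor=0.25, mode="bicubic",
                         align_corners=False).clamp(0, 1)

x = torch.rand(1, 3, 1080, 1920)
print(downsample_4x(x).shape)  # torch.Size([1, 3, 270, 480])
```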
Decoding
```bash
python decoder.py -b $bin_path -o $rec_path --i_frame_model_path ./ckpt/cvpr2023_i_frame.pth.tar --p_frame_model_path ./ckpt/cvpr2025_p_frame.pth.tar
```
- In fast mode, you will only get a 4x downsampled video.
- Otherwise, you will get two videos: the 4x downsampled one and the full-resolution one.
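To sanity-check a decoded frame against its source, a simple RGB PSNR helper can be used (illustrative; both file paths are placeholders):

```python
# Illustrative RGB PSNR between an original frame and its reconstruction.
import numpy as np
from PIL import Image

def psnr(ref_path: str, rec_path: str) -> float:
    ref = np.asarray(Image.open(ref_path).convert("RGB"), dtype=np.float64)
    rec = np.asarray(Image.open(rec_path).convert("RGB"), dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    return float(10 * np.log10(255.0 ** 2 / mse))

print(psnr("test_datasets/HEVC_B/BQTerrace_1920x1080_60/im00001.png",
           "rec/im00001.png"))  # placeholder reconstruction path
```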
3. Temporal Stability
To intuitively verify the temporal stability of the videos at both resolutions, we provide two reconstruction examples at four bitrates:
- BasketballDrive_1920x1080_50: q1, q2, q3, q4
- RaceHorses_832x480_30: q1, q2, q3, q4
You can find them in examples.
They are stored in raw rgb24 format. You can use a YUV player to display them and observe the temporal stability.
Note: when displaying the fast-mode reconstruction, do not forget to set the correct resolution, which is a quarter of the full resolution.
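If you prefer scripting over a YUV player, raw rgb24 is simply height x width x 3 bytes per frame, stored back to back; a minimal reader (illustrative, the file name is a placeholder):

```python
# Illustrative reader for the raw rgb24 examples.
import numpy as np

def read_rgb24(path: str, width: int, height: int) -> np.ndarray:
    """Return an array of shape (num_frames, height, width, 3)."""
    data = np.fromfile(path, dtype=np.uint8)
    frame_size = width * height * 3
    num_frames = data.size // frame_size
    return data[: num_frames * frame_size].reshape(num_frames, height, width, 3)

# Remember: fast-mode reconstructions are a quarter of the full resolution.
frames = read_rgb24("BasketballDrive_1920x1080_50_q1.rgb", 1920, 1080)
print(frames.shape)
```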
If this repo helped you, a ⭐ star or citation would make my day!
```bibtex
@InProceedings{Bian_2025_CVPR,
    author    = {Bian, Yifan and Tang, Chuanbo and Li, Li and Liu, Dong},
    title     = {Augmented Deep Contexts for Spatially Embedded Video Coding},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {2094-2104}
}
```
If you have any questions, please contact me:
- togelbian@gmail.com (main)
- esakak@mail.ustc.edu.cn (alternative)
This work is licensed under the MIT License.
Our work is implemented based on DCVC-DC and CompressAI.