IGLICT/Sketch3DVE

Sketch3DVE: Sketch-based 3D-Aware Scene Video Editing

   

SIGGRAPH 2025

🚀 Introduction

We propose Sketch3DVE, a sketch-based 3D-aware video editing method that enables detailed local manipulation of videos with significant viewpoint changes. See our project page and paper for more information.

1. 3D-Aware Scene Video Editing

[Demo gallery columns: input video, edited image, generated video]

📝 Changelog

  • [2025.08.03]: 🔥🔥 Release code and model weights.
  • [2025.08.05]: Launch the project page and update the arXiv preprint.

🧰 Models

Model | Resolution | GPU memory and inference time (A100, DDIM, 50 steps) | Checkpoint
Sketch3DVE | 720x480 | ~27 GB, 53 s | Hugging Face

Our method builds on the pretrained CogVideoX-2b model, to which we add a sketch-conditioned network for editing.

Sketch3DVE currently supports generating videos of up to 49 frames at a resolution of 720x480. For editing, the input video is assumed to have 49 frames at a resolution of 720x480.
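Since editing assumes a fixed input shape, it may help to check a video before running the pipeline. The following is a minimal sketch, not part of the repository: it assumes ffprobe (from FFmpeg) is installed, and the function names and the file name input.mp4 are hypothetical.

```shell
#!/bin/sh
# Expected input shape, taken from the README text above.
EXPECTED_W=720
EXPECTED_H=480
EXPECTED_FRAMES=49

# check_shape WIDTH HEIGHT FRAMES: succeed only if all three values match.
check_shape() {
    [ "$1" -eq "$EXPECTED_W" ] && [ "$2" -eq "$EXPECTED_H" ] && [ "$3" -eq "$EXPECTED_FRAMES" ]
}

# probe_field FILE FIELD: print one field of the first video stream.
# -count_frames makes ffprobe decode the stream so nb_read_frames is filled in.
probe_field() {
    ffprobe -v error -select_streams v:0 -count_frames \
        -show_entries "stream=$2" -of default=noprint_wrappers=1:nokey=1 "$1"
}

# check_video FILE: probe FILE and report whether it matches the expected shape.
check_video() {
    w=$(probe_field "$1" width)
    h=$(probe_field "$1" height)
    n=$(probe_field "$1" nb_read_frames)
    if check_shape "$w" "$h" "$n"; then
        echo "ok: $1 is ${w}x${h}, $n frames"
    else
        echo "mismatch: $1 is ${w}x${h}, $n frames" >&2
        return 1
    fi
}
```

After sourcing the file, a call such as `check_video input.mp4` prints the detected shape and returns a non-zero status on a mismatch.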

The inference time can be reduced by using fewer DDIM steps.

⚙️ Setup

The code was tested with CUDA 11.8 and Python 3.10, so we recommend using the same environment.

Install Environment via Anaconda (Recommended)

conda create -n sketch3dve python=3.10
conda activate sketch3dve
pip install -r requirements.txt
conda install https://anaconda.org/pytorch3d/pytorch3d/0.7.8/download/linux-64/pytorch3d-0.7.8-py310_cu118_pyt240.tar.bz2

Note that diffusers==0.30.1 is required.
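To confirm the pinned version after installation, a check along these lines can be run inside the activated environment. This is a sketch under the assumption that `python` resolves to the environment's interpreter. The function names are hypothetical, while `diffusers.__version__` is the package's standard version attribute.

```shell
#!/bin/sh
# Version pinned by the README.
REQUIRED_DIFFUSERS="0.30.1"

# version_matches INSTALLED REQUIRED: succeed only on an exact string match.
version_matches() {
    [ "$1" = "$2" ]
}

# installed_diffusers: print the installed diffusers version,
# or nothing if the package cannot be imported.
installed_diffusers() {
    python -c 'import diffusers; print(diffusers.__version__)' 2>/dev/null
}

# check_diffusers: compare the installed version against the pinned one.
check_diffusers() {
    installed=$(installed_diffusers)
    if version_matches "$installed" "$REQUIRED_DIFFUSERS"; then
        echo "diffusers $installed: ok"
    else
        echo "diffusers is '${installed:-not installed}', expected $REQUIRED_DIFFUSERS" >&2
        return 1
    fi
}
```

Running `check_diffusers` after `conda activate sketch3dve` reports either the matching version or the mismatch.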

💫 Inference

1. 3D-Aware Scene Video Editing

Download the following pretrained models: Dust3R [Download Link], DepthAnythingV2 [hugging face], LLaVA [hugging face], and the CogVideoX-2b video generation model [hugging face]. Then set --dust3r_model_path, --depthanything_model_path, --Llava_model_path, --basemodel_ckpt_path, and --controlnet_ckpt_path (see the download link above) in examples/xxx/test.sh to the corresponding local paths.
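Before running an example, it can be useful to confirm that its test.sh mentions all five path flags. The sketch below only checks that each flag name appears in the file. It does not verify that the paths exist, because the exact layout of test.sh is not shown here. The function name is hypothetical.

```shell
#!/bin/sh
# Flags named in the README text above.
REQUIRED_FLAGS="--dust3r_model_path --depthanything_model_path --Llava_model_path --basemodel_ckpt_path --controlnet_ckpt_path"

# missing_flags FILE: print each required flag that does not appear in FILE.
missing_flags() {
    for flag in $REQUIRED_FLAGS; do
        # -F matches a fixed string; -e stops grep from reading the
        # leading dashes of the flag as one of its own options.
        grep -qF -e "$flag" "$1" || echo "$flag"
    done
}
```

For example, `missing_flags examples/beach/test.sh` prints nothing when all five flags are present.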

Then run one of the example edits:

cd examples/beach
sh test.sh
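To run several examples in one go, a loop like the following could be used. It is a sketch that assumes the other example directories follow the same layout as examples/beach (one test.sh per directory), which the examples/xxx/test.sh path above suggests but does not confirm. The function name is hypothetical.

```shell
#!/bin/sh
# run_examples ROOT: run test.sh in every subdirectory of ROOT that has one.
run_examples() {
    for dir in "$1"/*/; do
        # Skip directories without a test.sh (and the unexpanded glob, if any).
        [ -f "${dir}test.sh" ] || continue
        echo "running ${dir}test.sh"
        # Run in a subshell so the cd does not carry over to the next iteration.
        (cd "$dir" && sh test.sh) || echo "failed: $dir" >&2
    done
}
```

From the repository root, `run_examples examples` would then run each example in turn and report any that exit with an error.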

😉 Citation

Please consider citing our paper if our code is useful:

@inproceedings{liu2025sketch3dve,
author = {Liu, Feng-Lin and Li, Shi-Yang and Cao, Yan-Pei and Fu, Hongbo and Gao, Lin},
title = {Sketch3DVE: Sketch-based 3D-Aware Scene Video Editing},
year = {2025},
booktitle = {Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
articleno = {152},
numpages = {12},
keywords = {Sketch-based interaction, video generation, video editing, video diffusion models},
series = {SIGGRAPH Conference Papers '25}
}

🙏 Acknowledgements

We thank the CogVideoX, ControlNet, Dust3R, and DepthAnythingV2 projects. This README is adapted from the ViewCrafter template.

📢 Disclaimer

Our framework enables sketch-based 3D-aware video editing, but because of the variability of the generative video prior, success is not guaranteed. Trying different random seeds may produce better results.

This project strives to impact the domain of AI-driven video generation positively. Users are granted the freedom to create videos using this tool, but they are expected to comply with local laws and utilize it responsibly. The developers do not assume any responsibility for potential misuse by users.

