
Navigating Large-Pose Challenge for High-Fidelity Face Reenactment with Video Diffusion Model

Mingtao Guo¹  Guanyu Xing²  Yanci Zhang³  Yanli Liu¹,³

¹ National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu, China

² School of Cyber Science and Engineering, Sichuan University, Chengdu, China

³ College of Computer Science, Sichuan University, Chengdu, China

Accepted to CAD/Graphics 2025 and Recommended to Computers & Graphics Journal

Instructions for GRSI Replicability Submission

To replicate the main results (as shown in Fig. 2), please follow the steps below:

You may modify the source image and driving video paths in inference.py to test with your own inputs; a sketch follows the list of provided source-driving pairs below:

resources/source1.png--resources/driving1.mp4
resources/source2.png--resources/driving2.mp4
resources/source3.png--resources/driving3.mp4
resources/source4.png--resources/driving4.mp4
resources/source5.png--resources/driving5.mp4
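
As a minimal sketch, the paths might be set near the top of inference.py like this (the variable names below are assumptions for illustration, not necessarily the script's actual identifiers):

# Hypothetical path variables; adapt to the names actually used in inference.py.
source_image_path = "resources/source1.png"   # source identity image
driving_video_path = "resources/driving1.mp4" # driving motion video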

Hardware Requirements

  • GPU: NVIDIA RTX 4090 or equivalent
  • VRAM: At least 12 GB recommended
  • Inference Time: Approximately 4 minutes per 100-frame video on an RTX 4090
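
Before running inference, you can confirm that PyTorch sees a CUDA GPU with enough memory. This check is a small sketch, not part of the repository:

import torch

# Sanity check: confirm a CUDA device is visible and report its VRAM.
assert torch.cuda.is_available(), "No CUDA GPU detected"
props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
if vram_gb < 12:
    print("Warning: below the recommended 12 GB of VRAM")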

📑 Todos

We plan to make all of the following available:

  • Model inference code
  • Model checkpoint
  • Training code

Installation

  1. Clone this repo locally:

git clone https://github.com/MingtaoGuo/Face-Reenactment-Video-Diffusion
cd Face-Reenactment-Video-Diffusion

  2. Create and activate the conda environment:

conda create -n frvd python=3.8
conda activate frvd

  3. Install the packages for inference:

pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121

pip install -r requirements.txt
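
After installation, a quick import check (a sketch, not part of the repo) can confirm that the pinned PyTorch build sees CUDA:

import torch, torchvision

# Confirm the pinned versions and that the CUDA 12.1 build is active.
print("torch:", torch.__version__)              # expected: 2.2.2+cu121
print("torchvision:", torchvision.__version__)  # expected: 0.17.2+cu121
print("CUDA available:", torch.cuda.is_available())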

Download weights

mkdir pretrained_weights
mkdir pretrained_weights/checkpoint-30000-14frames
mkdir pretrained_weights/facecropper
mkdir pretrained_weights/liveportrait
git-lfs install

git clone https://huggingface.co/MartinGuo/Face-Reenactment-Video-Diffusion
mv Face-Reenactment-Video-Diffusion/head_embedder.pth pretrained_weights/checkpoint-30000-14frames
mv Face-Reenactment-Video-Diffusion/warping_feature_mapper.pth pretrained_weights/checkpoint-30000-14frames

mv Face-Reenactment-Video-Diffusion/insightface pretrained_weights/facecropper
mv Face-Reenactment-Video-Diffusion/landmark.onnx pretrained_weights/facecropper

mv Face-Reenactment-Video-Diffusion/appearance_feature_extractor.pth pretrained_weights/liveportrait
mv Face-Reenactment-Video-Diffusion/motion_extractor.pth pretrained_weights/liveportrait
mv Face-Reenactment-Video-Diffusion/spade_generator.pth pretrained_weights/liveportrait
mv Face-Reenactment-Video-Diffusion/warping_module.pth pretrained_weights/liveportrait

git clone https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt
mv stable-video-diffusion-img2vid-xt pretrained_weights

git clone https://huggingface.co/stabilityai/sd-vae-ft-mse
mv sd-vae-ft-mse pretrained_weights/stable-video-diffusion-img2vid-xt

The weights will be saved in the ./pretrained_weights directory. Please note that the download process may take a significant amount of time. Once completed, the weights should be arranged in the following structure:

./pretrained_weights/
|-- checkpoint-30000-14frames
|   |-- warping_feature_mapper.pth
|   |-- head_embedder.pth
|-- facecropper
|   |-- insightface
|   |-- landmark.onnx
|-- liveportrait
|   |-- appearance_feature_extractor.pth
|   |-- motion_extractor.pth
|   |-- spade_generator.pth
|   |-- warping_module.pth
|-- stable-video-diffusion-img2vid-xt
    |-- sd-vae-ft-mse
    |   |-- config.json
    |   |-- diffusion_pytorch_model.bin
    |-- feature_extractor
    |   |-- preprocessor_config.json
    |-- scheduler
    |   |-- scheduler_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   |-- diffusion_pytorch_model.safetensors
    |   |-- diffusion_pytorch_model.fp16.safetensors
    |-- image_encoder
    |   |-- config.json
    |   |-- model.safetensors
    |   |-- model.fp16.safetensors

🚀 Training and Inference

Inference of the FRVD

python inference.py

After running inference.py, you'll get the following results:

  1. Source image, 2. Driving video, 3. Reenactment result

Training of the FRVD

python train.py 

Acknowledgements

We thank the contributors to the StableVideoDiffusion, SVD_Xtend, and MimicMotion repositories for their open research and exploration. Our repo also incorporates code from LivePortrait and InsightFace, and we extend our thanks to them as well.

License

This project is licensed under the MIT License. See the LICENSE file for details.
