Mingtao Guo1 Guanyu Xing2 Yanci Zhang3 Yanli Liu1,3
1 National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu, China
2 School of Cyber Science and Engineering, Sichuan University, Chengdu, China
3 College of Computer Science, Sichuan University, Chengdu, China
To replicate the main results (as shown in Fig. 2), run inference with the following source image and driving video pairs:
- resources/source1.png -- resources/driving1.mp4
- resources/source2.png -- resources/driving2.mp4
- resources/source3.png -- resources/driving3.mp4
- resources/source4.png -- resources/driving4.mp4
- resources/source5.png -- resources/driving5.mp4
You may modify the source image and driving video paths in inference.py to test with your own inputs (see the sketch below).
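For reference, the input paths inside inference.py might be configured as in the minimal sketch below; the variable names used here are hypothetical, so check the actual script for the exact names.

# Hypothetical illustration of the input paths edited in inference.py;
# the actual variable names in the script may differ.
source_image_path = "resources/source1.png"    # replace with your own source portrait
driving_video_path = "resources/driving1.mp4"  # replace with your own driving video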
Hardware Requirements
- GPU: NVIDIA RTX 4090 or equivalent
- VRAM: At least 12 GB recommended
- Inference Time: Approximately 4 minutes per 100-frame video on an RTX 4090
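Before running inference, you can optionally confirm that a suitable GPU is visible. The snippet below is a minimal PyTorch sanity check (not part of the repository) that reports the detected GPU and its total VRAM:

# Optional GPU/VRAM sanity check (not part of the repository).
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    if vram_gb < 12:
        print("Warning: less than the recommended 12 GB of VRAM.")
else:
    print("No CUDA-capable GPU detected.")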
We will make the following available:
- Model inference code
- Model checkpoint
- Training code
- Clone this repo locally:
git clone https://github.com/MingtaoGuo/Face-Reenactment-Video-Diffusion
cd Face-Reenactment-Video-Diffusion
- Create and activate a conda environment:
conda create -n frvd python=3.8
conda activate frvd
- Install packages for inference:
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
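To confirm that the environment is set up correctly, a quick check like the sketch below (optional, not part of the repository) verifies the installed PyTorch build and CUDA availability:

# Optional environment check (not part of the repository).
import torch, torchvision

print("torch:", torch.__version__)              # expected: 2.2.2+cu121
print("torchvision:", torchvision.__version__)  # expected: 0.17.2+cu121
print("CUDA available:", torch.cuda.is_available())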
- Download the pretrained weights:
mkdir pretrained_weights
mkdir pretrained_weights/checkpoint-30000-14frames
mkdir pretrained_weights/facecropper
mkdir pretrained_weights/liveportrait
git-lfs install
git clone https://huggingface.co/MartinGuo/Face-Reenactment-Video-Diffusion
mv Face-Reenactment-Video-Diffusion/head_embedder.pth pretrained_weights/checkpoint-30000-14frames
mv Face-Reenactment-Video-Diffusion/warping_feature_mapper.pth pretrained_weights/checkpoint-30000-14frames
mv Face-Reenactment-Video-Diffusion/insightface pretrained_weights/facecropper
mv Face-Reenactment-Video-Diffusion/landmark.onnx pretrained_weights/facecropper
mv Face-Reenactment-Video-Diffusion/appearance_feature_extractor.pth pretrained_weights/liveportrait
mv Face-Reenactment-Video-Diffusion/motion_extractor.pth pretrained_weights/liveportrait
mv Face-Reenactment-Video-Diffusion/spade_generator.pth pretrained_weights/liveportrait
mv Face-Reenactment-Video-Diffusion/warping_module.pth pretrained_weights/liveportrait
git clone https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt
mv stable-video-diffusion-img2vid-xt pretrained_weights
git clone https://huggingface.co/stabilityai/sd-vae-ft-mse
mv sd-vae-ft-mse pretrained_weights/stable-video-diffusion-img2vid-xt
The weights will be saved in the ./pretrained_weights directory. Please note that the download process may take a significant amount of time. Once completed, the weights should be arranged in the following structure:
./pretrained_weights/
|-- checkpoint-30000-14frames
|   |-- warping_feature_mapper.pth
|   |-- head_embedder.pth
|-- facecropper
|   |-- insightface
|   |-- landmark.onnx
|-- liveportrait
|   |-- appearance_feature_extractor.pth
|   |-- motion_extractor.pth
|   |-- spade_generator.pth
|   |-- warping_module.pth
|-- stable-video-diffusion-img2vid-xt
|   |-- sd-vae-ft-mse
|   |   |-- config.json
|   |   |-- diffusion_pytorch_model.bin
|   |-- feature_extractor
|   |   |-- preprocessor_config.json
|   |-- scheduler
|   |   |-- scheduler_config.json
|   |-- model_index.json
|   |-- unet
|   |   |-- config.json
|   |   |-- diffusion_pytorch_model.safetensors
|   |   |-- diffusion_pytorch_model.fp16.safetensors
|   |-- image_encoder
|   |   |-- config.json
|   |   |-- model.safetensors
|   |   |-- model.fp16.safetensors
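To verify that the weights ended up in the layout shown above, a small check script like the sketch below (not part of the repository) can be run from the repository root:

# Verify the pretrained_weights layout described above (not part of the repository).
from pathlib import Path

root = Path("pretrained_weights")
expected = [
    "checkpoint-30000-14frames/head_embedder.pth",
    "checkpoint-30000-14frames/warping_feature_mapper.pth",
    "facecropper/insightface",
    "facecropper/landmark.onnx",
    "liveportrait/appearance_feature_extractor.pth",
    "liveportrait/motion_extractor.pth",
    "liveportrait/spade_generator.pth",
    "liveportrait/warping_module.pth",
    "stable-video-diffusion-img2vid-xt/model_index.json",
    "stable-video-diffusion-img2vid-xt/sd-vae-ft-mse/config.json",
    "stable-video-diffusion-img2vid-xt/unet/config.json",
    "stable-video-diffusion-img2vid-xt/image_encoder/config.json",
]
missing = [p for p in expected if not (root / p).exists()]
print("All expected weights found." if not missing else f"Missing: {missing}")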
- Run inference:
python inference.py
After running inference.py, you will get the reenactment results.
- Train the model:
python train.py
We first thank the contributors of the StableVideoDiffusion, SVD_Xtend, and MimicMotion repositories for their open research and exploration. Our repository also incorporates code from LivePortrait and InsightFace, and we extend our thanks to them as well.
This project is licensed under the MIT License. See the LICENSE file for details.