The official repository of the AAAI 2023 paper *StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles*.
Paper | Supp. Materials | Video
The proposed StyleTalk can generate talking head videos with speaking styles specified by arbitrary style reference videos.
- April 14th, 2023. The code is available.
Clone this repo, install conda, and run:

```bash
conda create -n styletalk python=3.7.0
conda activate styletalk
pip install -r requirements.txt
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
conda update ffmpeg
```

The code has been tested on CUDA 11.1 with an RTX 3090 GPU.
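As a quick sanity check before running inference, you can verify that the installed PyTorch build sees your GPU. This is a minimal sketch, not part of the repo:

```python
# check_env.py -- minimal environment sanity check (illustrative, not part of the repo)
import torch

print("PyTorch version:", torch.__version__)       # expect 1.8.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))  # e.g. an RTX 3090
```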
Our method takes 3DMM parameters (*.mat) and phoneme labels (*_seq.json) as input. Follow PIRenderer to extract 3DMM parameters and AVCT to extract phoneme labels. Some preprocessed data can be found in the samples folder.
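If you preprocess your own data, a quick way to confirm the files look right is to inspect them before running inference. The sketch below only prints array shapes and types; the exact keys inside the .mat file depend on the PIRenderer extraction, so no particular key is assumed:

```python
# inspect_inputs.py -- peek at the two input formats (illustrative sketch)
import json
from scipy.io import loadmat

# 3DMM parameters extracted with PIRenderer (sample file from the samples folder)
mat = loadmat("samples/style_clips/3DMM/happyenglish_clip1.mat")
print({k: getattr(v, "shape", type(v)) for k, v in mat.items() if not k.startswith("__")})

# Phoneme labels extracted with AVCT (sample file from the samples folder)
with open("samples/source_video/phoneme/reagan_clip1_seq.json") as f:
    phonemes = json.load(f)
print(type(phonemes))
```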
Download checkpoints for StyleTalk and Renderer and put them into ./checkpoints.
Run the demo:
```bash
python inference_for_demo.py \
  --audio_path samples/source_video/phoneme/reagan_clip1_seq.json \
  --style_clip_path samples/style_clips/3DMM/happyenglish_clip1.mat \
  --pose_path samples/source_video/3DMM/reagan_clip1.mat \
  --src_img_path samples/source_video/image/andrew_clip_1.png \
  --wav_path samples/source_video/wav/reagan_clip1.wav \
  --output_path demo.mp4
```

Change `audio_path`, `style_clip_path`, `pose_path`, `src_img_path`, `wav_path`, and `output_path` to generate more results.
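To render several input combinations in one go, a small driver script can loop over tuples of paths and invoke the demo script for each. This is an illustrative wrapper using only the flags documented above; the script name `batch_demo.py` and the path layout are examples:

```python
# batch_demo.py -- illustrative batch wrapper around inference_for_demo.py (not part of the repo)
import subprocess

# Each tuple: (phoneme json, style clip, pose, source image, wav, output); paths are examples
JOBS = [
    ("samples/source_video/phoneme/reagan_clip1_seq.json",
     "samples/style_clips/3DMM/happyenglish_clip1.mat",
     "samples/source_video/3DMM/reagan_clip1.mat",
     "samples/source_video/image/andrew_clip_1.png",
     "samples/source_video/wav/reagan_clip1.wav",
     "demo_happy.mp4"),
]

for audio, style, pose, src, wav, out in JOBS:
    subprocess.run([
        "python", "inference_for_demo.py",
        "--audio_path", audio,
        "--style_clip_path", style,
        "--pose_path", pose,
        "--src_img_path", src,
        "--wav_path", wav,
        "--output_path", out,
    ], check=True)  # stop on the first failed job
```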
Some code is borrowed from the following projects:
Thanks for their contributions!
