EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation
¹Core Contributor  ²Corresponding Authors
- [2025.08.12] 🔥🚀 12G VRAM is All You Need to generate video. Please use this Gradio UI. Check out the tutorial from @gluttony-10. Thanks for the contribution.
- [2025.08.12] 🔥 EchoMimicV3 can run on 16G VRAM using ComfyUI. Thanks to @smthemex for the contribution.
- [2025.08.09] 🔥 We release our models on ModelScope.
- [2025.08.08] 🔥 We release our codes on GitHub and models on Huggingface.
- [2025.07.08] 🔥 Our paper is publicly available on arXiv.
(Demo videos: teaser_github.mp4, hoi_github.mp4, 01.mp4–04.mp4)
For more demo videos, please refer to the project page
- Tested System Environment: CentOS 7.2 / Ubuntu 22.04, CUDA >= 12.1
- Tested GPUs: A100 (80G) / RTX 4090D (24G) / V100 (16G)
- Tested Python Version: 3.10 / 3.11
For the quantized version, please use the one-click installation package to get started quickly.
conda create -n echomimic_v3 python=3.10
conda activate echomimic_v3
pip install -r requirements.txt
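As a quick sanity check for the environment above, the minimal sketch below only assumes that `torch` was installed by `requirements.txt`:

```python
# Minimal environment sanity check (assumes torch is installed via requirements.txt).
import sys
import torch

print(f"Python: {sys.version.split()[0]}")              # expected 3.10 / 3.11
print(f"CUDA available: {torch.cuda.is_available()}")   # expected True with CUDA >= 12.1
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```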
Models | Download Link | Notes |
---|---|---|
Wan2.1-Fun-V1.1-1.3B-InP | 🤗 Huggingface | Base model |
wav2vec2-base | 🤗 Huggingface | Audio encoder |
EchoMimicV3-preview | 🤗 Huggingface | Our weights |
EchoMimicV3-preview | 🤗 ModelScope | Our weights |
The weights are organized as follows:
./models/
├── Wan2.1-Fun-V1.1-1.3B-InP
├── wav2vec2-base-960h
└── transformer
└── diffusion_pytorch_model.safetensors
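As a sketch, the weights can be fetched with `huggingface_hub.snapshot_download`. The repository IDs below (`alibaba-pai/Wan2.1-Fun-V1.1-1.3B-InP`, `facebook/wav2vec2-base-960h`, `BadToBest/EchoMimicV3`) and target folders are assumptions, so verify them against the download links in the table and the layout above before running:

```python
# Hypothetical download sketch -- verify repo IDs and target folders before running.
from huggingface_hub import snapshot_download

snapshot_download("alibaba-pai/Wan2.1-Fun-V1.1-1.3B-InP",   # base model (assumed repo ID)
                  local_dir="./models/Wan2.1-Fun-V1.1-1.3B-InP")
snapshot_download("facebook/wav2vec2-base-960h",             # audio encoder
                  local_dir="./models/wav2vec2-base-960h")
snapshot_download("BadToBest/EchoMimicV3",                    # EchoMimicV3-preview weights (assumed repo ID)
                  local_dir="./models/transformer")
```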
python infer.py
For the quantized Gradio UI version:
python app_mm.py
Images, audios, masks, and prompts are provided in `datasets/echomimicv3_demos`.
- Audio CFG: `audio_guidance_scale` works optimally between 2~3. Increase the audio CFG value for better lip synchronization, while decreasing it can improve the visual quality.
- Text CFG: `guidance_scale` works optimally between 3~6. Increase the text CFG value for better prompt following, while decreasing it can improve the visual quality.
- TeaCache: The optimal range for `teacache_threshold` is between 0~0.1.
- Sampling steps: 5 steps for talking head, 15~25 steps for talking body.
- Long video generation: If you want to generate a video longer than 138 frames, you can use Long Video CFG.
- Try setting `partial_video_length` to 81, 65, or smaller to reduce VRAM usage (see the parameter sketch after this list).
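A hypothetical sketch of these knobs gathered in one place; the parameter names mirror the tips above, but how `infer.py` actually consumes them should be checked against the script itself:

```python
# Hypothetical tuning values -- names follow the tips above; check infer.py for the real interface.
config = {
    "audio_guidance_scale": 2.5,   # audio CFG, optimal around 2~3
    "guidance_scale": 4.5,         # text CFG, optimal around 3~6
    "teacache_threshold": 0.05,    # TeaCache, optimal range 0~0.1
    "num_inference_steps": 20,     # assumed name: 5 for talking head, 15~25 for talking body
    "partial_video_length": 81,    # 81 / 65 / smaller to reduce VRAM usage
}
```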
Status | Milestone |
---|---|
✅ | The inference code of EchoMimicV3 meets everyone on GitHub |
✅ | EchoMimicV3-preview model on HuggingFace |
✅ | EchoMimicV3-preview model on ModelScope |
🚀 | ModelScope Space |
🚀 | Preview version Pretrained models trained on English and Chinese on ModelScope |
🚀 | 720P Pretrained models trained on English and Chinese on HuggingFace |
🚀 | 720P Pretrained models trained on English and Chinese on ModelScope |
🚀 | The training code of EchoMimicV3 meets everyone on GitHub |
- EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation. GitHub
- EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation. GitHub
- EchoMimicV1: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning. GitHub
If you find our work useful for your research, please consider citing the paper:
@misc{meng2025echomimicv3,
title={EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation},
author={Rang Meng and Yan Wang and Weipeng Wu and Ruobing Zheng and Yuming Li and Chenguang Ma},
year={2025},
eprint={2507.03905},
archivePrefix={arXiv}
}
- Wan2.1: https://github.com/Wan-Video/Wan2.1/
- VideoX-Fun: https://github.com/aigc-apps/VideoX-Fun/
The models in this repository are licensed under the Apache 2.0 License. We claim no rights over your generated content, granting you the freedom to use it while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations.