HunyuanPortrait

HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation

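📋 Project Overview

HunyuanPortrait is a diffusion-based framework for portrait animation. Given a single portrait image and a driving video, it generates a natural, fluid animation. It achieves high-quality results by decoupling identity features from motion information.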

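🎯 Key Features

High-quality animation: built on a diffusion model to produce realistic, natural portrait animation
Identity preservation: retains the identity features of the source portrait
Motion transfer: accurately transfers the expressions and head poses of the driving video
Multiple deployment options: local installation, containers, and cloud services
Easy to use: a simple API and a Web interface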

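🚀 Quick Start

Option 1: Containerized deployment (recommended)

# Deploy directly on the 共绩算力 platform
# 1. Log in at https://console.suanli.cn/
# 2. Select the prebuilt HunyuanPortrait image
# 3. Deploy with one click; no environment setup is needed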

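Option 2: Run locally

# Clone the repository
git clone https://github.com/Tencent-Hunyuan/HunyuanPortrait
cd HunyuanPortrait

# Install dependencies
pip install -r requirements.txt

# Download the models (download_models.sh is not listed in the project structure below;
# if it is missing, use the commands in the model download section instead)
bash download_models.sh

# Run the demo
bash demo.sh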

📁 Project Structure

HunyuanPortrait/
├── 📂 assets/                    # Asset files
│   ├── driving_video.mp4        # Sample driving video
│   ├── source_image.png         # Sample source image
│   └── pics/                    # Images
│       ├── logo.png            # Project logo
│       └── pipeline.png        # Architecture diagram
├── 📂 config/                   # Configuration files
│   └── hunyuan-portrait.yaml   # Main configuration file
├── 📂 src/                      # Source code
│   ├── 📂 dataset/             # Data processing
│   │   ├── test_preprocess.py  # Test-time preprocessing
│   │   └── utils.py           # Utility functions
│   ├── 📂 models/              # Model definitions
│   │   ├── 📂 condition/       # Condition control modules
│   │   │   ├── attention_processor.py    # Attention processor
│   │   │   ├── coarse_motion.py         # Coarse motion
│   │   │   ├── pose_guider.py           # Pose guider
│   │   │   ├── refine_motion.py         # Motion refinement
│   │   │   ├── unet_3d_blocks.py        # 3D UNet blocks
│   │   │   └── unet_3d_svd_condition_ip.py # Conditional UNet
│   │   └── 📂 dinov2/         # DINOv2 model
│   │       ├── 📂 layers/     # Network layers
│   │       └── 📂 models/     # Model definitions
│   ├── 📂 pipelines/           # Inference pipelines
│   │   └── hunyuan_svd_pipeline.py # Main inference pipeline
│   └── 📂 schedulers/          # Schedulers
│       └── scheduling_euler_discrete.py # Euler scheduler
├── 📂 pretrained_weights/      # Pretrained models (must be downloaded)
│   ├── hyportrait/             # HunyuanPortrait models
│   ├── vae/                    # VAE model
│   ├── unet/                   # UNet configuration
│   ├── scheduler/              # Scheduler configuration
│   ├── arcface.onnx           # ArcFace model
│   └── yoloface_v5m.pt        # YoloFace detector
├── 🐳 Dockerfile               # Docker image definition
├── 📄 requirements.txt         # Python dependencies
├── 🐍 inference.py            # Command-line inference script
├── 🌐 gradio_app.py           # Web interface
├── 🚀 demo.sh                 # Quick demo script
└── 📖 README.md               # Project documentation

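🔧 Core Modules

1. Model architecture (`src/models/`)

Condition control module (condition/): handles motion conditions and identity features
DINOv2 module (dinov2/): image feature extraction and encoding
UNet 3D: the main diffusion model architecture

2. Inference pipeline (`src/pipelines/`)

HunyuanSVD Pipeline: the end-to-end inference pipeline that integrates all components
Supports batch processing and real-time inference

3. Data processing (`src/dataset/`)

Preprocessing module: preprocesses images and videos
Utility functions: helpers for data processing

4. Scheduler (`src/schedulers/`)

Euler discrete scheduler: controls the noise schedule of the diffusion process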
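
To make the role of an Euler scheduler concrete, here is a minimal sketch of generic Euler sampling over a decreasing noise schedule. It is not the code in `scheduling_euler_discrete.py`: the scalar state, the `denoise` callback, and the noise levels are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List


def euler_step(x: float, sigma: float, sigma_next: float,
               denoise: Callable[[float, float], float]) -> float:
    """One Euler step of dx/dsigma = (x - denoise(x, sigma)) / sigma."""
    derivative = (x - denoise(x, sigma)) / sigma
    return x + (sigma_next - sigma) * derivative


def sample(x: float, sigmas: List[float],
           denoise: Callable[[float, float], float]) -> float:
    """Walk the noise schedule, applying one Euler step per interval."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x = euler_step(x, sigma, sigma_next, denoise)
    return x


def toy_denoise(x: float, sigma: float) -> float:
    """Toy denoiser (assumption): the clean signal is always 0.0."""
    return 0.0


if __name__ == "__main__":
    schedule = [10.0, 5.0, 1.0, 0.0]  # hypothetical noise levels, ending at 0
    print(sample(10.0, schedule, toy_denoise))
```

In a real pipeline `x` is a latent tensor and `denoise` is the UNet; the scheduler only decides which noise levels are visited and how far each step moves.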

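🎯 Model Deployment

1. Environment requirements

Hardware:

GPU: NVIDIA RTX 3090 or better (24 GB of VRAM recommended)
RAM: 32 GB of system memory
Storage: 50 GB of free space (for model files)

Software:

Python 3.8+
CUDA 11.8+
PyTorch 2.0+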
2. Model files

Required model files:

pretrained_weights/
├── hyportrait/                 # HunyuanPortrait core models
│   ├── unet.pth               # UNet weights
│   ├── pose_guider.pth        # Pose guider
│   ├── dino.pth               # DINOv2 feature extractor
│   ├── image_proj.pth         # Image projector
│   ├── expression.pth         # Expression encoder
│   ├── headpose.pth           # Head pose encoder
│   └── motion_proj.pth        # Motion projector
├── vae/                       # VAE decoder
│   ├── config.json
│   └── diffusion_pytorch_model.fp16.safetensors
├── unet/                      # UNet configuration
│   └── config.json
├── scheduler/                 # Scheduler configuration
│   └── scheduler_config.json
├── arcface.onnx              # ArcFace identity recognition
└── yoloface_v5m.pt           # YoloFace face detection
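
A quick way to confirm that a download is complete is to compare the directory against the tree above. This is a minimal sketch; the function name `missing_weights` is hypothetical, and the file list is copied from the tree rather than taken from the project's code.

```python
from pathlib import Path
from typing import List, Union

# Relative paths copied from the required-files tree above.
REQUIRED_FILES = [
    "hyportrait/unet.pth",
    "hyportrait/pose_guider.pth",
    "hyportrait/dino.pth",
    "hyportrait/image_proj.pth",
    "hyportrait/expression.pth",
    "hyportrait/headpose.pth",
    "hyportrait/motion_proj.pth",
    "vae/config.json",
    "vae/diffusion_pytorch_model.fp16.safetensors",
    "unet/config.json",
    "scheduler/scheduler_config.json",
    "arcface.onnx",
    "yoloface_v5m.pt",
]


def missing_weights(root: Union[str, Path]) -> List[str]:
    """Return the required files that are absent under root."""
    root = Path(root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_weights("pretrained_weights")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All required model files are present.")
```

The check only looks at file presence, so a truncated download still passes; comparing file sizes or checksums would be a natural extension.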

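3. Deployment options compared

Deployment option	Advantages	Best suited for
共绩算力 platform	One-click deployment, no configuration	Quick trials, production
Docker container	Isolated environment, easy to manage	Development and testing, CI/CD
Local installation	Full control, customizable	Research, debugging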

4. Performance tuning

GPU:

Use FP16 precision to reduce VRAM usage
Enable CUDA graph optimization
Tune the batch size
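
The FP16 saving in the GPU list above comes from simple arithmetic: each weight takes 2 bytes instead of 4. The sketch below shows the calculation; the parameter count is a made-up figure for illustration, not HunyuanPortrait's actual size, and activations and temporary buffers add to the total at run time.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024 ** 3


if __name__ == "__main__":
    params = 1_500_000_000  # hypothetical parameter count
    for dtype in ("fp32", "fp16"):
        print(f"{dtype}: {weight_memory_gib(params, dtype):.2f} GiB")
```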

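Memory:

Use gradient checkpointing (relevant only when training or fine-tuning; it does not reduce inference memory)
Enable model sharding
Optimize data loading

Network:

Serve model downloads through a CDN
Provision adequate network bandwidth
Enable HTTP/2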

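🧩 Community Contributions

If you use HunyuanPortrait in a project or build on it, let us know or open a PR! 💖

📜 System Requirements

An NVIDIA 3090 GPU with CUDA support is required
- The model has been tested on a single 24 GB GPU
Tested operating system: Linux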

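🚀 Quick Containerized Deployment

1. Deploy on the 共绩算力 platform

The easiest way to run HunyuanPortrait is our prebuilt container image on 共绩算力. No complex local environment setup is needed!

1.1 Create a deployment service

Log in to the 共绩算力 console
On the home page, click "弹性部署服务" (Elastic Deployment Service)
Make sure Elastic Deployment Service is enabled for your account

1.2 Choose a GPU model

Choose a GPU model to match your needs:

First use or debugging: a single NVIDIA RTX 4090 GPU is recommended
Production: configure multiple high-end GPUs as needed

1.3 Select the prebuilt image

In the "服务配置" (Service Configuration) module, switch to the "预制服务" (Prebuilt Services) tab
Search for and select the official HunyuanPortrait image

1.4 Deploy and access the service

Click "部署服务" (Deploy Service) to pull the image and start the container automatically
When deployment finishes, copy the public access link for port 8089 from "快捷访问" (Quick Access)
Use that address to call the service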

2. API Usage Examples

2.1 Python client example

import os
import requests
import shutil
from gradio_client import Client, handle_file

# API configuration (replace the placeholders with your own values)
API_URL = "https://your-deployment-url-8089.550c.cloud/"
IMAGE_URL = 'https://example.com/source_image.png'
VIDEO_URL = 'https://example.com/driving_video.mp4'

INPUT_IMAGE_PATH = "source_image.png"
INPUT_VIDEO_PATH = "driving_video.mp4"
OUTPUT_VIDEO_PATH = "generated_video.mp4"

def download_file(url, local_filename):
    """Download a file from a URL and save it locally."""
    if not os.path.exists(local_filename):
        print(f"Downloading {url} to {local_filename}...")
        try:
            with requests.get(url, stream=True, timeout=30) as r:
                r.raise_for_status()
                with open(local_filename, 'wb') as f:
                    for chunk in r.iter_content(chunk_size=8192):
                        f.write(chunk)
            print("Download complete.")
        except requests.exceptions.RequestException as e:
            print(f"Error while downloading the file: {e}")
            return False
    else:
        print(f"{local_filename} already exists; skipping download.")
    return True

def main():
    """End-to-end example of calling the HunyuanPortrait animation API."""
    print("--- Step 1: Prepare the input files ---")
    if not (download_file(IMAGE_URL, INPUT_IMAGE_PATH) and download_file(VIDEO_URL, INPUT_VIDEO_PATH)):
        print("Failed to download the input files; aborting.")
        return

    print("\n--- Step 2: Call the API ---")
    print("Initializing the API client...")
    try:
        # Initialize the Gradio client
        client = Client(API_URL)
        print("Client initialized.")

        print("Sending the request to the API... this may take a while.")
        # Call the API's 'predict' endpoint
        result = client.predict(
            image=handle_file(INPUT_IMAGE_PATH),
            video_path={"video": handle_file(INPUT_VIDEO_PATH)},
            api_name="/predict"
        )

        print("API call succeeded!")
        # Assumption: depending on the Gradio version, a video output may come
        # back as a dict such as {"video": path} rather than a plain path.
        video_file = result["video"] if isinstance(result, dict) else result
        print(f"Result saved to temporary path: {video_file}")

        print("\n--- Step 3: Save the result ---")
        # Move the generated video from the temporary path to the working directory
        if os.path.exists(video_file):
            print(f"Moving the generated video from '{video_file}' to '{OUTPUT_VIDEO_PATH}'")
            shutil.move(video_file, OUTPUT_VIDEO_PATH)
            print(f"Video saved to: {os.path.abspath(OUTPUT_VIDEO_PATH)}")
        else:
            print(f"Error: the path returned by the API, '{video_file}', is invalid or the file does not exist.")

    except Exception as e:
        print(f"Error during the API call: {e}")

if __name__ == "__main__":
    main()
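
The client above makes a single attempt, so a deployment that is still starting up fails on the first call. One option is to wrap the call in a retry helper with exponential backoff. This is a generic sketch: `call_with_retry` and its parameters are hypothetical names for this example, not part of `gradio_client`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], attempts: int = 3,
                    base_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
    raise AssertionError("unreachable")
```

With the client from the example, usage would look like `result = call_with_retry(lambda: client.predict(...))`, keeping the same arguments as before.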

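2.2 cURL example

# Download the sample files
wget https://example.com/source_image.png
wget https://example.com/driving_video.mp4

# Call the API with cURL (replace the URL with your actual API URL).
# Note: this JSON body sends file names only and does not upload the files,
# and the endpoint path exposed by a Gradio app depends on its Gradio version;
# check the deployment's API page before relying on this request.
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "image": "source_image.png",
    "video_path": {"video": "driving_video.mp4"}
  }' \
  https://your-deployment-url-8089.550c.cloud/predict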

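3. Available container images

Our container images are available on the 共绩算力 platform:

hunyuanportrait:latest: the latest stable release with all dependencies
hunyuanportrait:dev: a development build with the newest features
hunyuanportrait:minimal: a minimal build for lightweight deployments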

4. Local container deployment

To run locally with Docker:

# Pull the image
docker pull gongjiyun/hunyuanportrait:latest

# Run the container
docker run -d \
  --name hunyuanportrait \
  --gpus all \
  -p 8089:8089 \
  -v /path/to/models:/app/pretrained_weights \
  gongjiyun/hunyuanportrait:latest

# Access the service
curl http://localhost:8089/health
🛠️ Local Installation

For local development or custom modifications:

git clone https://github.com/Tencent-Hunyuan/HunyuanPortrait
cd HunyuanPortrait
pip3 install torch torchvision torchaudio
pip3 install -r requirements.txt

📥 Download Models

By default, all models are stored in the pretrained_weights directory:

pip3 install "huggingface_hub[cli]"
mkdir -p pretrained_weights
cd pretrained_weights
huggingface-cli download --resume-download stabilityai/stable-video-diffusion-img2vid-xt --local-dir . --include "*.json"
wget -c https://huggingface.co/LeonJoe13/Sonic/resolve/main/yoloface_v5m.pt
wget -c https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/resolve/main/vae/diffusion_pytorch_model.fp16.safetensors -P vae
wget -c https://huggingface.co/FoivosPar/Arc2Face/resolve/da2f1e9aa3954dad093213acfc9ae75a68da6ffd/arcface.onnx
huggingface-cli download --resume-download tencent/HunyuanPortrait --local-dir hyportrait

The resulting file layout:

.
├── arcface.onnx
├── hyportrait
│   ├── dino.pth
│   ├── expression.pth
│   ├── headpose.pth
│   ├── image_proj.pth
│   ├── motion_proj.pth
│   ├── pose_guider.pth
│   └── unet.pth
├── scheduler
│   └── scheduler_config.json
├── unet
│   └── config.json
├── vae
│   ├── config.json
│   └── diffusion_pytorch_model.fp16.safetensors
└── yoloface_v5m.pt

▶️ Run Locally

🔥 Run bash demo.sh to bring your portrait to life:

video_path="your_video.mp4"
image_path="your_image.png"

python inference.py \
    --config config/hunyuan-portrait.yaml \
    --video_path "$video_path" \
    --image_path "$image_path"

Alternatively, start the Gradio server:

python gradio_app.py
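
To animate one portrait with several driving videos, the `inference.py` command shown above can be scripted. This is a minimal sketch: `build_command` and `animate_all` are hypothetical helpers, the directory name is a placeholder, and only the three flags from the command above are used. Run it from the repository root.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(image_path: str, video_path: str,
                  config: str = "config/hunyuan-portrait.yaml") -> List[str]:
    """Build the inference.py invocation for one image/video pair."""
    return [
        sys.executable, "inference.py",
        "--config", config,
        "--video_path", video_path,
        "--image_path", image_path,
    ]


def animate_all(image_path: str, video_dir: str) -> None:
    """Run inference once per .mp4 file in video_dir, stopping on the first failure."""
    for video in sorted(Path(video_dir).glob("*.mp4")):
        print(f"Animating {image_path} with {video}")
        subprocess.run(build_command(image_path, str(video)), check=True)


if __name__ == "__main__":
    animate_all("your_image.png", "driving_videos")  # placeholder paths
```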

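🏗️ Framework Architecture

⏳ Introduction:

HunyuanPortrait is a diffusion-based framework that generates lifelike, temporally consistent portrait animations by decoupling identity from motion with pre-trained encoders. It encodes the expressions and poses of the driving video into implicit control signals and injects them into a stabilized diffusion backbone through attention-based adapters, which makes it possible to generate detailed, style-flexible animation from a single reference image. The method outperforms existing approaches in controllability and coherence.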
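
The idea of injecting control signals through attention can be illustrated with a toy example. The sketch below is generic scaled dot-product cross-attention with a residual connection, written in plain Python. It is not the project's `attention_processor.py`: real adapters apply learned query, key, and value projections, which are omitted here, and the `strength` parameter is an assumption for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(query: Vector, keys: List[Vector],
                    values: List[Vector]) -> Vector:
    """Attention of one query over a set of control tokens."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    width = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(width)]


def inject_control(hidden: Vector, keys: List[Vector], values: List[Vector],
                   strength: float = 1.0) -> Vector:
    """Residual injection: hidden + strength * attention(hidden, control tokens).

    The value vectors must have the same width as hidden.
    """
    attended = cross_attention(hidden, keys, values)
    return [h + strength * a for h, a in zip(hidden, attended)]
```

Here `hidden` stands for one backbone feature vector and the keys and values stand for encoded motion tokens; setting `strength` to 0 leaves the backbone features untouched.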

🖼 Results

Some portrait animation results from HunyuanPortrait.

More results are available on our project page.

📂 Showcase

f1.mp4

f2.mp4

f4.mp4

f3.mp4

🎤 Portrait Singing

sing_1.mp4

🎬 Portrait Acting

act_1.mp4

🤪 Portrait Making Faces

emoji_1.mp4

💖 Acknowledgements

🔗 Citation

If you find this project helpful, feel free to give us a star ⭐️⭐️⭐️ and cite our paper:

@inproceedings{xu2025hunyuanportrait,
  title={Hunyuanportrait: Implicit condition control for enhanced portrait animation},
  author={Xu, Zunnan and Yu, Zhentao and Zhou, Zixiang and Zhou, Jun and Jin, Xiaoyu and Hong, Fa-Ting and Ji, Xiaozhong and Zhu, Junwei and Cai, Chengfei and Tang, Shiyu and others},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={15909--15919},
  year={2025}
}
