Added APIs that can be called with only text and a role; api_role_v3.py supports v3 #2164
Open
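Features:
/ endpoint: supports a default reference audio and parameter adjustment.
/ttsrole endpoint: supports role-based TTS inference, dynamically loads the role's models and reference audio, and accepts both GET and POST requests.
Run the service: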
python api_role_v3.py -s "path/to/sovits.pth" -g "path/to/gpt.ckpt" -dr "ref.wav" -dt "参考文本" -dl "zh" -p 9880
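Once the service is running, a GET call to /ttsrole needs only the text and the role. The snippet below is a minimal sketch of building such a URL with the standard library. It assumes the server listens on 127.0.0.1 at the port given by -p (9880 above) and that GET query parameters use the same names as the POST fields; `build_ttsrole_url` is a hypothetical helper, not part of api_role_v3.py.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: the service runs locally on the port passed via -p (9880 above).
BASE_URL = "http://127.0.0.1:9880"


def build_ttsrole_url(text, role, base_url=BASE_URL, **optional):
    """Build a GET URL for /ttsrole from the two required fields plus optional ones."""
    params = {"text": text, "role": role}
    # Skip optional values left as None so the server-side defaults apply.
    params.update({k: v for k, v in optional.items() if v is not None})
    # urlencode percent-encodes non-ASCII text as UTF-8.
    return f"{base_url}/ttsrole?{urlencode(params)}"


url = build_ttsrole_url("你好", "role1", emotion="开心")
print(url)
```

Opening the printed URL in a browser or with any HTTP client should return the synthesized audio, provided the role directory exists on the server.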
Parameter reference:
Command-line parameters:
Endpoint parameters (/):
Endpoint parameters (/ttsrole):
Full request example (/ttsrole POST)
{
"text": "你好", # str, required, text to synthesize
"role": "role1", # str, required, role name; selects the config and audio under roles/{role}
"emotion": "开心", # str, optional, emotion tag used to pick an audio file from roles/{role}/reference_audios
"text_lang": "auto", # str, optional, default "auto", text language; with "auto" it is chosen dynamically from emotion or the role directory
"ref_audio_path": "/path/to/ref.wav", # str, optional, reference audio path; if given it takes priority and automatic selection is skipped
"aux_ref_audio_paths": ["/path1.wav", "/path2.wav"], # List[str], optional, auxiliary reference audio paths for multi-speaker blending
"prompt_lang": "ja", # str, optional, prompt text language; must be set when ref_audio_path is given, chosen dynamically in "auto" mode
"prompt_text": "こんにちは", # str, optional, prompt text paired with ref_audio_path; with automatic selection it is generated from the .txt file or the file name
"top_k": 10, # int, optional, top-k sampling value, overrides inference.top_k
"top_p": 0.8, # float, optional, top-p sampling value, overrides inference.top_p
"temperature": 1.0, # float, optional, temperature, overrides inference.temperature
"text_split_method": "cut5", # str, optional, text split method, overrides inference.text_split_method; see text_segmentation_method.py
"batch_size": 2, # int, optional, batch size, overrides inference.batch_size
"batch_threshold": 0.75, # float, optional, batch threshold, overrides inference.batch_threshold
"split_bucket": true, # bool, optional, whether to split into buckets, overrides inference.split_bucket
"speed_factor": 1.2, # float, optional, speech speed factor, overrides inference.speed_factor
"fragment_interval": 0.3, # float, optional, interval between fragments in seconds, overrides inference.fragment_interval
"seed": 42, # int, optional, random seed, overrides seed
"media_type": "wav", # str, optional, default "wav", output format; supports "wav", "raw", "ogg", "aac"
"streaming_mode": false, # bool, optional, default false, whether to stream the response
"parallel_infer": true, # bool, optional, default true, whether to run inference in parallel
"repetition_penalty": 1.35, # float, optional, repetition penalty, overrides inference.repetition_penalty
"version": "v2", # str, optional, config version, overrides version; switches between v2 and v3 dynamically
"languages": ["zh", "ja", "en"], # List[str], optional, list of supported languages, overrides languages
"bert_base_path": "/path/to/bert", # str, optional, BERT model path, overrides bert_base_path
"cnhuhbert_base_path": "/path/to/hubert", # str, optional, HuBERT model path, overrides cnhuhbert_base_path
"device": "cpu", # str, optional, device for all models, overrides device
"is_half": true, # bool, optional, whether to use half precision, overrides is_half
"t2s_weights_path": "/path/to/gpt.ckpt", # str, optional, GPT model path, overrides t2s_weights_path
"vits_weights_path": "/path/to/sovits.pth", # str, optional, SoVITS model path, overrides vits_weights_path
"t2s_model_path": "/path/to/gpt.ckpt", # str, optional, GPT model path (synonym of t2s_weights_path)
"t2s_model_device": "cpu", # str, optional, GPT model device, overrides t2s_model.device; defaults to GPU auto-detection
"vits_model_path": "/path/to/sovits.pth", # str, optional, SoVITS model path (synonym of vits_weights_path)
"vits_model_device": "cpu" # str, optional, SoVITS model device, overrides vits_model.device; defaults to GPU auto-detection
}
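The annotated example above is not valid JSON as written, because the `#` comments must be removed before sending. As a rough sketch of what a client could look like, the snippet below builds a POST request with only the required fields plus a few overrides. It assumes the server is reachable at 127.0.0.1:9880 and accepts a JSON body; `build_ttsrole_post` is a hypothetical helper name, not something defined in the PR.

```python
import json
import urllib.request

# Assumption: local server on port 9880, matching the launch command.
ENDPOINT = "http://127.0.0.1:9880/ttsrole"


def build_ttsrole_post(text, role, **overrides):
    """Return an unsent urllib Request carrying a JSON body for /ttsrole."""
    payload = {"text": text, "role": role}
    # Only include overrides that were actually set; the rest fall back to
    # the role config or the server defaults.
    payload.update({k: v for k, v in overrides.items() if v is not None})
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ttsrole_post(
    "你好", "role1", emotion="开心", speed_factor=1.2, media_type="wav"
)

# To actually send it (requires the service to be running):
# with urllib.request.urlopen(req) as resp, open("out.wav", "wb") as f:
#     f.write(resp.read())
```

Keeping the payload small is deliberate: every optional field in the full example overrides a configured value, so a client normally sends only what it wants to change.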
Parameter requirements and priority
Directory structure
GPT-SoVITS-roleapi/
├── api_role_v3.py                  # this file, the API entry point
├── GPT_SoVITS/                     # GPT-SoVITS core library
│   └── configs/
│       └── tts_infer.yaml          # default config file
├── roles/                          # role configuration directory
│   ├── role1/                      # example role role1
│   │   ├── tts_infer.yaml          # role config file (optional)
│   │   ├── model.ckpt              # GPT model (optional)
│   │   ├── model.pth               # SoVITS model (optional)
│   │   └── reference_audios/       # reference audio directory for the role
│   │       ├── zh/
│   │       │   ├── 【开心】voice1.wav
│   │       │   ├── 【开心】voice1.txt
│   │       ├── ja/
│   │       │   ├── 【开心】voice2.wav
│   │       │   ├── 【开心】voice2.txt
│   ├── role2/
│   │   ├── tts_infer.yaml
│   │   ├── model.ckpt
│   │   ├── model.pth
│   │   └── reference_audios/
│   │       ├── zh/
│   │       │   ├── 【开心】voice1.wav
│   │       │   ├── 【开心】voice1.txt
│   │       │   ├── 【悲伤】asdafasdas.wav
│   │       │   ├── 【悲伤】asdafasdas.txt
│   │       ├── ja/
│   │       │   ├── 【开心】voice2.wav
│   │       │   ├── 【开心】voice2.txt
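To illustrate how this layout can drive reference selection, here is a minimal sketch that looks up a clip by emotion tag and language and derives the prompt text from the matching .txt file, falling back to the file name. It is an assumption about how such a lookup might work, based only on the layout and parameter notes above, and is not the code in api_role_v3.py; `find_reference` and the `TAGGED` pattern are hypothetical names.

```python
import re
from pathlib import Path

# Matches stems such as "【开心】voice1": an emotion tag, then the rest of the name.
TAGGED = re.compile(r"^【(?P<emotion>[^】]+)】(?P<rest>.*)$")


def find_reference(role_dir, emotion, lang):
    """Return (audio_path, prompt_text) for the first matching clip, or None."""
    lang_dir = Path(role_dir) / "reference_audios" / lang
    if not lang_dir.is_dir():
        return None
    for wav in sorted(lang_dir.glob("*.wav")):
        m = TAGGED.match(wav.stem)
        if not m or m.group("emotion") != emotion:
            continue
        txt = wav.with_suffix(".txt")
        # Prefer the transcript file next to the audio; otherwise fall back
        # to the part of the file name after the emotion tag.
        if txt.is_file():
            prompt = txt.read_text(encoding="utf-8").strip()
        else:
            prompt = m.group("rest")
        return wav, prompt
    return None
```

For example, `find_reference("roles/role2", "悲伤", "zh")` would return the path to 【悲伤】asdafasdas.wav together with the contents of its .txt file under the tree shown above.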
Selection logic for text_lang, prompt_lang, and prompt_text (/ttsrole)
Explanation