How exactly do you use V4 in api.py or api_v2.py? #2306
The information commented at the top of api_v2.py is valid. GPT_SoVITS/configs/tts_infer.yaml contains the last configuration used when running the webui inference (from webui.py or from inference_webui_fast.py). So if you last ran webui inference with a v4 model, it should be ready to go in tts_infer.yaml.

```
WebAPI docs: python api_v2.py -a 127.0.0.1 -p 9880 -c GPT_SoVITS/configs/tts_infer.yaml

Inference endpoint:        /tts                 (GET / POST)
Command control endpoint:  /control             GET: http://127.0.0.1:9880/control?command=restart  (no response body)
Switch GPT model:          /set_gpt_weights     (GET)
Switch SoVITS model:       /set_sovits_weights  GET: http://127.0.0.1:9880/set_sovits_weights?weights_path=GPT_SoVITS/pretrained_models/s2G488k.pth
```
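As a minimal sketch of calling the inference endpoint, the helper below assembles a `/tts` GET URL. The parameter names (`text`, `text_lang`, `ref_audio_path`, `prompt_text`, `prompt_lang`, `media_type`) follow the docstring at the top of api_v2.py, but treat them as assumptions and verify against your local copy; the reference audio path is a placeholder.

```python
from urllib.parse import urlencode

def build_tts_url(host="127.0.0.1", port=9880, **params):
    """Build the query URL for api_v2.py's /tts inference endpoint."""
    return f"http://{host}:{port}/tts?{urlencode(params)}"

# Hypothetical call; swap in your own reference audio and texts.
url = build_tts_url(
    text="你好",               # text to synthesize
    text_lang="zh",            # language of the text
    ref_audio_path="ref.wav",  # assumed path to reference audio
    prompt_text="参考文本",     # transcript of the reference audio
    prompt_lang="zh",
    media_type="wav",
)
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) should return the synthesized audio bytes if the server is running.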
@dignome Thanks for the reply. I can see that tts_infer.yaml has already been updated to the v4 model, but the audio synthesized by calling api_v2.py directly sounds very strange, as if the model isn't matched correctly. My tts_infer.yaml is as follows:
If there really is a difference, you could most likely show it by setting a fixed/static seed value and matching the other parameters when using both api_v2.py and inference_webui_fast.py; they should produce similar results. For the best speaker reproduction, you should finetune a v4 model against a dataset containing at least 10 minutes of audio samples of that speaker using webui.py, then make sure those models are present in the config you specify to api_v2.py with -c <path/to/your/config.yaml>.
Strange output is expected: although v4 shares v3's architecture, the sampling rates differ, so calling the API with the same parameters as for v3 is bound to cause problems. You can change the relevant parts yourself. My current handling logic looks like this (truncated as posted):

```python
# --- V3 mel function definition ---
mel_fn = lambda x: mel_spectrogram_torch(
# --- Added V4 mel function definition ---
mel_fn_v4 = lambda x: mel_spectrogram_torch(
```
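Since the snippet above is truncated, here is a hedged sketch of the two mel configurations it refers to. The keyword values below mirror what GPT-SoVITS's TTS.py appears to pass to `mel_spectrogram_torch` for v3 versus v4; treat the exact numbers as assumptions and check them against your checkout.

```python
# Assumed v3 mel-spectrogram settings (24 kHz input to the mel front end).
V3_MEL_KWARGS = {
    "n_fft": 1024, "win_size": 1024, "hop_size": 256,
    "num_mels": 100, "sampling_rate": 24000,
    "fmin": 0, "fmax": None, "center": False,
}
# Assumed v4 mel-spectrogram settings (32 kHz input, larger FFT/hop).
V4_MEL_KWARGS = {
    "n_fft": 1280, "win_size": 1280, "hop_size": 320,
    "num_mels": 100, "sampling_rate": 32000,
    "fmin": 0, "fmax": None, "center": False,
}

# In TTS.py the truncated lambdas would then be completed roughly as:
#   mel_fn    = lambda x: mel_spectrogram_torch(x, **V3_MEL_KWARGS)
#   mel_fn_v4 = lambda x: mel_spectrogram_torch(x, **V4_MEL_KWARGS)
```

The point of the split is that feeding a v4 model audio framed with the v3 settings (or vice versa) produces mismatched mel features, which is one plausible source of the "strange" output described above.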
The problem, though, is that pyopenjtalk is broken, so Japanese synthesis still errors out in the end. It's a headache.
Is there a good solution for this yet?
It seems that even after modifying tts_infer.yaml, running api_v2.py makes version automatically fall back to v2. The generated audio's sampling rate is probably wrong, so the output is broken.
@wangzai23333 @inktree Yes. I also tried changing the relevant parts of tts_infer.yaml and api_v2.py myself, but never got good output, so I'd like to ask 花儿大佬 to polish up a version of api_v2.py. :)
So is your issue resolved? api_v2.py worked for you? |
In GPT_SoVITS/TTS_infer_pack/TTS.py: in the current tts_infer.yaml, version is not at the root level, so the get call can't find it and falls back to the default value v2. The simplest workaround is to change the default here to v4, and file a bug report.
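The symptom described (version silently falling back to v2) is consistent with a flat `.get("version", "v2")` lookup against a config whose key now lives in a nested section. A minimal illustration with plain dicts; the `"custom"` section name and keys here are hypothetical stand-ins for the real tts_infer.yaml layout.

```python
# Hypothetical config mirroring a nested tts_infer.yaml layout.
config = {
    "custom": {
        "version": "v4",
        "device": "cuda",
    }
}

# A flat lookup misses the nested key and silently falls back to v2:
version = config.get("version", "v2")
print(version)  # -> "v2", even though the file says v4

# Reading from the section that actually contains the key fixes it:
version = config.get("custom", {}).get("version", "v2")
print(version)  # -> "v4"
```

This is why "change the default to v4" works as a stopgap, but the cleaner fix is to read version from wherever the YAML actually places it.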
not yet |
api_v2.py doesn't seem to let you change sampling_rate, and the output audio still sounds very strange.
Looking at the using_vocoder_synthesis function in GPT_SoVITS/TTS_infer_pack/TTS.py, the v4 sampling rate is already set to 32000 there. So if version is v4, shouldn't the sampling rate be correct?
But I saw the author say the V4 sampling rate is 48 kHz: "(4) v4 fixes the metallic-artifact problem that v3's non-integer-ratio upsampling could cause, and natively outputs 48 kHz audio to avoid muffled sound (whereas v3 natively outputs only 24 kHz). The author considers v4 a drop-in replacement for v3; more testing is still needed."
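Whichever rate is correct for a given model stage, the audible failure mode is the same: a buffer of samples tagged with the wrong rate plays at the wrong speed and pitch. A small arithmetic sketch (the numbers are purely illustrative, not a claim about which rate GPT-SoVITS actually uses where):

```python
# Why a sample-rate mismatch "breaks" the audio: the same sample buffer
# plays slower and lower-pitched when tagged with a smaller rate.
true_rate = 48000        # rate the vocoder actually produced (assumed)
tagged_rate = 32000      # rate written into the WAV header (assumed)

n_samples = 48000        # one second of real audio at true_rate
played_duration = n_samples / tagged_rate
pitch_factor = tagged_rate / true_rate

print(played_duration)   # 1.5 -> one second of audio plays for 1.5 s
print(pitch_factor)      # ~0.667 -> everything sounds roughly a fifth too low
```

So "32000 in using_vocoder_synthesis" and "48 kHz native output" describing different stages would not be contradictory by itself; the audio only sounds wrong when the header rate and the buffer's actual rate disagree.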
Thanks to the maintainer for open-sourcing this great project [flowers]. How exactly do you use the v4 version through the API?