Why do the replacement models under custom have no effect? #2342

jyfjyt · 2025-04-29T08:58:00Z

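I replaced t2s_weights_path and vits_weights_path with several character models from the acgnai site, but the actual output leans toward the timbre of whatever ref_audio_path + prompt_text I pass in.

For example, I used a model trained on character [A]: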

t2s_weights_path:GPT_weights_v4/黄泉-e10.ckpt
vits_weights_path: SoVITS_weights_v4/黄泉_e10_s930.pth

But for ref_audio_path + prompt_text I used a 5-second clip of character [B], for example:

   "ref_audio_path": "Sample/eula/优菈.wav",
   "prompt_text": "行啊，赢得干净利落。有空的话…我们再来几局？",

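The synthesized speech ends up sounding almost the same as character [B].

It is no different from using the default model configuration: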

  t2s_weights_path: GPT_SoVITS/pretrained_models/s1v3.ckpt
  version: v3
  vits_weights_path: GPT_SoVITS/pretrained_models/s2Gv3.pth

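In the end, the result still depends on the ref_audio_path input.

So what is the point of training a model? Or am I using the model incorrectly? I am launching with api_v2.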
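
To make the comparison concrete, here is a minimal sketch of the requests involved in an A/B test: switch the weights, then synthesize with the same reference clip. It only builds URLs and a JSON body and sends nothing. The endpoint names (`/set_gpt_weights`, `/set_sovits_weights`, `/tts`), the `weights_path` query parameter, the request field names, and the `127.0.0.1:9880` address are assumptions based on my recollection of the usage notes in api_v2.py, so check them against your copy. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed default address of a locally running api_v2 server.
BASE_URL = "http://127.0.0.1:9880"


def weight_switch_urls(gpt_path, sovits_path, base_url=BASE_URL):
    """Build the two GET URLs assumed to switch the GPT and SoVITS weights."""
    return [
        f"{base_url}/set_gpt_weights?{urlencode({'weights_path': gpt_path})}",
        f"{base_url}/set_sovits_weights?{urlencode({'weights_path': sovits_path})}",
    ]


def tts_payload(text, text_lang, ref_audio_path, prompt_text, prompt_lang):
    """Build the JSON body assumed to be accepted by POST /tts."""
    return {
        "text": text,                      # text to synthesize
        "text_lang": text_lang,            # language of `text`
        "ref_audio_path": ref_audio_path,  # reference clip that drives the timbre
        "prompt_text": prompt_text,        # transcript of the reference clip
        "prompt_lang": prompt_lang,        # language of `prompt_text`
    }


if __name__ == "__main__":
    # Paths taken from the configuration quoted above.
    for url in weight_switch_urls(
        "GPT_weights_v4/黄泉-e10.ckpt",
        "SoVITS_weights_v4/黄泉_e10_s930.pth",
    ):
        print(url)
    print(tts_payload(
        "你好",  # hypothetical test sentence
        "zh",
        "Sample/eula/优菈.wav",
        "行啊，赢得干净利落。有空的话…我们再来几局？",
        "zh",
    ))
```

Running the same `/tts` body once after loading the fine-tuned weights and once after loading the pretrained ones, with the reference clip held fixed, isolates what the weights contribute.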


foreverhell · 2025-04-29T11:00:10Z

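Sounding like the reference audio's timbre is normal; that is a capability of the model itself. Fine-tuning is meant to bring the output closer to the target speaker in the details, such as emotion, speaking style, vocal habits, and verbal tics.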

jyfjyt · 2025-04-29T15:46:22Z

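> Sounding like the reference audio's timbre is normal; that is a capability of the model itself. Fine-tuning is meant to bring the output closer to the target speaker in the details, such as emotion, speaking style, vocal habits, and verbal tics.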

Thanks for the explanation.

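But I remember that older versions of GPT-SoVITS trained a model directly from a large amount of speech. No reference audio or reference text was needed; the trained model could be used for inference on its own.
Has that approach been retired? It needed more training data, but inference felt more convenient: just load the model, with no need to worry about prompt text and the like.
Now every inference call has to specify reference audio and prompt text, which feels a lot like CosyVoice's 3-second cloning. Does this kind of approach have to process the reference first and only then run inference? Doesn't that affect generation speed?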

jyfjyt closed this as completed May 12, 2025
