Inference: add a Gradio demo for streaming WAV audio output #448

upbit · 2024-02-09T14:25:22Z

Happy New Year!

I saw issue #291. I had written similar code before, so today I put together a streaming output demo based on the GUI.

There are two changes to inference_webui.py:

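get_tts_wav gets a new stream parameter (default False, so the original logic is unaffected); when enabled, it yields the inference result as bytes, one text segment at a time;
Added an if __name__ == '__main__': guard so that importing functions from inference_webui.py does not launch the Gradio app.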
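A minimal sketch of the two changes, not the PR's actual code. It assumes a hypothetical `synthesize` stand-in for the real per-segment model call (it only returns silence) and a naive split on sentence-ending punctuation. With `stream=True` the generator yields one WAV-encoded chunk per segment as soon as it is ready; with the default it yields a single WAV for the whole text. The `__main__` guard keeps the demo from running when the module is imported.

```python
import io
import re
import wave
from typing import Iterator, List

SAMPLE_RATE = 32000  # assumed output sample rate for this sketch


def synthesize(segment: str) -> bytes:
    """Hypothetical stand-in for the model: 16-bit mono silence, 10 ms per character."""
    return b"\x00\x00" * (SAMPLE_RATE // 100 * len(segment))


def to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm)
    return buf.getvalue()


def split_text(text: str) -> List[str]:
    # Naive split after sentence-ending punctuation (assumption)
    return [s for s in re.split(r"(?<=[。！？.!?])", text) if s.strip()]


def get_tts_wav(text: str, stream: bool = False) -> Iterator[bytes]:
    segments = split_text(text)
    if stream:
        # Streaming: emit each segment as soon as it has been synthesized
        for seg in segments:
            yield to_wav_bytes(synthesize(seg))
    else:
        # Original behaviour: synthesize everything, then emit once
        yield to_wav_bytes(b"".join(synthesize(seg) for seg in segments))


def main() -> None:
    for i, chunk in enumerate(get_tts_wav("你好。今天天气不错！", stream=True)):
        print(f"chunk {i}: {len(chunk)} bytes")


if __name__ == "__main__":
    # Runs only when executed directly, not when imported by another script
    main()
```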

upbit · 2024-02-11T07:08:01Z

Merged the latest code from main.

v3ucn · 2024-02-19T01:57:48Z

Which Gradio version are you using? I see the Audio component does not declare streaming=True. I ran this branch and did not get any streaming output.

upbit · 2024-02-19T03:10:24Z

gradio==4.17.0; this feature requires that version or newer. Official example: stream_audio_out/run.py

To launch this WebUI wrapper: python GPT_SoVITS/inference_stream.py

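PS: I'm on macOS and the screen recording has no sound; I'll verify on Windows later. Try one of the longer texts in the examples: in my tests on Mac, autoplay only starts after roughly two sentences (video not sped up).

output.mp4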
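A sketch of how a generator can be wired to a streaming Gradio output, assuming gradio>=4.17.0. It yields one WAV file path per sentence, which as far as I know is also what the official stream_audio_out demo does in its "Stream as File" mode. `synthesize_pcm`, `stream_files`, and `build_demo` are hypothetical names, inference is faked with silence, and I have not checked this against the contents of inference_stream.py.

```python
import os
import tempfile
import wave
from typing import Iterator

SAMPLE_RATE = 32000  # assumed output sample rate


def synthesize_pcm(sentence: str) -> bytes:
    """Hypothetical stand-in for inference: 100 ms of 16-bit mono silence per character."""
    return b"\x00\x00" * (SAMPLE_RATE // 10 * len(sentence))


def stream_files(text: str, out_dir: str) -> Iterator[str]:
    """Write each sentence to its own WAV file and yield the path as soon as it exists."""
    sentences = [s for s in text.split("。") if s.strip()]
    for i, sentence in enumerate(sentences):
        path = os.path.join(out_dir, f"chunk_{i}.wav")
        with wave.open(path, "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)
            wf.setframerate(SAMPLE_RATE)
            wf.writeframes(synthesize_pcm(sentence))
        yield path


def build_demo():
    import gradio as gr  # gradio>=4.17.0 per this thread; imported lazily

    out_dir = tempfile.mkdtemp()

    def on_click(text: str) -> Iterator[str]:
        # Must itself be a generator function so Gradio treats the output as a stream
        yield from stream_files(text, out_dir)

    with gr.Blocks() as demo:
        text = gr.Textbox(label="Text")
        button = gr.Button("Synthesize")
        # streaming=True on the output Audio is what enables chunked playback
        audio = gr.Audio(streaming=True, autoplay=True)
        button.click(on_click, inputs=text, outputs=audio)
    return demo
```

Start it with `build_demo().launch()`. As discussed in this thread, Chrome on Windows may block autoplay until the user interacts with the page.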

v3ucn · 2024-02-19T03:53:48Z

Thanks, I'll try it on macOS. Playing audio back while inference is still running is a really nice streaming effect.

upbit · 2024-02-19T04:54:36Z

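I don't have a Windows machine at hand. Once I've verified it and main has stabilized, I'll push another version. Feel free to point out anything else that needs adjusting in code review.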

v3ucn · 2024-02-20T03:19:38Z

Over the past two days I tested the official demo on macOS: https://huggingface.co/spaces/gradio/stream_audio_out

Still no streaming effect, which is very strange.

2.20.mp4

upbit · 2024-02-20T05:32:56Z

Try clicking "Stream as File" and then pressing the play button. Judging from its code, autoplay isn't enabled.

leiyuyh · 2024-02-20T12:25:13Z

Looking forward to your demo video. A side-by-side comparison of streaming and non-streaming output would be even better.

v3ucn · 2024-02-20T12:27:39Z

That works now. It was probably a browser version issue; the browser version requirements are rather particular. Windows still doesn't work.

upbit · 2024-02-20T12:57:57Z

Yes. When I first started working on this, I found that Chrome on Windows restricts audio autoplay, so it has to be allowed separately:
https://developer.chrome.com/blog/autoplay

upbit · 2024-02-28T08:13:26Z

Re-merged the upstream changes:

Default WebUI mode, which demonstrates streaming audio inference (autoplay requires Chrome on Mac): is_half=False python GPT_SoVITS/inference_stream.py
Streaming API mode (uvicorn) is also supported: is_half=False python GPT_SoVITS/inference_stream.py --api
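A sketch of how one entry point can switch between the two modes above: `--api` selects the HTTP server, while `is_half` arrives as an environment variable string, so `is_half=False` has to be parsed explicitly rather than relying on truthiness. The function names and the accepted truthy strings are assumptions; the real inference_stream.py may parse its options differently, and this sketch only reports which mode it would start.

```python
import argparse
import os
from typing import List, Optional


def parse_is_half(default: str = "True") -> bool:
    """Read is_half from the environment; `is_half=False` arrives as the string "False"."""
    value = os.environ.get("is_half", default)
    return value.strip().lower() in ("1", "true", "yes")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Streaming TTS demo (sketch)")
    parser.add_argument(
        "--api",
        action="store_true",
        help="serve the streaming HTTP API (uvicorn) instead of the Gradio WebUI",
    )
    return parser.parse_args(argv)


def select_mode(argv: Optional[List[str]] = None) -> str:
    """Describe what the script would start; no server is launched here."""
    args = parse_args(argv)
    mode = "api" if args.api else "webui"
    return f"{mode} (is_half={parse_is_half()})"
```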

leiyuyh · 2024-02-29T13:01:39Z

How do I verify that it works? I've already synced the code.

leiyuyh · 2024-02-29T13:16:23Z

This feature looks very useful.

upbit · 2024-03-01T06:04:29Z

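On macOS, run: is_half=False python GPT_SoVITS/inference_stream.py
On Windows, run directly: python GPT_SoVITS/inference_stream.py

Long-text samples are in the examples section below. Also, with Chrome on Windows, you may need to click play after inference starts to hear the result.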

leiyuyh · 2024-03-01T10:01:37Z

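1. You need to manually install gradio 4.17.0 or later.
2. The launch call in stream.py needs to be changed:

```python
import os

# `app` is the gr.Blocks instance built earlier in stream.py.
# server_port must be an int; reading it from the infer_ttswebui environment
# variable (default 9872), as inference_webui.py does, is assumed here.
infer_ttswebui = int(os.environ.get("infer_ttswebui", 9872))

app.launch(
    server_name="0.0.0.0",
    inbrowser=True,
    share=True,
    server_port=infer_ttswebui,
    quiet=True,
    max_threads=511,  # you can still try setting the maximum number of threads
)
```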

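In the end, generating the streaming audio is extremely slow, even on a 4090, and I don't know why. I can't perceive any streaming effect.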

upbit · 2024-03-10T03:45:13Z

I checked, and Gradio streaming does have problems on Windows. Here is the test page:

https://huggingface.co/spaces/gradio/stream_audio_out

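I tested Chrome 122.0 and Edge: neither pushing files nor pushing bytes plays back correctly. It may be necessary to switch to a different waveform component.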

upbit · 2024-03-10T08:32:39Z

On Windows, only API mode works for now. To run it:

```shell
python GPT_SoVITS/inference_stream.py --api
# then visit http://localhost:5000?text=<test text>
```
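On the client side, the API response can be consumed as it arrives instead of waiting for the whole file. A minimal stdlib sketch, assuming the endpoint above returns the audio as a streamed HTTP body; `build_url` and `download_stream` are hypothetical helpers, and the `opener` parameter only exists so the function can be exercised without a running server.

```python
import urllib.parse
import urllib.request
from typing import BinaryIO, Callable


def build_url(text: str, base: str = "http://localhost:5000") -> str:
    # Percent-encode the text so Chinese characters and spaces survive the query string
    return f"{base}?{urllib.parse.urlencode({'text': text})}"


def download_stream(
    url: str,
    out: BinaryIO,
    chunk_size: int = 4096,
    opener: Callable = urllib.request.urlopen,
) -> int:
    """Copy the response body to `out` chunk by chunk as it arrives; return bytes written."""
    total = 0
    with opener(url) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:  # empty read means the server closed the stream
                break
            out.write(chunk)
            total += len(chunk)
    return total
```

For example, `download_stream(build_url("你好"), open("out.wav", "wb"))` writes the audio to disk while the server is still synthesizing; a player could read from the same loop instead.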

Screen recording of streaming inference on Windows (Chrome does not autoplay; wait until two sentences have been inferred, then press play):

win_streaming.mp4

Addendum: on Windows, setting autoplay=False on the Gradio audio component gives the streaming playback shown in the video. Otherwise Chrome reports: The AudioContext was not allowed to start. It must be resumed (or created) after a user gesture on the page
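A small sketch of the platform-dependent setting described above: enable autoplay only on macOS and leave it off elsewhere, so the user starts playback with a click. Treating Linux like Windows is my assumption, since the thread only covers macOS and Windows; `audio_autoplay` and `build_output` are hypothetical names, and gradio is imported lazily so the helper works without it.

```python
import platform
from typing import Optional


def audio_autoplay(system: Optional[str] = None) -> bool:
    """Return True only on macOS ("Darwin").

    On Windows, Chrome refuses to start the AudioContext without a user
    gesture, so autoplay stays off and the user presses play instead.
    """
    system = system or platform.system()
    return system == "Darwin"


def build_output(system: Optional[str] = None):
    import gradio as gr  # imported lazily; gradio>=4.17.0 assumed

    return gr.Audio(streaming=True, autoplay=audio_autoplay(system))
```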

wawaa · 2024-03-12T09:24:28Z

So this is sentence-level streaming, where output is produced per sentence after the text has been split?

upbit · 2024-03-12T14:32:10Z

Yes, by default it splits on punctuation.
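A sketch of sentence-level streaming as described: split after each punctuation mark and synthesize one segment at a time, so the first audio is available after the first segment rather than after the whole text. The punctuation set and the `synthesize` callback are assumptions for illustration; the repository's own text-splitting functions may behave differently.

```python
import re
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")

# Chinese and ASCII pause/sentence punctuation (an assumed set)
_PUNCT = "，。！？；,.!?;"
_SPLIT_RE = re.compile(f"(?<=[{re.escape(_PUNCT)}])")


def split_by_punctuation(text: str) -> List[str]:
    """Split after each punctuation mark, keeping the mark with its segment."""
    return [seg.strip() for seg in _SPLIT_RE.split(text) if seg.strip()]


def stream_sentences(text: str, synthesize: Callable[[str], T]) -> Iterator[T]:
    """Yield synthesized audio one segment at a time."""
    for seg in split_by_punctuation(text):
        yield synthesize(seg)
```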

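The new accelerated-inference branch adds chunked yields. I'll submit another version once the flash attention and related changes are merged (to avoid conflicts). For testing, you can use the code on this branch for now: https://github.com/upbit/GPT-SoVITS/tree/streaming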

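If you're not running inference on macOS: on Windows, the Gradio Audio request currently stays pending until inference finishes, so for now the streaming playback shown in the video only works in API mode.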

wawaa · 2024-03-13T06:24:07Z

OK, I can see the streaming inference working now.

AZYoung233 · 2024-04-19T06:43:29Z

Hi, how do I deploy the streaming API on a server? Running stream_api.py directly gives this error:

```
Traceback (most recent call last):
  File "/home/ubuntu/0307/gpt_sovits/stream_api.py", line 216, in <module>
    dict_s1 = torch.load(gpt_path, map_location="cpu")
  File "/root/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/serialization.py", line 993, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/root/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/serialization.py", line 447, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
```
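As far as I understand torch.load, this particular message means the file begins with a zip header but its central directory (stored at the end of the file) is missing, which usually points to an incomplete download or copy of the checkpoint at gpt_path rather than to the streaming code. A stdlib sketch for checking a checkpoint before loading it; `diagnose_checkpoint` and its return labels are hypothetical.

```python
import zipfile
from pathlib import Path

ZIP_MAGIC = b"PK\x03\x04"  # local file header signature at the start of a zip


def diagnose_checkpoint(path: str) -> str:
    """Classify a checkpoint file: missing, truncated, not-zip, or zip-ok."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    with p.open("rb") as f:
        head = f.read(4)
    if head != ZIP_MAGIC:
        # Not a zip-format checkpoint (e.g. a text placeholder or another format)
        return "not-zip"
    if not zipfile.is_zipfile(p):
        # Starts like a zip but the end-of-archive record is missing:
        # typically an interrupted download or copy
        return "truncated"
    return "zip-ok"
```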

HqWu-HITCS · 2024-04-24T07:47:24Z

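Hi, I implemented streaming following your approach, but I found a problem: with streaming output, there is a brief popping noise at the end of each audio segment. Have you run into this, or do you have any ideas for troubleshooting it? Many thanks.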

upbit · 2024-04-24T07:51:55Z

I haven't run into anything like that. Are you on the latest branch? I'll test it later.

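As I recall, for each batch the code appends a zero_wav after the inferred audio to create a pause; you could try removing this line:

```python
audio_fragment = torch.cat([audio_fragment, zero_wav], dim=0)
```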
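To compare output with and without that pause, the append can be made switchable instead of deleted. A minimal sketch using plain float arrays in place of torch tensors; `fragments_with_pause` and `make_zero_wav` are hypothetical names, and the 0.3 s pause length and 32 kHz sample rate are assumed defaults.

```python
from array import array
from typing import Iterable, Iterator, Optional, Sequence

SAMPLE_RATE = 32000  # assumed sample rate


def make_zero_wav(seconds: float = 0.3, sr: int = SAMPLE_RATE) -> array:
    """Block of silence used as a pause between fragments."""
    return array("f", [0.0] * int(sr * seconds))


def fragments_with_pause(
    fragments: Iterable[Sequence[float]],
    append_pause: bool = True,
    zero_wav: Optional[array] = None,
) -> Iterator[array]:
    """Yield each audio fragment, optionally followed by the silence block."""
    zero_wav = make_zero_wav() if zero_wav is None else zero_wav
    for frag in fragments:
        out = array("f", frag)
        if append_pause:
            # Same effect as torch.cat([audio_fragment, zero_wav], dim=0)
            out.extend(zero_wav)
        yield out
```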

upbit · 2024-04-27T07:19:45Z

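I merged some new changes from the fast_inference_ branch. Please pull the latest code from the streaming branch and try again (I verified on Windows and did not hit the popping issue).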

HqWu-HITCS · 2024-04-27T14:14:13Z

Thanks, that solved it.

phpmaple · 2024-06-18T06:37:18Z

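Could this avoid splitting on commas and do true streaming instead, e.g. convert each output token to audio, then keep inferring the next tokens and converting them to audio, and so on?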

upbit force-pushed the stream_out branch from ffdee42 to b522fae on February 11, 2024 07:05

upbit force-pushed the stream_out branch from b522fae to 5e27552 on February 16, 2024 13:24

Inference: add Gradio streaming WAV audio output and a TTS API

a8d59e7

upbit force-pushed the stream_out branch from 5e27552 to a8d59e7 on February 28, 2024 08:10

X-T-E-R mentioned this pull request Mar 6, 2024

Folks, could this be turned into streaming, real-time speech synthesis? #637

Open

fix: Fix i18n name for Examples

a82f613
