Runs TTS models with ONNX Runtime.
- Supported Models:
- End-to-End Processing:
  - The solution includes internal STFT/ISTFT processing.
  - Input: reference audio + text
  - Output: generated speech
- Optimize:
  - The key model components run 100% of their operators on the GPU.
- Resources:
OS | Device | Backend | Model | Time Cost in Seconds (6 s reference audio, about 15 words of generated speech) |
---|---|---|---|---|
Ubuntu-24.04 | Laptop | CPU i7-1165G7 | F5-TTS F32 | 180 (NFE=32) |
Ubuntu-24.04 | Laptop | GPU MX150 | F5-TTS F32 | 62 (NFE=32) |
Ubuntu-24.04 | Laptop | CPU i7-1165G7 | IndexTTS F32 | 18 |
Ubuntu-24.04 | Laptop | GPU MX150 | BigVGAN V2 24khz_100band_256x F16 | 4.6 (input mel = (1, 100, 512)) |
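
The CPU and GPU rows above differ only in the backend that ONNX Runtime uses. As a rough sketch (not this repository's actual loading code), a session could prefer a GPU execution provider and fall back to the CPU. The use of `CUDAExecutionProvider` for the GPU rows, the helper names `pick_providers` and `load_session`, and the file name `model.onnx` are all assumptions made for illustration.

```python
def pick_providers(available):
    """Return the preferred execution providers that are actually available.

    Assumption: the GPU backend is CUDA. The preference order is
    illustrative, not taken from the project.
    """
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [p for p in preferred if p in available]


def load_session(model_path):
    """Create an ONNX Runtime session on the GPU if possible, else the CPU."""
    # Imported here so pick_providers() can be used without onnxruntime installed.
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    session = ort.InferenceSession(model_path, providers=providers)
    # get_providers() lists the providers registered for this session.
    print("Session providers:", session.get_providers())
    return session
```

Calling `load_session("model.onnx")` with a hypothetical exported model would print the providers in use. On a machine without a CUDA-enabled ONNX Runtime build, the list reduces to `CPUExecutionProvider`.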
- Beam Search
- MOSS-TTSD
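
For readers unfamiliar with the STFT/ISTFT step mentioned under End-to-End Processing, the following is a minimal pure-Python round trip. It is only an illustration of the technique: the project performs this processing internally, and the window type, `n_fft`, and `hop` values here are assumptions chosen to keep the example small.

```python
import cmath
import math


def hann(n):
    # Periodic Hann window of length n.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft(signal, n_fft=16, hop=4):
    """Windowed short-time Fourier transform using a naive DFT per frame."""
    win = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + i] * win[i] for i in range(n_fft)]
        spec = [
            sum(seg[t] * cmath.exp(-2j * math.pi * k * t / n_fft) for t in range(n_fft))
            for k in range(n_fft)
        ]
        frames.append(spec)
    return frames


def istft(frames, n_fft=16, hop=4):
    """Inverse STFT by windowed overlap-add, normalized by the summed squared window."""
    win = hann(n_fft)
    length = n_fft + hop * (len(frames) - 1)
    out = [0.0] * length
    norm = [0.0] * length
    for f, spec in enumerate(frames):
        for t in range(n_fft):
            # Inverse DFT for sample t of this frame.
            x = sum(
                spec[k] * cmath.exp(2j * math.pi * k * t / n_fft) for k in range(n_fft)
            ).real / n_fft
            out[f * hop + t] += x * win[t]
            norm[f * hop + t] += win[t] ** 2
    # Samples where the window sum is zero (the very first one) cannot be recovered.
    return [o / n if n > 1e-8 else 0.0 for o, n in zip(out, norm)]
```

Applying `istft(stft(x))` to a signal `x` reproduces its interior samples up to floating-point error; only the edges, where the window weight is zero or small, are affected.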