Realtime/Fastest way to generate stable voice audio locally #882
Unanswered
lev-laptinov asked this question in Q&A
Replies: 1 comment 1 reply
Can you specify which API endpoint you are using? `http://LINK/v1/tts` expects `"Content-Type": "application/msgpack"`. Also, is the inference speed you mentioned measured after a chunk of audio has been generated, or before?

I want to play generated audio as fast as possible from my pod. I'm using runpod.io, where I run a Docker image of the GitHub repo with this start command:

```shell
python tools/api_server.py --llama-checkpoint-path checkpoints/fish-speech-1.5 --decoder-checkpoint-path checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth --listen 0.0.0.0:8080 --compile
```

Then I send a request to this pod.
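The request itself didn't come through in this post, so here is a minimal sketch of what such a request could look like, assuming the msgpack `/v1/tts` endpoint mentioned in the reply. The field names (`text`, `references`, `streaming`, `use_memory_cache`, `format`) are assumptions pieced together from this thread, not verified against the repo, and `POD_HOST` is a placeholder:

```python
def build_tts_payload(text, reference_audio=None, reference_text=None,
                      streaming=False, use_memory_cache="on"):
    """Build the dict that gets msgpack-encoded into the request body.
    Field names are assumptions based on this thread; check the repo's schema."""
    payload = {
        "text": text,
        "format": "wav",
        "streaming": streaming,
        "use_memory_cache": use_memory_cache,
        "references": [],
    }
    if reference_audio is not None:
        # Reference audio (raw bytes) plus its transcript, for a stable voice.
        payload["references"].append(
            {"audio": reference_audio, "text": reference_text or ""}
        )
    return payload


def send_tts_request(url, payload):
    """POST the msgpack-encoded payload; requires `pip install requests ormsgpack`."""
    import ormsgpack  # third-party, imported lazily so the sketch stays importable
    import requests
    resp = requests.post(
        url,
        data=ormsgpack.packb(payload),
        headers={"Content-Type": "application/msgpack"},
    )
    resp.raise_for_status()
    return resp.content  # raw audio bytes
```

Usage would be along the lines of `audio = send_tts_request("http://POD_HOST:8080/v1/tts", build_tts_payload("Hello", reference_audio=ref_bytes, reference_text="ref transcript"))`.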
As I understand it, there is no way to get a stable voice other than using reference audio.

About streaming: as I understand it, the endpoint returns generated chunks as they are produced. I tried it, but I didn't notice any difference; playback starts at the same time as when the whole audio has been written.
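One common reason playback only starts once the whole file is written is that the client buffers the entire HTTP response before handing it to the player. A sketch of consuming the response incrementally instead, assuming a `requests` POST with `stream=True` (the msgpack body and URL are the same assumptions as above):

```python
def consume_stream(chunks, on_chunk):
    """Feed each non-empty chunk to a playback callback as it arrives,
    instead of buffering the whole response first. Returns total bytes."""
    total = 0
    for chunk in chunks:
        if chunk:  # skip keep-alive empty chunks
            on_chunk(chunk)
            total += len(chunk)
    return total


def play_streaming_tts(url, body):
    """POST with stream=True and hand chunks to a player as they arrive.
    Requires `pip install requests`; the playback callback here just collects
    bytes -- swap in your audio player's write function."""
    import requests  # third-party, imported lazily
    received = bytearray()
    with requests.post(
        url,
        data=body,
        headers={"Content-Type": "application/msgpack"},
        stream=True,  # don't buffer the whole response body
    ) as resp:
        resp.raise_for_status()
        consume_stream(resp.iter_content(chunk_size=4096), received.extend)
    return bytes(received)
```

If the client side looks like this and playback still only starts at the end, the server is probably not flushing chunks (or the request isn't actually enabling streaming), which would match what you're seeing.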
I also used `use_memory_cache`, which does give a speed increase.

I also tried fine-tuning; that reduced generation time for ~13 s of audio from 6 s to 4 s. Now, running on 2x RTX 4090 with `--compile`, I get roughly 10 s of audio in ~3 s.

So the main enhancement, as I see it, would be streaming. Is it possible to stream audio? Or am I using or understanding something incorrectly?