add timestamps for each word #113
Replies: 2 comments
-
For sure! I was planning to jump on it once I finish the v1_0 integrations (the structure may change somewhat for those models anyhow). In the meantime, you can take a look at the stale branch I was using to experiment with it a bit. You can get pred_dur from the PyTorch versions (not sure how you'd do it with ONNX, tbh) and then match it back through the phonemes/tokens to words. It got a bit tricky with the sampling, scaling, etc., which is where I left it: https://github.com/remsky/Kokoro-FastAPI/tree/v0.1.2-pre-experimental-subs
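A rough sketch of that matching step, not the code from the branch. It assumes pred_dur is a flat list with one predicted frame count per token, that you already know how many tokens each word produced, and that a fixed frame length in seconds applies. The names `word_timestamps`, `tokens_per_word`, and `seconds_per_frame` are made up for illustration, and the sketch ignores the sampling/scaling issues mentioned above as well as any whitespace or punctuation tokens between words.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class WordTiming:
    word: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def word_timestamps(
    words: Sequence[str],
    tokens_per_word: Sequence[int],
    pred_dur: Sequence[float],
    seconds_per_frame: float,
) -> List[WordTiming]:
    """Map per-token durations (in frames) back to per-word time spans.

    Hypothetical helper: assumes every token belongs to exactly one word
    and that tokens appear in the same order as the words.
    """
    if len(words) != len(tokens_per_word):
        raise ValueError("words and tokens_per_word must have the same length")
    if sum(tokens_per_word) != len(pred_dur):
        raise ValueError("token counts do not add up to len(pred_dur)")

    timings: List[WordTiming] = []
    token_index = 0
    frames_so_far = 0.0
    for word, n_tokens in zip(words, tokens_per_word):
        # Sum the frame counts of the tokens that make up this word.
        word_frames = sum(pred_dur[token_index:token_index + n_tokens])
        # Convert from cumulative frames each time to avoid drift from
        # repeatedly adding small float durations.
        start = frames_so_far * seconds_per_frame
        end = (frames_so_far + word_frames) * seconds_per_frame
        timings.append(WordTiming(word=word, start=start, end=end))
        token_index += n_tokens
        frames_so_far += word_frames
    return timings
```

For example, `word_timestamps(["hello", "world"], [2, 3], [4, 6, 2, 3, 5], 0.025)` gives "hello" the first 10 frames and "world" the next 10. The real frame length depends on the model and vocoder settings, so the 0.025 here is only a placeholder.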
-
Word-level timestamps are currently supported through the dev API (/dev/captioned_speech). This pull request adds support for streaming word-level timestamps (they come in a different format, so look at the examples in readme.md): #173
-
I would like to have timestamps for each word in the generated text-to-speech output. This would improve the accuracy of syncing the audio with other media.
I could also submit this as a PR if I get some guidance.