Description
Hello,
First, thank you so much for creating and maintaining the edge-tts library!
My goal is to create audio with word-by-word synchronized subtitles. I have successfully installed the latest version of the library directly from the GitHub master branch.
However, I've found that while I can successfully stream audio and receive SentenceBoundary events, the stream does not seem to contain any WordBoundary events. This makes it impossible to generate the word-level subtitles I'm hoping for.
I'm writing to ask if this is the current expected behavior, or perhaps if I'm missing a step to properly enable the word-level data. Is this a known issue, or is there a specific method to request WordBoundary events in the current version?
To provide clear context, I'm attaching the simple diagnostic script I used for testing, along with its complete output log, which confirms the absence of WordBoundary chunks.
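For reference, here is a minimal sketch of what the diagnostic script does (simplified; it only uses the standard `edge_tts.Communicate` streaming API, and the output file name and log messages are illustrative rather than the exact ones in my script):

```python
import asyncio
import edge_tts

TEXT = "Hello world. This is a simple test to see the word boundary events."
VOICE = "en-US-JennyNeural"
OUTPUT_FILE = "diagnostic_audio.mp3"

async def main() -> None:
    communicate = edge_tts.Communicate(TEXT, VOICE)
    word_boundaries = []

    with open(OUTPUT_FILE, "wb") as audio_file:
        async for chunk in communicate.stream():
            if chunk["type"] == "audio":
                # Audio chunks carry raw MP3 bytes.
                audio_file.write(chunk["data"])
            elif chunk["type"] == "WordBoundary":
                word_boundaries.append(chunk)
                print(f"[INFO] Received WordBoundary chunk: {chunk}")
            elif chunk["type"] == "SentenceBoundary":
                print(f"[INFO] Received SentenceBoundary chunk: {chunk}")

    if word_boundaries:
        print(f"SUCCESS: received {len(word_boundaries)} WordBoundary events.")
    else:
        print("FAILURE: No 'WordBoundary' events were received.")

if __name__ == "__main__":
    asyncio.run(main())
```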
Here's the log:

--- Starting WordBoundary Diagnostic Test ---
Using Voice: en-US-JennyNeural
Synthesizing Text: "Hello world. This is a simple test to see the word boundary events."
[Log] Connecting to server and receiving stream...
[INFO] Received SentenceBoundary chunk: {'type': 'SentenceBoundary', 'offset': 500000, 'duration': 16375000, 'text': 'Hello world.'}
[INFO] Received SentenceBoundary chunk: {'type': 'SentenceBoundary', 'offset': 16875000, 'duration': 38250000, 'text': 'This is a simple test to see the word boundary events.'}
[Log] Stream finished.
Diagnostic audio file saved to: diagnostic_audio.mp3
--- Diagnostic Result ---
❌ FAILURE: No 'WordBoundary' events were received.
This would indicate a persistent issue with the library or the service connection.