Commit c486c82

Enhance README: Clarify sequence diagrams for STT and TTS flows, improving readability and detail
1 parent da8e942 commit c486c82

1 file changed: +40 -25 lines changed

README.md

Lines changed: 40 additions & 25 deletions
@@ -308,43 +308,58 @@ Home Assistant uses the Wyoming Protocol integration to communicate with the Wyo
 ```mermaid
 sequenceDiagram
 participant HA as Home Assistant
-participant WY as wyoming_openai
-participant OAPI as OpenAI API
-Note over HA,OAPI: Speech-to-Text (STT/ASR) Flow
-HA->>WY: Transcribe event
-HA->>WY: AudioStart event
-loop Audio Streaming
-HA->>WY: AudioChunk events
-Note over WY: Buffers WAV data
+participant WY as wyoming_openai Proxy
+participant OAPI as OpenAI-Compatible API
+
+Note over HA,OAPI: **Speech-to-Text (STT/ASR) Flow**
+HA->>WY: Transcribe event (initiate transcription)
+HA->>WY: AudioStart event (begin sending audio)
+loop While capturing microphone audio
+HA->>WY: AudioChunk events (WAV data)
+Note over WY: Accumulates/buffers WAV PCM chunks
 end
-HA->>WY: AudioStop event
+HA->>WY: AudioStop event (end of input)
 
 alt Non-Streaming Transcription
-WY->>OAPI: Send complete audio file
-OAPI-->>WY: Text transcript response
+WY->>OAPI: Upload complete audio file
+OAPI-->>WY: Full text transcript
 WY->>HA: TranscriptStart event
-WY->>HA: Transcript event (complete text)
+WY->>HA: Transcript event (full text result)
 WY->>HA: TranscriptStop event
 else Streaming Transcription
-WY->>OAPI: Send audio file with stream=true
+WY->>OAPI: Send audio with `stream=true`
 WY->>HA: TranscriptStart event
-loop
-OAPI-->>WY: Transcript chunk delta
-WY-->>HA: TranscriptChunk event (partial text)
+loop As partial results are returned
+OAPI-->>WY: Transcript delta (partial text)
+WY-->>HA: TranscriptChunk event
 end
-WY->>HA: Transcript event (complete text)
+WY->>HA: Transcript event (final text)
 WY->>HA: TranscriptStop event
 end
 
-Note over HA,OAPI: Text-to-Speech (TTS) Flow
-HA->>WY: Synthesize event (text + voice)
-WY->>OAPI: Speech synthesis request
-WY->>HA: AudioStart event
-loop Audio Streaming
-OAPI-->>WY: Audio stream chunks
-WY-->>HA: AudioChunk events
+Note over HA,OAPI: **Text-to-Speech (TTS) Flow**
+
+alt Non-Streaming TTS (Synthesize)
+HA->>WY: Synthesize event (text + voice)
+WY->>OAPI: Speech synthesis request
+WY->>HA: AudioStart event
+loop While receiving audio data
+OAPI-->>WY: Audio stream chunks
+WY-->>HA: AudioChunk events
+end
+WY->>HA: AudioStop event
+else Streaming TTS (SynthesizeStart/Chunk/Stop)
+HA->>WY: SynthesizeStart event (voice config)
+Note over WY: Initialize synthesis buffer
+loop Sending text chunks
+HA->>WY: SynthesizeChunk events
+Note over WY: Append to synthesis buffer
+end
+HA->>WY: SynthesizeStop event
+Note over WY: No-op — OpenAI `/v1/audio/speech`<br/>does not support streaming text input
+WY->>HA: SynthesizeStopped event
+Note over WY: Streaming flow is handled<br/>but not advertised in capabilities
 end
-WY->>HA: AudioStop event
 ```
 
 #### Open WebUI
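
The event ordering in the updated diagram can be sketched in plain Python. This is only an illustrative model, not code from `wyoming_openai`: the class names `SttSession` and `TtsStreamSession`, their methods, and the `(name, payload)` tuple used as an event are all hypothetical, and the `deltas` argument stands in for whatever text the OpenAI-compatible API returns. The event names are taken from the diagram labels.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple

# Simplified stand-in for a Wyoming event: (event name from the diagram, text payload).
Event = Tuple[str, str]


@dataclass
class SttSession:
    """Hypothetical model of the STT branch; not wyoming_openai's actual code."""

    buffer: bytearray = field(default_factory=bytearray)

    def on_audio_chunk(self, audio: bytes) -> None:
        # AudioChunk: accumulate audio until AudioStop arrives.
        self.buffer.extend(audio)

    def on_audio_stop(self, deltas: Iterable[str], streaming: bool) -> Iterator[Event]:
        # `deltas` stands in for the text returned by the API
        # (a single item in the non-streaming case).
        yield ("TranscriptStart", "")
        parts: List[str] = []
        for delta in deltas:
            parts.append(delta)
            if streaming:
                # Streaming branch: forward each partial result as it arrives.
                yield ("TranscriptChunk", delta)
        # Both branches finish with the full text, then TranscriptStop.
        yield ("Transcript", "".join(parts))
        yield ("TranscriptStop", "")
        self.buffer.clear()


@dataclass
class TtsStreamSession:
    """Hypothetical model of the streaming TTS branch shown in the diagram."""

    text_parts: List[str] = field(default_factory=list)

    def on_synthesize_start(self) -> None:
        # SynthesizeStart: initialize the synthesis buffer.
        self.text_parts = []

    def on_synthesize_chunk(self, text: str) -> None:
        # SynthesizeChunk: append to the synthesis buffer.
        self.text_parts.append(text)

    def on_synthesize_stop(self) -> List[Event]:
        # Per the diagram, stop is a no-op apart from the acknowledgement.
        return [("SynthesizeStopped", "")]
```

In this model the streaming and non-streaming STT branches differ only in whether `TranscriptChunk` events are emitted before the final `Transcript`, which matches the two `alt` branches of the diagram.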
