@@ -308,43 +308,58 @@ Home Assistant uses the Wyoming Protocol integration to communicate with the Wyo
 ```mermaid
 sequenceDiagram
     participant HA as Home Assistant
-    participant WY as wyoming_openai
-    participant OAPI as OpenAI API
-    Note over HA,OAPI: Speech-to-Text (STT/ASR) Flow
-    HA->>WY: Transcribe event
-    HA->>WY: AudioStart event
-    loop Audio Streaming
-        HA->>WY: AudioChunk events
-        Note over WY: Buffers WAV data
+    participant WY as wyoming_openai Proxy
+    participant OAPI as OpenAI-Compatible API
+
+    Note over HA,OAPI: **Speech-to-Text (STT/ASR) Flow**
+    HA->>WY: Transcribe event (initiate transcription)
+    HA->>WY: AudioStart event (begin sending audio)
+    loop While capturing microphone audio
+        HA->>WY: AudioChunk events (WAV data)
+        Note over WY: Accumulates/buffers WAV PCM chunks
     end
-    HA->>WY: AudioStop event
+    HA->>WY: AudioStop event (end of input)
 
     alt Non-Streaming Transcription
-        WY->>OAPI: Send complete audio file
-        OAPI-->>WY: Text transcript response
+        WY->>OAPI: Upload complete audio file
+        OAPI-->>WY: Full text transcript
         WY->>HA: TranscriptStart event
-        WY->>HA: Transcript event (complete text)
+        WY->>HA: Transcript event (full text result)
         WY->>HA: TranscriptStop event
     else Streaming Transcription
-        WY->>OAPI: Send audio file with stream=true
+        WY->>OAPI: Send audio with `stream=true`
         WY->>HA: TranscriptStart event
-        loop
-            OAPI-->>WY: Transcript chunk delta
-            WY-->>HA: TranscriptChunk event (partial text)
+        loop As partial results are returned
+            OAPI-->>WY: Transcript delta (partial text)
+            WY-->>HA: TranscriptChunk event
         end
-        WY->>HA: Transcript event (complete text)
+        WY->>HA: Transcript event (final text)
         WY->>HA: TranscriptStop event
     end
 
-    Note over HA,OAPI: Text-to-Speech (TTS) Flow
-    HA->>WY: Synthesize event (text + voice)
-    WY->>OAPI: Speech synthesis request
-    WY->>HA: AudioStart event
-    loop Audio Streaming
-        OAPI-->>WY: Audio stream chunks
-        WY-->>HA: AudioChunk events
+    Note over HA,OAPI: **Text-to-Speech (TTS) Flow**
+
+    alt Non-Streaming TTS (Synthesize)
+        HA->>WY: Synthesize event (text + voice)
+        WY->>OAPI: Speech synthesis request
+        WY->>HA: AudioStart event
+        loop While receiving audio data
+            OAPI-->>WY: Audio stream chunks
+            WY-->>HA: AudioChunk events
+        end
+        WY->>HA: AudioStop event
+    else Streaming TTS (SynthesizeStart/Chunk/Stop)
+        HA->>WY: SynthesizeStart event (voice config)
+        Note over WY: Initialize synthesis buffer
+        loop Sending text chunks
+            HA->>WY: SynthesizeChunk events
+            Note over WY: Append to synthesis buffer
+        end
+        HA->>WY: SynthesizeStop event
+        Note over WY: No-op (OpenAI `/v1/audio/speech`<br/>does not support streaming text input)
+        WY->>HA: SynthesizeStopped event
+        Note over WY: Streaming flow is handled<br/>but not advertised in capabilities
     end
-    WY->>HA: AudioStop event
 ```
 
 #### Open WebUI
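In the STT half of the diagram, the proxy buffers the AudioChunk payloads it receives between AudioStart and AudioStop, then uploads them as one audio file. The sketch below shows that buffering step under stated assumptions: each chunk is taken to carry raw PCM samples plus `rate`, `width` (bytes per sample) and `channels` fields, and the `AudioChunk` and `TranscriptionBuffer` classes are hypothetical stand-ins, not the wyoming or wyoming_openai API. Only the standard-library `wave` module is used to add the WAV header.

```python
import io
import wave
from dataclasses import dataclass, field
from typing import List


@dataclass
class AudioChunk:
    """Hypothetical stand-in for a Wyoming AudioChunk event (raw PCM payload)."""
    audio: bytes
    rate: int = 16000   # samples per second (assumed default)
    width: int = 2      # bytes per sample, i.e. 16-bit PCM
    channels: int = 1


@dataclass
class TranscriptionBuffer:
    """Collects PCM chunks between AudioStart and AudioStop (hypothetical)."""
    chunks: List[bytes] = field(default_factory=list)
    rate: int = 16000
    width: int = 2
    channels: int = 1

    def add(self, chunk: AudioChunk) -> None:
        # Remember the stream's format and keep the raw samples in order.
        self.rate, self.width, self.channels = chunk.rate, chunk.width, chunk.channels
        self.chunks.append(chunk.audio)

    def to_wav(self) -> bytes:
        # Wrap the concatenated PCM in a WAV (RIFF) container for upload.
        out = io.BytesIO()
        with wave.open(out, "wb") as wav:
            wav.setnchannels(self.channels)
            wav.setsampwidth(self.width)
            wav.setframerate(self.rate)
            wav.writeframes(b"".join(self.chunks))
        return out.getvalue()


if __name__ == "__main__":
    buffer = TranscriptionBuffer()
    # Two 10 ms chunks of 16 kHz, 16-bit mono silence (160 frames each).
    for _ in range(2):
        buffer.add(AudioChunk(audio=b"\x00\x00" * 160))
    with wave.open(io.BytesIO(buffer.to_wav()), "rb") as check:
        print(check.getnframes(), check.getframerate())  # prints: 320 16000
```

Buffering the whole utterance keeps the proxy simple, since the non-streaming request in the diagram uploads a complete file, at the cost of waiting for AudioStop before any transcription can begin.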
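Both transcription branches in the diagram share the same TranscriptStart / Transcript / TranscriptStop framing; the streaming branch only adds TranscriptChunk events in between, and its final Transcript carries the full text. A minimal sketch of that mapping, assuming the API's deltas are plain text fragments that concatenate to the final transcript. Events are modeled as hypothetical `(name, text)` tuples rather than the real Wyoming event classes, and `relay_transcript` is an illustrative name.

```python
from typing import Iterable, List, Tuple

# Hypothetical event representation: (event name, text payload).
Event = Tuple[str, str]


def relay_transcript(deltas: Iterable[str], streaming: bool) -> List[Event]:
    """Map API transcript output onto the event sequence shown in the diagram.

    For a non-streaming request, pass the full transcript as a single delta.
    """
    events: List[Event] = [("TranscriptStart", "")]
    parts: List[str] = []
    for delta in deltas:
        parts.append(delta)
        # Only the streaming branch forwards partial text as TranscriptChunk.
        if streaming and delta:
            events.append(("TranscriptChunk", delta))
    # Both branches send the complete text once, then close the transcript.
    events.append(("Transcript", "".join(parts)))
    events.append(("TranscriptStop", ""))
    return events


if __name__ == "__main__":
    for event in relay_transcript(["Turn on", " the kitchen", " lights"], streaming=True):
        print(event)
```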
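In the non-streaming TTS branch, the proxy forwards synthesized audio as it arrives, framed by AudioStart and AudioStop. The sketch below assumes the API returns raw 24 kHz, 16-bit PCM (the format OpenAI describes for `response_format="pcm"`), assumed mono; other OpenAI-compatible servers may differ. Events are hypothetical `(name, payload)` tuples, and `relay_speech` is an illustrative name. Because an HTTP read can end in the middle of a sample, this sketch forwards only whole frames and carries any remainder into the next read; that alignment is a choice made here, not something the diagram specifies.

```python
from typing import Dict, Iterable, Iterator, Tuple, Union

Payload = Dict[str, Union[int, bytes]]
Event = Tuple[str, Payload]  # hypothetical (event name, payload) pair


def relay_speech(
    stream: Iterable[bytes],
    rate: int = 24000,   # assumed sample rate of the API's PCM output
    width: int = 2,      # bytes per sample (16-bit)
    channels: int = 1,   # assumed mono
) -> Iterator[Event]:
    """Wrap raw PCM reads from the speech API in AudioStart/AudioChunk/AudioStop."""
    audio_format = {"rate": rate, "width": width, "channels": channels}
    frame_size = width * channels
    yield ("AudioStart", dict(audio_format))
    pending = b""
    for piece in stream:
        pending += piece
        # Forward only complete frames; keep a partial frame for the next read.
        usable = len(pending) - len(pending) % frame_size
        if usable:
            yield ("AudioChunk", {**audio_format, "audio": pending[:usable]})
            pending = pending[usable:]
    # A trailing partial frame, if any, is dropped.
    yield ("AudioStop", {})
```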
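The streaming TTS branch reduces to a small state machine: SynthesizeStart resets a text buffer, each SynthesizeChunk appends to it, and SynthesizeStop is answered with SynthesizeStopped without a call to `/v1/audio/speech`, as the diagram's notes describe. A sketch of that bookkeeping, where the `StreamingSynthesisSession` class and its method names are hypothetical, event names are plain strings rather than Wyoming event objects, and `"alloy"` is only an example voice name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StreamingSynthesisSession:
    """Hypothetical state holder for the SynthesizeStart/Chunk/Stop flow."""
    voice: Optional[str] = None
    buffer: List[str] = field(default_factory=list)
    active: bool = False

    def start(self, voice: Optional[str]) -> None:
        # SynthesizeStart: remember the voice config and reset the buffer.
        self.voice = voice
        self.buffer = []
        self.active = True

    def chunk(self, text: str) -> None:
        # SynthesizeChunk: append text; reject chunks outside a session.
        if not self.active:
            raise RuntimeError("SynthesizeChunk received before SynthesizeStart")
        self.buffer.append(text)

    def stop(self) -> str:
        # SynthesizeStop: per the diagram, nothing is sent to the speech API;
        # the proxy only acknowledges the end of the stream.
        self.active = False
        return "SynthesizeStopped"

    @property
    def text(self) -> str:
        return "".join(self.buffer)


if __name__ == "__main__":
    session = StreamingSynthesisSession()
    session.start("alloy")
    session.chunk("Hello, ")
    session.chunk("world.")
    print(session.text, session.stop())
```

Because the diagram notes that this flow is not advertised in capabilities, clients are presumably expected to use the single Synthesize event; a handler like this only ensures that the streaming events still receive a well-formed reply.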