Update README: Highlight incremental TTS streaming with sentence boundary chunking
- Add mention of TTS streaming compatibility in overview section
- Add new objective about streaming compatibility bridging Wyoming and OpenAI protocols
- Update sequence diagram to show incremental synthesis with pysbd sentence detection
- Emphasize responsive audio delivery through intelligent text chunking
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
README.md: 17 additions & 6 deletions
@@ -10,7 +10,7 @@ Note: This project is not affiliated with OpenAI or the Wyoming project.
## Overview
- This project introduces a [Wyoming](https://github.com/OHF-Voice/wyoming) server that connects to OpenAI-compatible endpoints of your choice. Like a proxy, it enables Wyoming clients such as the [Home Assistant Wyoming Integration](https://www.home-assistant.io/integrations/wyoming/) to use the transcription (Automatic Speech Recognition - ASR) and text-to-speech synthesis (TTS) capabilities of various OpenAI-compatible projects. By acting as a bridge between the Wyoming protocol and OpenAI, you can consolidate the resource usage on your server and extend the capabilities of Home Assistant.
+ This project introduces a [Wyoming](https://github.com/OHF-Voice/wyoming) server that connects to OpenAI-compatible endpoints of your choice. Like a proxy, it enables Wyoming clients such as the [Home Assistant Wyoming Integration](https://www.home-assistant.io/integrations/wyoming/) to use the transcription (Automatic Speech Recognition - ASR) and text-to-speech synthesis (TTS) capabilities of various OpenAI-compatible projects. By acting as a bridge between the Wyoming protocol and OpenAI, you can consolidate the resource usage on your server and extend the capabilities of Home Assistant. The proxy now provides incremental TTS streaming compatibility by intelligently chunking text at sentence boundaries for responsive audio delivery.
## Featured Models
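The Overview change above leans on sentence-boundary chunking for "responsive audio delivery". As a rough illustration of that idea only (not this project's actual code), the sketch below buffers incoming text and uses pysbd to emit only completed sentences, holding back any trailing partial sentence for the next chunk. The `SentenceChunker` class and its method names are hypothetical.

```python
# Hypothetical sketch of sentence-boundary chunking with pysbd;
# not the proxy's actual implementation.
import pysbd


class SentenceChunker:
    """Buffers streamed text and yields only complete sentences."""

    def __init__(self, language: str = "en") -> None:
        self._segmenter = pysbd.Segmenter(language=language, clean=False)
        self._buffer = ""

    def push(self, text_chunk: str) -> list[str]:
        """Add a text chunk; return any sentences that are now complete."""
        self._buffer += text_chunk
        sentences = self._segmenter.segment(self._buffer)
        if len(sentences) <= 1:
            return []  # nothing complete yet; keep buffering
        # Treat the last segment as possibly unfinished and keep it buffered.
        self._buffer = sentences[-1]
        return [s.strip() for s in sentences[:-1]]

    def flush(self) -> list[str]:
        """Return whatever text remains when the stream ends."""
        remaining = self._buffer.strip()
        self._buffer = ""
        return [remaining] if remaining else []


if __name__ == "__main__":
    chunker = SentenceChunker()
    print(chunker.push("Hello there. This sentence is still be"))  # typically ['Hello there.']
    print(chunker.push("ing typed. And another one."))
    print(chunker.flush())  # any trailing partial sentence
```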
@@ -28,7 +28,8 @@ This project features a variety of examples for using cutting-edge models in bot
2. **Service Consolidation**: Allow users of various programs to run inference on a single server without needing separate instances for each service.
Example: Sharing TTS/STT services between [Open WebUI](#open-webui) and [Home Assistant](#usage-in-home-assistant).
3. **Asynchronous Processing**: Enable efficient handling of multiple requests by supporting asynchronous processing of audio streams.
- 4. **Simple Setup with Docker**: Provide a straightforward deployment process using [Docker and Docker Compose](#docker-recommended) for OpenAI and various popular open source projects.
+ 4. **Streaming Compatibility**: Bridge Wyoming's streaming TTS protocol with OpenAI-compatible APIs through intelligent sentence boundary chunking, enabling responsive incremental audio delivery even when the underlying API doesn't support streaming text input.
+ 5. **Simple Setup with Docker**: Provide a straightforward deployment process using [Docker and Docker Compose](#docker-recommended) for OpenAI and various popular open source projects.
## Terminology
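The new objective 4 above rests on issuing one speech request per completed sentence and relaying the audio as soon as it arrives. A minimal sketch of that pattern against an OpenAI-compatible `/v1/audio/speech` endpoint is shown below, using httpx; the base URL, model, and voice values are placeholders, and this is not the proxy's actual code.

```python
# Hedged sketch: synthesize one sentence at a time against an OpenAI-compatible
# /v1/audio/speech endpoint and relay audio bytes as they stream back.
# BASE_URL, MODEL, and VOICE are placeholders, not values from this project.
from collections.abc import Iterator

import httpx

BASE_URL = "http://localhost:8000/v1"  # any OpenAI-compatible server
MODEL = "tts-1"                        # placeholder model name
VOICE = "alloy"                        # placeholder voice name


def synthesize_sentence(client: httpx.Client, sentence: str) -> Iterator[bytes]:
    """POST one sentence and yield raw audio bytes as they arrive."""
    payload = {"model": MODEL, "voice": VOICE, "input": sentence}
    with client.stream("POST", f"{BASE_URL}/audio/speech", json=payload) as response:
        response.raise_for_status()
        for audio_chunk in response.iter_bytes():
            yield audio_chunk  # in the proxy, each chunk would be forwarded immediately


if __name__ == "__main__":
    sentences = ["The kitchen light is now on.", "Anything else I can help with?"]
    with httpx.Client(timeout=60.0) as client:
        for sentence in sentences:
            for chunk in synthesize_sentence(client, sentence):
                pass  # here the bytes would become Wyoming AudioChunk events
```

Synthesizing sentence by sentence is what lets the first audio reach the client before the full response text has even been generated, which is the responsiveness the objective describes.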
@@ -354,15 +355,25 @@ sequenceDiagram
WY->>HA: AudioStop event
else Streaming TTS (SynthesizeStart/Chunk/Stop)
HA->>WY: SynthesizeStart event (voice config)
- Note over WY: Initialize synthesis buffer
+ Note over WY: Initialize incremental synthesis<br/>with sentence boundary detection
+ WY->>HA: AudioStart event
loop Sending text chunks
HA->>WY: SynthesizeChunk events
- Note over WY: Append to synthesis buffer
+ Note over WY: Accumulate text and detect<br/>complete sentences using pysbd
+ alt Complete sentences detected
+ loop For each complete sentence
+ WY->>OAPI: Speech synthesis request
+ loop While receiving audio data
+ OAPI-->>WY: Audio stream chunks
+ WY-->>HA: AudioChunk events (incremental)
+ end
+ end
+ end
end
HA->>WY: SynthesizeStop event
- Note over WY: No-op — OpenAI `/v1/audio/speech`<br/>does not support streaming text input
+ Note over WY: Process any remaining text<br/>and finalize synthesis
+ WY->>HA: AudioStop event
WY->>HA: SynthesizeStopped event
- Note over WY: Streaming flow is handled<br/>but not advertised in capabilities
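The updated diagram describes the streaming branch as: initialize on SynthesizeStart, accumulate text and synthesize each completed sentence on SynthesizeChunk, then flush the remainder on SynthesizeStop before emitting AudioStop and SynthesizeStopped. A compact handler skeleton for that flow is sketched below; the event names mirror the diagram, but the classes here are simple stand-ins rather than the wyoming library's actual types, and the I/O methods are placeholders.

```python
# Hypothetical handler skeleton mirroring the streaming branch of the diagram.
# The SynthesizeChunk class below is a stand-in, not the wyoming library's real type.
from dataclasses import dataclass

import pysbd


@dataclass
class SynthesizeChunk:
    text: str


class StreamingSynthesisHandler:
    def __init__(self) -> None:
        self._segmenter = pysbd.Segmenter(language="en", clean=False)
        self._buffer = ""

    def handle_synthesize_start(self) -> None:
        # "Initialize incremental synthesis with sentence boundary detection"
        self._buffer = ""
        self._emit_audio_start()

    def handle_synthesize_chunk(self, event: SynthesizeChunk) -> None:
        # Accumulate text and synthesize each newly completed sentence.
        self._buffer += event.text
        sentences = self._segmenter.segment(self._buffer)
        if len(sentences) > 1:
            self._buffer = sentences[-1]
            for sentence in sentences[:-1]:
                self._synthesize_and_stream(sentence.strip())

    def handle_synthesize_stop(self) -> None:
        # Process any remaining text, then finalize.
        if self._buffer.strip():
            self._synthesize_and_stream(self._buffer.strip())
        self._buffer = ""
        self._emit_audio_stop()
        self._emit_synthesize_stopped()

    # --- placeholders for the actual synthesis and Wyoming event I/O ---
    def _synthesize_and_stream(self, sentence: str) -> None:
        print(f"[speech request + AudioChunk events] {sentence}")

    def _emit_audio_start(self) -> None:
        print("[AudioStart]")

    def _emit_audio_stop(self) -> None:
        print("[AudioStop]")

    def _emit_synthesize_stopped(self) -> None:
        print("[SynthesizeStopped]")


if __name__ == "__main__":
    handler = StreamingSynthesisHandler()
    handler.handle_synthesize_start()
    handler.handle_synthesize_chunk(SynthesizeChunk("Turn"))
    handler.handle_synthesize_chunk(SynthesizeChunk("ing on the light. Done."))
    handler.handle_synthesize_stop()
```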