
Commit 49ded69

roryeckel and claude committed
Update README: Highlight incremental TTS streaming with sentence boundary chunking
- Add mention of TTS streaming compatibility in overview section
- Add new objective about streaming compatibility bridging Wyoming and OpenAI protocols
- Update sequence diagram to show incremental synthesis with pysbd sentence detection
- Emphasize responsive audio delivery through intelligent text chunking

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 1743cd9 commit 49ded69
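
The sentence boundary chunking named in the commit message can be illustrated with a minimal sketch. It is not the project's implementation: the commit says the proxy uses pysbd, whereas this sketch swaps in a naive regular expression so that it runs with the standard library alone. The names `SentenceChunker`, `feed`, `flush`, and `simulate_stream` are hypothetical, and the event strings only mirror the order shown in the updated sequence diagram.

```python
import re
from typing import List

# Assumption: naive stand-in for pysbd. A sentence is treated as complete
# once '.', '!' or '?' is followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class SentenceChunker:
    """Accumulates streamed text and releases only complete sentences."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, text: str) -> List[str]:
        """Add a text chunk and return any sentences that are now complete."""
        self._buffer += text
        parts = _SENTENCE_END.split(self._buffer)
        # The last part may still be growing, so it stays in the buffer.
        self._buffer = parts.pop()
        return [p.strip() for p in parts if p.strip()]

    def flush(self) -> List[str]:
        """Return the remaining text, as would happen on SynthesizeStop."""
        rest = self._buffer.strip()
        self._buffer = ""
        return [rest] if rest else []


def simulate_stream(chunks: List[str]) -> List[str]:
    """Return the ordered events a proxy might emit for the given chunks."""
    chunker = SentenceChunker()
    events = ["AudioStart"]  # sent right after SynthesizeStart
    for chunk in chunks:  # one iteration per SynthesizeChunk
        for sentence in chunker.feed(chunk):
            events.append(f"synthesize: {sentence}")
    for sentence in chunker.flush():  # leftover text on SynthesizeStop
        events.append(f"synthesize: {sentence}")
    events += ["AudioStop", "SynthesizeStopped"]
    return events


if __name__ == "__main__":
    for event in simulate_stream(["Hello the", "re. How are", " you? Fine"]):
        print(event)
```

Each `synthesize:` entry stands for one speech synthesis request whose audio would be relayed as AudioChunk events. A regex like this mis-splits abbreviations such as "Dr. Smith", which is the kind of case a dedicated segmenter like pysbd is meant to handle.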

1 file changed: +17 -6 lines changed

README.md

Lines changed: 17 additions & 6 deletions
@@ -10,7 +10,7 @@ Note: This project is not affiliated with OpenAI or the Wyoming project.
 
 ## Overview
 
-This project introduces a [Wyoming](https://github.com/OHF-Voice/wyoming) server that connects to OpenAI-compatible endpoints of your choice. Like a proxy, it enables Wyoming clients such as the [Home Assistant Wyoming Integration](https://www.home-assistant.io/integrations/wyoming/) to use the transcription (Automatic Speech Recognition - ASR) and text-to-speech synthesis (TTS) capabilities of various OpenAI-compatible projects. By acting as a bridge between the Wyoming protocol and OpenAI, you can consolidate the resource usage on your server and extend the capabilities of Home Assistant.
+This project introduces a [Wyoming](https://github.com/OHF-Voice/wyoming) server that connects to OpenAI-compatible endpoints of your choice. Like a proxy, it enables Wyoming clients such as the [Home Assistant Wyoming Integration](https://www.home-assistant.io/integrations/wyoming/) to use the transcription (Automatic Speech Recognition - ASR) and text-to-speech synthesis (TTS) capabilities of various OpenAI-compatible projects. By acting as a bridge between the Wyoming protocol and OpenAI, you can consolidate the resource usage on your server and extend the capabilities of Home Assistant. The proxy now provides incremental TTS streaming compatibility by intelligently chunking text at sentence boundaries for responsive audio delivery.
 
 ## Featured Models
 
@@ -28,7 +28,8 @@ This project features a variety of examples for using cutting-edge models in bot
 2. **Service Consolidation**: Allow users of various programs to run inference on a single server without needing separate instances for each service.
 Example: Sharing TTS/STT services between [Open WebUI](#open-webui) and [Home Assistant](#usage-in-home-assistant).
 3. **Asynchronous Processing**: Enable efficient handling of multiple requests by supporting asynchronous processing of audio streams.
-4. **Simple Setup with Docker**: Provide a straightforward deployment process using [Docker and Docker Compose](#docker-recommended) for OpenAI and various popular open source projects.
+4. **Streaming Compatibility**: Bridge Wyoming's streaming TTS protocol with OpenAI-compatible APIs through intelligent sentence boundary chunking, enabling responsive incremental audio delivery even when the underlying API doesn't support streaming text input.
+5. **Simple Setup with Docker**: Provide a straightforward deployment process using [Docker and Docker Compose](#docker-recommended) for OpenAI and various popular open source projects.
 
 ## Terminology
 
@@ -354,15 +355,25 @@ sequenceDiagram
 WY->>HA: AudioStop event
 else Streaming TTS (SynthesizeStart/Chunk/Stop)
 HA->>WY: SynthesizeStart event (voice config)
-Note over WY: Initialize synthesis buffer
+Note over WY: Initialize incremental synthesis<br/>with sentence boundary detection
+WY->>HA: AudioStart event
 loop Sending text chunks
 HA->>WY: SynthesizeChunk events
-Note over WY: Append to synthesis buffer
+Note over WY: Accumulate text and detect<br/>complete sentences using pysbd
+alt Complete sentences detected
+loop For each complete sentence
+WY->>OAPI: Speech synthesis request
+loop While receiving audio data
+OAPI-->>WY: Audio stream chunks
+WY-->>HA: AudioChunk events (incremental)
+end
+end
+end
 end
 HA->>WY: SynthesizeStop event
-Note over WY: No-op — OpenAI `/v1/audio/speech`<br/>does not support streaming text input
+Note over WY: Process any remaining text<br/>and finalize synthesis
+WY->>HA: AudioStop event
 WY->>HA: SynthesizeStopped event
-Note over WY: Streaming flow is handled<br/>but not advertised in capabilities
 end
 ```
 