README.md: 22 additions, 7 deletions
@@ -10,7 +10,7 @@ Note: This project is not affiliated with OpenAI or the Wyoming project.
 
 ## Overview
 
-This project introduces a [Wyoming](https://github.com/OHF-Voice/wyoming) server that connects to OpenAI-compatible endpoints of your choice. Like a proxy, it enables Wyoming clients such as the [Home Assistant Wyoming Integration](https://www.home-assistant.io/integrations/wyoming/) to use the transcription (Automatic Speech Recognition - ASR) and text-to-speech synthesis (TTS) capabilities of various OpenAI-compatible projects. By acting as a bridge between the Wyoming protocol and OpenAI, you can consolidate the resource usage on your server and extend the capabilities of Home Assistant.
+This project introduces a [Wyoming](https://github.com/OHF-Voice/wyoming) server that connects to OpenAI-compatible endpoints of your choice. Like a proxy, it enables Wyoming clients such as the [Home Assistant Wyoming Integration](https://www.home-assistant.io/integrations/wyoming/) to use the transcription (Automatic Speech Recognition, ASR) and text-to-speech synthesis (TTS) capabilities of various OpenAI-compatible projects. Because the proxy bridges the Wyoming protocol and OpenAI-compatible APIs, you can consolidate resource usage on your server and extend the capabilities of Home Assistant. The proxy also supports incremental TTS streaming: it splits incoming text at sentence boundaries and synthesizes each complete sentence as soon as it arrives, so audio playback can start before the full text has been received.
 
 ## Featured Models
 
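The sentence-boundary chunking described in the overview can be illustrated with an accumulator that buffers streamed text and releases only complete sentences. This is a minimal sketch, not the proxy's implementation: the class name `SentenceAccumulator` is hypothetical, and a simple regular expression stands in for pysbd (the library the proxy's streaming options refer to), so it will mis-split abbreviations that pysbd handles correctly.

```python
import re
from typing import List

# Simplified stand-in for pysbd: a sentence ends at ".", "!" or "?" followed
# by whitespace. Unlike pysbd, this incorrectly breaks after abbreviations
# such as "e.g." when they are followed by a space.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


class SentenceAccumulator:
    """Buffer streamed text and release only complete sentences (hypothetical helper)."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, text: str) -> List[str]:
        """Append a text chunk and return the sentences that are now complete."""
        self._buffer += text
        parts = _BOUNDARY.split(self._buffer)
        # The last part may be an unfinished sentence, so keep it buffered.
        self._buffer = parts.pop()
        return [p.strip() for p in parts if p.strip()]

    def flush(self) -> List[str]:
        """Return any remaining text once the input stream has ended."""
        rest, self._buffer = self._buffer.strip(), ""
        return [rest] if rest else []


acc = SentenceAccumulator()
print(acc.feed("Hello there. How are"))  # ['Hello there.']
print(acc.feed(" you? I am"))            # ['How are you?']
print(acc.feed(" fine."))                # []
print(acc.flush())                       # ['I am fine.']
```

A sentence that ends exactly at the end of a chunk stays buffered until more text or the end of the stream arrives, because the splitter cannot yet tell whether more text follows (for example, "3." followed by "14").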
@@ -28,7 +28,8 @@ This project features a variety of examples for using cutting-edge models in bot
 2. **Service Consolidation**: Allow users of various programs to run inference on a single server without needing separate instances for each service.
 Example: Sharing TTS/STT services between [Open WebUI](#open-webui) and [Home Assistant](#usage-in-home-assistant).
 3. **Asynchronous Processing**: Enable efficient handling of multiple requests by supporting asynchronous processing of audio streams.
-4. **Simple Setup with Docker**: Provide a straightforward deployment process using [Docker and Docker Compose](#docker-recommended) for OpenAI and various popular open source projects.
+4. **Streaming Compatibility**: Bridge Wyoming's streaming TTS protocol and OpenAI-compatible APIs by splitting text at sentence boundaries, so audio is delivered incrementally even when the underlying API does not accept streaming text input.
+5. **Simple Setup with Docker**: Provide a straightforward deployment process using [Docker and Docker Compose](#docker-recommended) for OpenAI and various popular open source projects.
 | `--tts-speed` | `TTS_SPEED` | None (autodetected) | Speed of the TTS output (ranges from 0.25 to 4.0). |
 | `--tts-instructions` | `TTS_INSTRUCTIONS` | None | Optional instructions for TTS requests (controls the voice). |
+| `--tts-streaming-models` | `TTS_STREAMING_MODELS` | None | Space-separated list of TTS models for which incremental streaming (sentence chunking with pysbd) is enabled (e.g. `tts-1`). |
+| `--tts-streaming-min-words` | `TTS_STREAMING_MIN_WORDS` | None | Minimum number of words per text chunk for incremental TTS streaming (optional). |
+| `--tts-streaming-max-chars` | `TTS_STREAMING_MAX_CHARS` | None | Maximum number of characters per text chunk for incremental TTS streaming (optional). |
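The three streaming options above could be applied roughly as follows. This sketch assumes semantics the table does not spell out: short sentences are merged until a chunk reaches the minimum word count, and chunks longer than the character limit are then split at word boundaries. The helper names `streaming_enabled` and `shape_chunks` are hypothetical; only the `TTS_STREAMING_MODELS` variable name and its space-separated format come from the table.

```python
import os
from typing import List, Optional


def streaming_enabled(model: str) -> bool:
    """Return True if `model` appears in the space-separated TTS_STREAMING_MODELS."""
    return model in os.environ.get("TTS_STREAMING_MODELS", "").split()


def shape_chunks(
    sentences: List[str],
    min_words: Optional[int] = None,
    max_chars: Optional[int] = None,
) -> List[str]:
    """Merge short sentences up to min_words, then split chunks over max_chars.

    Hypothetical semantics for illustration only.
    """
    merged: List[str] = []
    pending = ""
    for sentence in sentences:
        pending = f"{pending} {sentence}".strip()
        if min_words is None or len(pending.split()) >= min_words:
            merged.append(pending)
            pending = ""
    if pending:
        merged.append(pending)  # leftover text is sent even if below min_words

    if max_chars is None:
        return merged

    limited: List[str] = []
    for chunk in merged:
        current = ""
        for word in chunk.split():
            candidate = f"{current} {word}".strip()
            if current and len(candidate) > max_chars:
                limited.append(current)
                current = word  # a single word longer than max_chars is kept whole
            else:
                current = candidate
        if current:
            limited.append(current)
    return limited


print(shape_chunks(["Hi.", "How are you today?", "Fine."], min_words=3))
# ['Hi. How are you today?', 'Fine.']
print(shape_chunks(["one two three four five."], max_chars=10))
# ['one two', 'three four', 'five.']
```

Merging first and splitting second keeps very short sentences from producing many tiny synthesis requests, while still bounding the size of any single request.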
@@ -275,7 +280,7 @@ We follow specific tagging conventions for our Docker images. These tags help in
 
 - **`main`**: This tag points to the latest commit on the main code branch. It is suitable for users who want to experiment with the most up-to-date features and changes, but may include unstable or experimental code.
 
-- **`major.minor.patch version`**: Specific version tags (e.g., `0.3.6`) correspond to specific stable releases of the Wyoming OpenAI proxy server. These tags are ideal for users who need a consistent, reproducible environment and want to avoid breaking changes introduced in newer versions.
+- **`major.minor.patch version`**: Specific version tags (e.g., `0.3.7`) correspond to specific stable releases of the Wyoming OpenAI proxy server. These tags are ideal for users who need a consistent, reproducible environment and want to avoid breaking changes introduced in newer versions.
 
 - **`major.minor version`**: Tags that follow the `major.minor` format (e.g., `0.3`) represent a range of patch-level updates within the same minor version series. These tags are useful for users who want to stay updated with bug fixes and minor improvements without upgrading to a new major or minor version.
 
@@ -376,15 +381,25 @@ sequenceDiagram
 WY->>HA: AudioStop event
 else Streaming TTS (SynthesizeStart/Chunk/Stop)
 HA->>WY: SynthesizeStart event (voice config)
-Note over WY: Initialize synthesis buffer
+Note over WY: Initialize incremental synthesis<br/>with sentence boundary detection
+WY->>HA: AudioStart event
 loop Sending text chunks
 HA->>WY: SynthesizeChunk events
-Note over WY: Append to synthesis buffer
+Note over WY: Accumulate text and detect<br/>complete sentences using pysbd
+alt Complete sentences detected
+loop For each complete sentence
+WY->>OAPI: Speech synthesis request
+loop While receiving audio data
+OAPI-->>WY: Audio stream chunks
+WY-->>HA: AudioChunk events (incremental)
+end
+end
+end
 end
 HA->>WY: SynthesizeStop event
-Note over WY: No-op — OpenAI `/v1/audio/speech`<br/>does not support streaming text input
+Note over WY: Process any remaining text<br/>and finalize synthesis
+WY->>HA: AudioStop event
 WY->>HA: SynthesizeStopped event
-Note over WY: Streaming flow is handled<br/>but not advertised in capabilities
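The streaming branch of the sequence diagram can be summarized as a small event loop. In this sketch, events are plain `(name, payload)` tuples whose names mirror the diagram labels rather than Wyoming's actual event classes, `handle_streaming_tts` is a hypothetical function, and `synthesize` is a hypothetical callable standing in for a request to an OpenAI-compatible speech endpoint that yields audio chunks. A regular expression again stands in for pysbd.

```python
import re
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Simplified stand-in for pysbd: split after ".", "!" or "?" followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

InEvent = Tuple[str, Optional[str]]
OutEvent = Tuple[str, Optional[bytes]]


def handle_streaming_tts(
    events: Iterable[InEvent],
    synthesize: Callable[[str], Iterable[bytes]],
) -> Iterator[OutEvent]:
    """Turn SynthesizeStart/Chunk/Stop events into incremental audio events."""
    buffer = ""
    for name, payload in events:
        if name == "SynthesizeStart":
            buffer = ""  # initialize incremental synthesis
            yield ("AudioStart", None)
        elif name == "SynthesizeChunk":
            buffer += payload or ""
            # Everything before the last boundary is a complete sentence;
            # the final piece may still be growing, so keep it buffered.
            *complete, buffer = _BOUNDARY.split(buffer)
            for sentence in complete:
                if sentence.strip():
                    for audio in synthesize(sentence.strip()):
                        yield ("AudioChunk", audio)
        elif name == "SynthesizeStop":
            # Synthesize any remaining text, then finalize.
            if buffer.strip():
                for audio in synthesize(buffer.strip()):
                    yield ("AudioChunk", audio)
            buffer = ""
            yield ("AudioStop", None)
            yield ("SynthesizeStopped", None)


if __name__ == "__main__":
    events = [
        ("SynthesizeStart", None),
        ("SynthesizeChunk", "Hello there. How"),
        ("SynthesizeChunk", " are you?"),
        ("SynthesizeStop", None),
    ]
    # Fake synthesizer: the "audio" is just the UTF-8 encoded sentence.
    for event in handle_streaming_tts(events, lambda text: [text.encode()]):
        print(event)
```

With the fake synthesizer, the first sentence is emitted as an `AudioChunk` as soon as the second text chunk begins, while "How are you?" is only synthesized on `SynthesizeStop`, matching the "Process any remaining text" step in the diagram.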