Commit 64aee2a

Merge remote-tracking branch 'roryeckel/main' into copilot/fix-28
2 parents 8b4dce9 + 1b8c150 commit 64aee2a

7 files changed: +624 -218 lines changed

.github/copilot-instructions.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+# GitHub Copilot Instructions
+
+## Project Context
+
+Wyoming OpenAI is a proxy middleware that bridges the Wyoming protocol with OpenAI-compatible endpoints for ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) services. It enables Wyoming clients like Home Assistant to use various OpenAI-compatible STT/TTS services.
+
+## Code Style and Conventions
+
+- Use async/await patterns for all I/O operations
+- Follow Python type hints for function signatures
+- Maintain consistency with existing error handling patterns
+- Use logging for debugging and error messages
+- Keep functions focused and modular
+
+## Architecture Overview
+
+### Core Components
+
+- **`handler.py`**: Contains `OpenAIEventHandler` - the main Wyoming protocol event handler that processes ASR and TTS requests
+- **`compatibility.py`**: Provides `CustomAsyncOpenAI` class with backend detection and OpenAI API compatibility layer
+- **`__main__.py`**: Entry point with argument parsing and server initialization
+- **`utilities.py`**: Helper functions for audio processing and data handling
+- **`const.py`**: Version constants and configuration
+
+### Key Patterns
+
+1. **Async Event Handling**: Uses Wyoming's `AsyncEventHandler` to process incoming protocol events
+2. **Backend Abstraction**: `CustomAsyncOpenAI` wraps different backends (OpenAI, Speaches, LocalAI, etc.) with a unified interface
+3. **Stream Processing**: Handles both streaming and non-streaming transcription modes
+4. **Audio Buffer Management**: Collects audio chunks into complete files for processing
+
+### Wyoming Protocol Events
+
+The handler processes these Wyoming events:
+- `AudioStart/AudioChunk/AudioStop` → STT transcription
+- `Transcribe` → Initiate transcription request
+- `Synthesize` → TTS audio generation
+
+### Supported Backends
+
+The `OpenAIBackend` enum defines supported backends:
+- `OPENAI`: Official OpenAI API
+- `SPEACHES`: Local Speaches service
+- `LOCALAI`: LocalAI service
+- `KOKORO_FASTAPI`: Kokoro TTS service
+
+## Testing Guidelines
+
+When writing tests:
+- Use pytest fixtures for common setup
+- Mock external API calls
+- Test both success and error scenarios
+- Include integration tests for end-to-end flows
+- Aim for high code coverage
+
+Test files are organized by module:
+- `test_handler.py`: Event handler logic
+- `test_compatibility.py`: Backend compatibility
+- `test_utilities.py`: Helper functions
+- `test_integration.py`: End-to-end scenarios
+
+## Common Development Tasks
+
+### Running Tests
+```bash
+pytest # Run all tests
+pytest --cov=wyoming_openai # With coverage
+pytest tests/test_handler.py # Specific test file
+```
+
+### Code Quality
+```bash
+ruff check . # Run linting
+ruff check . --fix # Auto-fix issues
+```
+
+### Local Development
+```bash
+pip install -e ".[dev]" # Install dev dependencies
+python -m wyoming_openai --uri tcp://0.0.0.0:10300 --stt-models whisper-1 --tts-models tts-1
+```
+
+### Docker Development
+```bash
+docker compose -f docker-compose.yml -f docker-compose.dev.yml up -d --build
+```
+
+## Configuration
+
+The server accepts both command-line arguments and environment variables. When suggesting configuration changes, consider:
+- STT/TTS API keys and URLs
+- Model lists for STT and TTS
+- Voice configurations
+- Backend-specific settings (temperature, speed, etc.)
+
+## When Making Changes
+
+- Ensure backward compatibility with existing Wyoming clients
+- Update tests to reflect new functionality
+- Add appropriate logging for debugging
+- Document new configuration options
+- Consider impact on all supported backends
+- Validate audio format conversions maintain quality
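
Item 4 under Key Patterns in this file describes collecting audio chunks into complete files. A minimal sketch of that idea, using only the Python standard library, might look like the following. The `PcmBuffer` class, its method names, and the 16 kHz / 16-bit / mono defaults are illustrative assumptions, not code taken from `handler.py`.

```python
import io
import wave


class PcmBuffer:
    """Hypothetical helper: accumulates raw PCM chunks and emits one WAV file."""

    def __init__(self, rate: int = 16000, width: int = 2, channels: int = 1) -> None:
        # Assumed defaults: 16 kHz, 16-bit samples, mono.
        self.rate = rate
        self.width = width
        self.channels = channels
        self._chunks: list[bytes] = []

    def add_chunk(self, audio: bytes) -> None:
        # Called once per incoming audio chunk.
        self._chunks.append(audio)

    def to_wav_bytes(self) -> bytes:
        # Wrap the collected PCM data in a WAV container, entirely in memory.
        out = io.BytesIO()
        with wave.open(out, "wb") as wav_file:
            wav_file.setnchannels(self.channels)
            wav_file.setsampwidth(self.width)
            wav_file.setframerate(self.rate)
            wav_file.writeframes(b"".join(self._chunks))
        return out.getvalue()


buffer = PcmBuffer()
buffer.add_chunk(b"\x00\x00" * 160)  # 160 frames of silence
buffer.add_chunk(b"\x01\x00" * 160)  # 160 more frames
wav_bytes = buffer.to_wav_bytes()
```

Building the file in a `BytesIO` object avoids touching disk, which suits a proxy that forwards the finished audio straight to a transcription endpoint.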

README.md

Lines changed: 5 additions & 5 deletions
@@ -10,7 +10,7 @@ Note: This project is not affiliated with OpenAI or the Wyoming project.
 
 ## Overview
 
-This project introduces a [Wyoming](https://github.com/OHF-Voice/wyoming) server that connects to OpenAI-compatible endpoints of your choice. Like a proxy, it enables Wyoming clients such as the [Home Assistant Wyoming Integration](https://www.home-assistant.io/integrations/wyoming/) to use the transcription (Automatic Speech Recognition - ASR) and text-to-speech synthesis (TTS) capabilities of various OpenAI-compatible projects. By acting as a bridge between the Wyoming protocol and OpenAI, you can consolidate the resource usage on your server and extend the capabilities of Home Assistant. The proxy now provides incremental TTS streaming compatibility by intelligently chunking text at sentence boundaries for responsive audio delivery.
+This project introduces a [Wyoming](https://github.com/OHF-Voice/wyoming) server that connects to OpenAI-compatible endpoints of your choice. Like a proxy, it enables Wyoming clients such as the [Home Assistant Wyoming Integration](https://www.home-assistant.io/integrations/wyoming/) to use the transcription (Automatic Speech Recognition - ASR) and text-to-speech synthesis (TTS) capabilities of various OpenAI-compatible projects. By acting as a bridge between the Wyoming protocol and OpenAI, you can consolidate the resource usage on your server and extend the capabilities of Home Assistant. The proxy now provides incremental TTS streaming compatibility by intelligently chunking text at sentence boundaries with [pySBD](https://github.com/nipunsadvilkar/pySBD) for responsive audio delivery. When streaming is enabled, Wyoming OpenAI prefetches up to three OpenAI synthesis requests in parallel while playing the audio sequentially, keeping latency low without breaking event order.
 
 ## Featured Models
 
@@ -28,7 +28,7 @@ This project features a variety of examples for using cutting-edge models in bot
 2. **Service Consolidation**: Allow users of various programs to run inference on a single server without needing separate instances for each service.
 Example: Sharing TTS/STT services between [Open WebUI](#open-webui) and [Home Assistant](#usage-in-home-assistant).
 3. **Asynchronous Processing**: Enable efficient handling of multiple requests by supporting asynchronous processing of audio streams.
-4. **Streaming Compatibility**: Bridge Wyoming's streaming TTS protocol with OpenAI-compatible APIs through intelligent sentence boundary chunking, enabling responsive incremental audio delivery even when the underlying API doesn't support streaming text input.
+4. **Streaming Compatibility**: Bridge Wyoming's streaming TTS protocol with OpenAI-compatible APIs through intelligent sentence boundary chunking powered by [pySBD](https://github.com/nipunsadvilkar/pySBD), enabling responsive incremental audio delivery even when the underlying API doesn't support streaming text input. Concurrent pipelining (default limit of three in-flight requests) keeps playback smooth while ensuring events remain ordered.
 5. **Simple Setup with Docker**: Provide a straightforward deployment process using [Docker and Docker Compose](#docker-recommended) for OpenAI and various popular open source projects.
 
 ## Terminology
@@ -144,7 +144,7 @@ In addition to using command-line arguments, you can configure the Wyoming OpenA
 | `--tts-backend` | `TTS_BACKEND` | None (autodetected) | Enable unofficial API feature sets. |
 | `--tts-speed` | `TTS_SPEED` | None (autodetected) | Speed of the TTS output (ranges from 0.25 to 4.0). |
 | `--tts-instructions` | `TTS_INSTRUCTIONS` | None | Optional instructions for TTS requests (Control the voice). |
-| `--tts-streaming-models` | `TTS_STREAMING_MODELS` | None | Space-separated list of TTS models to enable incremental streaming via pysbd text chunking (e.g. `tts-1`). |
+| `--tts-streaming-models` | `TTS_STREAMING_MODELS` | None | Space-separated list of TTS models to enable incremental streaming via [pySBD](https://github.com/nipunsadvilkar/pySBD) sentence chunking that powers the TTS streaming pipeline (e.g. `tts-1`) with up to three concurrent synthesis requests. |
 | `--tts-streaming-min-words` | `TTS_STREAMING_MIN_WORDS` | None | Minimum words per text chunk for incremental TTS streaming (optional). |
 | `--tts-streaming-max-chars` | `TTS_STREAMING_MAX_CHARS` | None | Maximum characters per text chunk for incremental TTS streaming (optional). |
 
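
The table rows in this hunk mention sentence chunking bounded by a minimum word count and a maximum character count. The sketch below illustrates one way such bounds could interact. It is not the project's implementation: the regex splitter is a naive stand-in for pySBD so the example needs no third-party packages, `chunk_text` is a hypothetical name, and the exact semantics of the two limits are assumptions.

```python
import re
from typing import List, Optional

# Naive stand-in for pySBD: split after '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_text(
    text: str,
    min_words: Optional[int] = None,
    max_chars: Optional[int] = None,
) -> List[str]:
    """Group sentences into chunks (hypothetical helper, assumed semantics).

    Short sentences are merged until a chunk has at least `min_words` words,
    unless merging would push the chunk past `max_chars` characters.
    A single sentence longer than `max_chars` is left unsplit in this sketch.
    """
    chunks: List[str] = []
    pending = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        candidate = f"{pending} {sentence}".strip()
        if max_chars is not None and pending and len(candidate) > max_chars:
            # Merging would exceed the cap: emit what has accumulated so far.
            chunks.append(pending)
            candidate = sentence
        if min_words is None or len(candidate.split()) >= min_words:
            chunks.append(candidate)
            pending = ""
        else:
            pending = candidate
    if pending:
        # Flush any trailing text that never reached the word minimum.
        chunks.append(pending)
    return chunks


plain = chunk_text("Hi. How are you today? Fine.")
merged = chunk_text("Hi. How are you today? Fine.", min_words=3)
capped = chunk_text("One. Two. Three.", min_words=5, max_chars=9)
```

Here `merged` joins the one-word opener onto the following sentence, while `capped` shows the character limit forcing a flush before the word minimum is reached.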
@@ -402,11 +402,11 @@ sequenceDiagram
 WY->>HA: AudioStop event
 else Streaming TTS (SynthesizeStart/Chunk/Stop)
 HA->>WY: SynthesizeStart event (voice config)
-Note over WY: Initialize incremental synthesis<br/>with sentence boundary detection
+Note over WY: Initialize incremental synthesis<br/>with pySBD-powered sentence boundary detection<br/>and up to three concurrent OpenAI TTS requests
 WY->>HA: AudioStart event
 loop Sending text chunks
 HA->>WY: SynthesizeChunk events
-Note over WY: Accumulate text and detect<br/>complete sentences using pysbd
+Note over WY: Accumulate text and detect<br/>complete sentences using pySBD sentence chunking<br/>while prefetching audio in parallel (max 3 concurrent requests)
 alt Complete sentences detected
 loop For each complete sentence
 WY->>OAPI: Speech synthesis request
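
The README changes above describe prefetching up to three synthesis requests in parallel while delivering audio in order. One common way to get that behaviour with `asyncio` is a semaphore that bounds in-flight requests plus a task list that is awaited in submission order. The sketch below shows the pattern only; `stream_in_order` and `fake_synthesize` are hypothetical names, and the fake coroutine stands in for a real TTS request.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Iterable

MAX_IN_FLIGHT = 3  # mirrors the "up to three" limit described in the README diff

active = 0  # demo-only counters used to observe concurrency
peak = 0


async def stream_in_order(
    sentences: Iterable[str],
    synthesize: Callable[[str], Awaitable[bytes]],
    limit: int = MAX_IN_FLIGHT,
) -> AsyncIterator[bytes]:
    """Yield audio for each sentence in input order, with bounded concurrency."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(text: str) -> bytes:
        async with semaphore:  # at most `limit` requests run at once
            return await synthesize(text)

    tasks = [asyncio.create_task(bounded(text)) for text in sentences]
    try:
        for task in tasks:  # awaiting in submission order preserves ordering
            yield await task
    finally:
        for task in tasks:  # no-op for finished tasks; cleans up on early exit
            task.cancel()


async def fake_synthesize(text: str) -> bytes:
    """Stand-in for a TTS request; longer text takes longer to 'synthesize'."""
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01 * len(text))
    active -= 1
    return text.encode()


async def demo() -> list:
    sentences = ["A long first sentence.", "Short.", "Mid one.", "Four.", "Five."]
    return [chunk async for chunk in stream_in_order(sentences, fake_synthesize)]


result = asyncio.run(demo())
```

Even though the short sentences finish before the long first one, `result` keeps the input order. Note that this sketch bounds concurrent requests, not the number of completed results waiting to be played; a real pipeline may also want to cap that backlog.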

src/wyoming_openai/__main__.py

Lines changed: 0 additions & 1 deletion
@@ -269,7 +269,6 @@ async def main():
 info=info,
 stt_client=stt_client,
 tts_client=tts_client,
-client_lock=asyncio.Lock(),
 stt_temperature=args.stt_temperature,
 tts_speed=args.tts_speed,
 tts_instructions=args.tts_instructions,
