roryeckel
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
Lines changed: 103 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 5 additions & 5 deletions
diff --git a/src/wyoming_openai/__main__.py b/src/wyoming_openai/__main__.py
Lines changed: 0 additions & 1 deletion
@@ -0,0 +1,103 @@
+# GitHub Copilot Instructions
+
+## Project Context
+
+Wyoming OpenAI is a proxy middleware that bridges the Wyoming protocol with OpenAI-compatible endpoints for ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) services. It enables Wyoming clients like Home Assistant to use various OpenAI-compatible STT/TTS services.
+
+## Code Style and Conventions
+
+- Use async/await patterns for all I/O operations
+- Use Python type hints in function signatures
+- Maintain consistency with existing error handling patterns
+- Use logging for debugging and error messages
+- Keep functions focused and modular
+
+## Architecture Overview
+
+### Core Components
+
+- **`handler.py`**: Contains `OpenAIEventHandler` - the main Wyoming protocol event handler that processes ASR and TTS requests
+- **`compatibility.py`**: Provides `CustomAsyncOpenAI` class with backend detection and OpenAI API compatibility layer
+- **`__main__.py`**: Entry point with argument parsing and server initialization
+- **`utilities.py`**: Helper functions for audio processing and data handling
+- **`const.py`**: Version constants and configuration
+
+### Key Patterns
+
+1. **Async Event Handling**: Uses Wyoming's `AsyncEventHandler` to process incoming protocol events
+2. **Backend Abstraction**: `CustomAsyncOpenAI` wraps different backends (OpenAI, Speaches, LocalAI, etc.) with a unified interface
+3. **Stream Processing**: Handles both streaming and non-streaming transcription modes
+4. **Audio Buffer Management**: Collects audio chunks into complete files for processing
+
+### Wyoming Protocol Events
+
+The handler processes these Wyoming events:
+- `AudioStart/AudioChunk/AudioStop` → STT transcription
+- `Transcribe` → Initiate transcription request
+- `Synthesize` → TTS audio generation
+
+### Supported Backends
+
+The `OpenAIBackend` enum defines supported backends:
+- `OPENAI`: Official OpenAI API
+- `SPEACHES`: Local Speaches service
+- `LOCALAI`: LocalAI service
+- `KOKORO_FASTAPI`: Kokoro TTS service
+
+## Testing Guidelines
+
+When writing tests:
+- Use pytest fixtures for common setup
+- Mock external API calls
+- Test both success and error scenarios
+- Include integration tests for end-to-end flows
+- Aim for high code coverage
+
+Test files are organized by module:
+- `test_handler.py`: Event handler logic
+- `test_compatibility.py`: Backend compatibility
+- `test_utilities.py`: Helper functions
+- `test_integration.py`: End-to-end scenarios
+
+## Common Development Tasks
+
+### Running Tests
+```bash
+pytest                              # Run all tests
+pytest --cov=wyoming_openai        # With coverage
+pytest tests/test_handler.py       # Specific test file
+```
+
+### Code Quality
+```bash
+ruff check .                       # Run linting
+ruff check . --fix                 # Auto-fix issues
+```
+
+### Local Development
+```bash
+pip install -e ".[dev]"            # Install dev dependencies
+python -m wyoming_openai --uri tcp://0.0.0.0:10300 --stt-models whisper-1 --tts-models tts-1
+```
+
+### Docker Development
+```bash
+docker compose -f docker-compose.yml -f docker-compose.dev.yml up -d --build
+```
+
+## Configuration
+
+The server accepts both command-line arguments and environment variables. When suggesting configuration changes, consider:
+- STT/TTS API keys and URLs
+- Model lists for STT and TTS
+- Voice configurations
+- Backend-specific settings (temperature, speed, etc.)
+
+## When Making Changes
+
+- Ensure backward compatibility with existing Wyoming clients
+- Update tests to reflect new functionality
+- Add appropriate logging for debugging
+- Document new configuration options
+- Consider impact on all supported backends
+- Validate audio format conversions maintain quality
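The "Audio Buffer Management" pattern listed in the instructions file (collecting audio chunks into a complete file) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's code: `chunks_to_wav` and its defaults (16 kHz, 16-bit, mono) are hypothetical names and values chosen for the example.

```python
import io
import wave


def chunks_to_wav(
    chunks: list[bytes], rate: int = 16000, width: int = 2, channels: int = 1
) -> bytes:
    """Pack raw PCM chunks into one in-memory WAV file (hypothetical helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(width)
        wav_file.setframerate(rate)
        for chunk in chunks:
            # Each chunk is appended in arrival order.
            wav_file.writeframes(chunk)
    # wave does not close a file object it did not open, so the buffer is still readable.
    return buffer.getvalue()


# Two 10 ms chunks of 16 kHz, 16-bit mono silence (160 samples each).
wav_bytes = chunks_to_wav([b"\x00\x00" * 160, b"\x00\x00" * 160])
```

The resulting bytes form a complete WAV file, which is the kind of payload a file-upload transcription endpoint typically expects.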
@@ -10,7 +10,7 @@ Note: This project is not affiliated with OpenAI or the Wyoming project.
 
 ## Overview
 
-This project introduces a [Wyoming](https://github.com/OHF-Voice/wyoming) server that connects to OpenAI-compatible endpoints of your choice. Like a proxy, it enables Wyoming clients such as the [Home Assistant Wyoming Integration](https://www.home-assistant.io/integrations/wyoming/) to use the transcription (Automatic Speech Recognition - ASR) and text-to-speech synthesis (TTS) capabilities of various OpenAI-compatible projects. By acting as a bridge between the Wyoming protocol and OpenAI, you can consolidate the resource usage on your server and extend the capabilities of Home Assistant. The proxy now provides incremental TTS streaming compatibility by intelligently chunking text at sentence boundaries for responsive audio delivery.
+This project introduces a [Wyoming](https://github.com/OHF-Voice/wyoming) server that connects to OpenAI-compatible endpoints of your choice. Like a proxy, it enables Wyoming clients such as the [Home Assistant Wyoming Integration](https://www.home-assistant.io/integrations/wyoming/) to use the transcription (Automatic Speech Recognition - ASR) and text-to-speech synthesis (TTS) capabilities of various OpenAI-compatible projects. By acting as a bridge between the Wyoming protocol and OpenAI, you can consolidate the resource usage on your server and extend the capabilities of Home Assistant. The proxy now provides incremental TTS streaming compatibility by intelligently chunking text at sentence boundaries with [pySBD](https://github.com/nipunsadvilkar/pySBD) for responsive audio delivery. When streaming is enabled, Wyoming OpenAI prefetches up to three OpenAI synthesis requests in parallel while playing the audio sequentially, keeping latency low without breaking event order.
 
 ## Featured Models
 
@@ -28,7 +28,7 @@ This project features a variety of examples for using cutting-edge models in bot
 2. **Service Consolidation**: Allow users of various programs to run inference on a single server without needing separate instances for each service.
 Example: Sharing TTS/STT services between [Open WebUI](#open-webui) and [Home Assistant](#usage-in-home-assistant).
 3. **Asynchronous Processing**: Enable efficient handling of multiple requests by supporting asynchronous processing of audio streams.
-4. **Streaming Compatibility**: Bridge Wyoming's streaming TTS protocol with OpenAI-compatible APIs through intelligent sentence boundary chunking, enabling responsive incremental audio delivery even when the underlying API doesn't support streaming text input.
+4. **Streaming Compatibility**: Bridge Wyoming's streaming TTS protocol with OpenAI-compatible APIs through intelligent sentence boundary chunking powered by [pySBD](https://github.com/nipunsadvilkar/pySBD), enabling responsive incremental audio delivery even when the underlying API doesn't support streaming text input. Concurrent pipelining (default limit of three in-flight requests) keeps playback smooth while ensuring events remain ordered.
 5. **Simple Setup with Docker**: Provide a straightforward deployment process using [Docker and Docker Compose](#docker-recommended) for OpenAI and various popular open source projects.
 
 ## Terminology
@@ -144,7 +144,7 @@ In addition to using command-line arguments, you can configure the Wyoming OpenA
 | `--tts-backend`                         | `TTS_BACKEND`                              | None (autodetected)                           | Enable unofficial API feature sets.          |
 | `--tts-speed`                           | `TTS_SPEED`                                | None (autodetected)                           | Speed of the TTS output (ranges from 0.25 to 4.0).               |
 | `--tts-instructions`                    | `TTS_INSTRUCTIONS`                         | None                                          | Optional instructions for TTS requests (Control the voice).    |
-| `--tts-streaming-models`                | `TTS_STREAMING_MODELS`                     | None                                          | Space-separated list of TTS models to enable incremental streaming via pysbd text chunking (e.g. `tts-1`). |
+| `--tts-streaming-models`                | `TTS_STREAMING_MODELS`                     | None                                          | Space-separated list of TTS models (e.g. `tts-1`) for which to enable incremental streaming. Text is chunked at sentence boundaries with [pySBD](https://github.com/nipunsadvilkar/pySBD), and up to three synthesis requests run concurrently. |
 | `--tts-streaming-min-words`             | `TTS_STREAMING_MIN_WORDS`                  | None                                          | Minimum words per text chunk for incremental TTS streaming (optional). |
 | `--tts-streaming-max-chars`             | `TTS_STREAMING_MAX_CHARS`                  | None                                          | Maximum characters per text chunk for incremental TTS streaming (optional). |
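One plausible reading of how `--tts-streaming-min-words` and `--tts-streaming-max-chars` might interact is sketched below. Everything here is an assumption made for illustration: `chunk_text` is a hypothetical helper, the regex split is a naive stand-in for pySBD, and the actual merge and flush rules in the project may differ.

```python
import re
from typing import Optional


def chunk_text(text: str, min_words: int = 0, max_chars: Optional[int] = None) -> list[str]:
    """Group sentences into chunks (hypothetical sketch, not the project's logic)."""
    # Naive stand-in for pySBD: split after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and max_chars is not None and len(candidate) > max_chars:
            # Adding this sentence would overflow the limit: flush what we have first.
            chunks.append(current)
            candidate = sentence
        current = candidate
        if len(current.split()) >= min_words:
            # Enough words accumulated: emit the chunk for synthesis.
            chunks.append(current)
            current = ""
    if current:
        # Emit whatever is left at the end of the text.
        chunks.append(current)
    return chunks


short_merged = chunk_text("Hi. How are you today? Fine.", min_words=3)
```

In this sketch a single sentence longer than `max_chars` is passed through unsplit; a real implementation would need to decide how to handle that case.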
 
@@ -402,11 +402,11 @@ sequenceDiagram
     WY->>HA: AudioStop event
   else Streaming TTS (SynthesizeStart/Chunk/Stop)
     HA->>WY: SynthesizeStart event (voice config)
-    Note over WY: Initialize incremental synthesis<br/>with sentence boundary detection
+    Note over WY: Initialize incremental synthesis<br/>with pySBD-powered sentence boundary detection<br/>and up to three concurrent OpenAI TTS requests
     WY->>HA: AudioStart event
     loop Sending text chunks
       HA->>WY: SynthesizeChunk events
-      Note over WY: Accumulate text and detect<br/>complete sentences using pysbd
+      Note over WY: Accumulate text and detect<br/>complete sentences using pySBD sentence chunking<br/>while prefetching audio in parallel (max 3 concurrent requests)
       alt Complete sentences detected
         loop For each complete sentence
           WY->>OAPI: Speech synthesis request
 
@@ -269,7 +269,6 @@ async def main():
                 info=info,
                 stt_client=stt_client,
                 tts_client=tts_client,
-                client_lock=asyncio.Lock(),
                 stt_temperature=args.stt_temperature,
                 tts_speed=args.tts_speed,
                 tts_instructions=args.tts_instructions,
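The README hunks describe prefetching up to three synthesis requests in parallel while playing the audio sequentially. A minimal sketch of that idea with `asyncio` follows. The names `synthesize_in_order` and `fake_synthesize` are hypothetical, and the diff does not show the handler side, so this should be read as one way to get bounded concurrency with ordered delivery rather than as the project's implementation.

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable


async def synthesize_in_order(
    sentences: list[str],
    synthesize: Callable[[str], Awaitable[bytes]],
    max_concurrent: int = 3,
) -> AsyncIterator[bytes]:
    """Yield audio in sentence order with at most max_concurrent requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(sentence: str) -> bytes:
        async with semaphore:
            return await synthesize(sentence)

    # Tasks are created up front; the semaphore decides how many actually run at once.
    tasks = [asyncio.create_task(limited(s)) for s in sentences]
    try:
        for task in tasks:
            # Awaiting in creation order preserves playback order,
            # even when a later request finishes first.
            yield await task
    finally:
        for task in tasks:
            task.cancel()  # No-op for tasks that have already finished.


async def fake_synthesize(sentence: str) -> bytes:
    # Hypothetical stand-in for a speech request: shorter sentences finish
    # sooner, so completion order differs from input order.
    await asyncio.sleep(len(sentence) / 1000)
    return sentence.encode()


async def demo() -> list[bytes]:
    sentences = ["A long first sentence.", "Short.", "Medium one."]
    return [audio async for audio in synthesize_in_order(sentences, fake_synthesize)]


result = asyncio.run(demo())
```

A semaphore bounds the number of in-flight requests without serializing them. That is consistent with dropping a single shared `client_lock`, although the reason for the removal is not stated in this diff.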