Disable Speaches VAD by default and bump version to 0.3.4

roryeckel · roryeckel · commit 9cd1579c5925 · 2025-07-15T19:09:14.000-05:00
Updated the handler to send 'vad_filter=False' (via 'extra_body' on the transcription request) when the STT client uses the Speaches backend. This disables Speaches VAD, which is not yet compatible with the Wyoming protocol. Also updated the README to note this behavior and incremented the project version to 0.3.4.
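
For illustration only, the backend check can be sketched in isolation as below. This is a minimal standalone sketch, not the code in this commit: `build_stt_extra_body` is a hypothetical helper name, and the `OpenAIBackend` enum here is a stand-in whose members and values are assumed rather than copied from `wyoming_openai.compatibility`.

```python
from enum import Enum
from typing import Any, Optional


class OpenAIBackend(Enum):
    # Stand-in for wyoming_openai.compatibility.OpenAIBackend.
    # The member names and values here are assumptions for this sketch.
    OPENAI = "openai"
    SPEACHES = "speaches"


def build_stt_extra_body(backend: Optional[OpenAIBackend]) -> Optional[dict[str, Any]]:
    """Return extra request fields for a transcription call, or None if there are none.

    Hypothetical helper: the commit builds this dict inline in the handler.
    """
    extra_body: dict[str, Any] = {}
    if backend == OpenAIBackend.SPEACHES:
        # Speaches VAD is not yet compatible with the Wyoming protocol, so turn it off.
        extra_body["vad_filter"] = False
    # An empty dict collapses to None so that no extra fields are sent.
    return extra_body or None
```

Returning `None` for other backends mirrors the `extra_body if extra_body else None` expression in the diff, so requests to non-Speaches servers carry no additional fields.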
diff --git a/README.md b/README.md
@@ -173,6 +173,7 @@ If you prefer using a local service like Speaches instead of official OpenAI ser
   - The Speaches container is configured with specific model settings (`Systran/faster-distil-whisper-large-v3` for STT and `speaches-ai/Kokoro-82M-v1.0-ONNX` for TTS).
   - It uses a local port (8000) to expose the Speaches service.
   - NVIDIA GPU support is enabled, so ensure your system has an appropriate setup if you plan to utilize GPU resources.
+  - Note: wyoming_openai disables Speaches VAD (Voice Activity Detection) by default, as it is not yet compatible with the Wyoming protocol.
 
 - **Command**:
   
@@ -228,7 +229,7 @@ We follow specific tagging conventions for our Docker images. These tags help in
 
 - **`main`**: This tag points to the latest commit on the main code branch. It is suitable for users who want to experiment with the most up-to-date features and changes, but may include unstable or experimental code.
 
-- **`major.minor.patch version`**: Specific version tags (e.g., `0.3.3`) correspond to specific stable releases of the Wyoming OpenAI proxy server. These tags are ideal for users who need a consistent, reproducible environment and want to avoid breaking changes introduced in newer versions.
+- **`major.minor.patch version`**: Specific version tags (e.g., `0.3.4`) correspond to specific stable releases of the Wyoming OpenAI proxy server. These tags are ideal for users who need a consistent, reproducible environment and want to avoid breaking changes introduced in newer versions.
 
 - **`major.minor version`**: Tags that follow the `major.minor` format (e.g., `0.3`) represent a range of patch-level updates within the same minor version series. These tags are useful for users who want to stay updated with bug fixes and minor improvements without upgrading to a new major or minor version.
 
diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "wyoming_openai"
-version = "0.3.3"
+version = "0.3.4"
 description = "OpenAI-Compatible Proxy Middleware for the Wyoming Protocol"
 authors = [
     { name = "Rory Eckel" }
diff --git a/src/wyoming_openai/handler.py b/src/wyoming_openai/handler.py
@@ -17,7 +17,7 @@
 from wyoming.server import AsyncEventHandler
 from wyoming.tts import Synthesize
 
-from .compatibility import CustomAsyncOpenAI, TtsVoiceModel
+from .compatibility import CustomAsyncOpenAI, OpenAIBackend, TtsVoiceModel
 from .utilities import NamedBytesIO
 
 _LOGGER = logging.getLogger(__name__)
@@ -169,13 +169,20 @@ async def _handle_audio_stop(self) -> None:
             async with self._client_lock:
                 use_streaming = self._is_asr_model_streaming(self._current_asr_model.name)
 
+                # Prepare extra_body for SPEACHES backend
+                extra_body = {}
+                if hasattr(self._stt_client, 'backend') and self._stt_client.backend == OpenAIBackend.SPEACHES:
+                    extra_body["vad_filter"] = False
+                    _LOGGER.debug("Adding vad_filter=False for SPEACHES backend")
+
                 transcription = await self._stt_client.audio.transcriptions.create(
                     file=self._wav_buffer,
                     model=self._current_asr_model.name,
                     temperature=self._stt_temperature or NOT_GIVEN,
                     prompt=self._stt_prompt or NOT_GIVEN,
                     response_format="json",
-                    stream=use_streaming
+                    stream=use_streaming,
+                    extra_body=extra_body if extra_body else None
                 )
 
                 await self.write_event(TranscriptStart().event())