Incremental TTS streaming release (0.3.7)
This release introduces incremental TTS streaming: incoming text is split into chunks at sentence boundaries and each chunk is synthesized as soon as it is complete, so audio playback can start before the full response text has arrived.
New Features
- Incremental TTS Streaming: Text is split at sentence boundaries with pysbd, and each sentence is sent for synthesis as soon as it is complete, so audio starts while text is still arriving
- Streaming Configuration: New CLI options --tts-streaming-models, --tts-streaming-min-words, and --tts-streaming-max-chars control which models stream and how text is chunked
- PR Docker Images: Docker images are now built automatically for pull requests and tagged pr-{number}, so proposed changes can be tested before merging
- Automatic Cleanup: PR Docker images are removed automatically when the pull request is closed or merged
- Thanks to the author of #22 for the suggestion
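The sketch below illustrates incremental sentence-boundary chunking as described above. It is not the project's code: a naive regex stands in for pysbd, and `SentenceChunker`, `min_words`, and `max_chars` are hypothetical names. Their behavior (merge short sentences until `min_words` is reached, force a break once unfinished text exceeds `max_chars`) is only an assumed reading of --tts-streaming-min-words and --tts-streaming-max-chars.

```python
import re
from typing import Callable, List

# Naive stand-in for pysbd: break after ".", "!" or "?" followed by whitespace.
# pysbd also handles abbreviations, decimals and many other edge cases.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def naive_segment(text: str) -> List[str]:
    return _SENTENCE_END.split(text)


class SentenceChunker:
    """Buffer streamed text deltas and return chunks that are ready for TTS."""

    def __init__(self, min_words: int = 3, max_chars: int = 200,
                 segment: Callable[[str], List[str]] = naive_segment) -> None:
        self.min_words = min_words  # assumed: merge sentences shorter than this
        self.max_chars = max(1, max_chars)  # assumed: force a break past this length
        self.segment = segment
        self.buffer = ""   # trailing text that may be an unfinished sentence
        self.pending = ""  # finished sentences waiting to reach min_words

    def feed(self, delta: str) -> List[str]:
        """Add newly arrived text; return the chunks that can be spoken now."""
        ready: List[str] = []
        parts = self.segment(self.buffer + delta) or [""]
        # Every segment except the last is complete; the last may still grow.
        *complete, self.buffer = parts
        for sentence in complete:
            self._add(sentence, ready)
        # Run-on text with no sentence end: cut at the last space before max_chars.
        while len(self.buffer) > self.max_chars:
            cut = self.buffer.rfind(" ", 0, self.max_chars)
            if cut <= 0:
                cut = self.max_chars
            head, self.buffer = self.buffer[:cut], self.buffer[cut:].lstrip()
            self._add(head, ready)
        return ready

    def flush(self) -> List[str]:
        """Return whatever is left once the text stream has ended."""
        rest = (self.pending + " " + self.buffer).strip()
        self.pending = self.buffer = ""
        return [rest] if rest else []

    def _add(self, sentence: str, ready: List[str]) -> None:
        self.pending = (self.pending + " " + sentence.strip()).strip()
        if len(self.pending.split()) >= self.min_words:
            ready.append(self.pending)
            self.pending = ""
```

Because a sentence is only known to be finished once the next one starts (or the stream ends), the last segment always stays buffered. A pysbd segmenter's bound method, for example `pysbd.Segmenter(language="en", clean=False).segment`, could likely be passed as `segment` in place of the regex.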
Enhancements
- Enhanced Logging: Startup logs now list streaming and non-streaming TTS voices separately
- Language Support: Sentence segmentation uses the detected language and falls back to English
- Streaming Protocol: Full Wyoming streaming TTS protocol support, handling the SynthesizeStart, SynthesizeChunk, and SynthesizeStop events
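To show how the streaming events and the language fallback might fit together, here is a simplified, library-free sketch of one server-side streaming session. Event types use the Wyoming names (synthesize-start, synthesize-chunk, synthesize-stop, and the server's synthesize-stopped reply), but payloads are plain dicts, audio travels inside the dict rather than as a binary payload, and `synthesize()` is a hypothetical stand-in for the OpenAI TTS request. The language set, the 22050 Hz/16-bit/mono format, and opening a single audio stream for the whole request are assumptions, not details from this release.

```python
import re
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, Dict]  # (Wyoming event type, simplified payload)

# Illustrative subset only; pysbd supports more languages than this.
SEGMENTER_LANGUAGES = {"en", "de", "es", "fr", "it", "nl"}
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # naive stand-in for pysbd


def resolve_language(language: Optional[str]) -> str:
    """Map a voice language such as 'de-DE' to a segmenter code, else 'en'."""
    if language:
        base = language.replace("_", "-").split("-")[0].lower()
        if base in SEGMENTER_LANGUAGES:
            return base
    return "en"


def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for the OpenAI-compatible TTS request."""
    return text.encode("utf-8")


class StreamingSession:
    """Handles one synthesize-start ... synthesize-stop exchange."""

    def __init__(self) -> None:
        self.language = "en"
        self.buffer = ""
        self.audio_open = False

    def handle(self, event_type: str, data: Dict) -> List[Event]:
        out: List[Event] = []
        if event_type == "synthesize-start":
            voice = data.get("voice") or {}
            self.language = resolve_language(voice.get("language"))
            self.buffer = ""
        elif event_type == "synthesize-chunk":
            # Speak every complete sentence; keep the unfinished tail buffered.
            parts = _SENTENCE_END.split(self.buffer + data.get("text", ""))
            *complete, self.buffer = parts
            for sentence in complete:
                out.extend(self._speak(sentence))
        elif event_type == "synthesize-stop":
            if self.buffer.strip():
                out.extend(self._speak(self.buffer))
            self.buffer = ""
            if self.audio_open:
                out.append(("audio-stop", {}))
                self.audio_open = False
            out.append(("synthesize-stopped", {}))
        return out

    def _speak(self, sentence: str) -> List[Event]:
        events: List[Event] = []
        if not self.audio_open:
            events.append(("audio-start", {"rate": 22050, "width": 2, "channels": 1}))
            self.audio_open = True
        events.append(("audio-chunk", {"audio": synthesize(sentence.strip())}))
        return events
```

The resolved language is only recorded here; a real implementation would presumably use it to choose the pysbd segmenter, which the regex stand-in cannot do.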
Improvements
- Dependencies: Updated OpenAI library to v1.107.0, added pysbd v0.3.4 for sentence segmentation
- Documentation: Comprehensive README updates with streaming workflow diagrams and configuration examples
Technical Details
- Detects sentence boundaries as text streams in and synthesizes each complete sentence immediately
- Maintains audio timing continuity across incremental chunks
- Backward compatible: existing configurations continue to work unchanged
- Streaming models advertise the supports_synthesize_streaming=True capability
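One plausible reading of "audio timing continuity" is that the timestamp on each audio chunk keeps counting from where the previous sentence's audio ended instead of restarting at zero for every sentence. Assuming raw PCM audio and millisecond timestamps (the Wyoming audio-chunk timestamp is expressed in milliseconds), a minimal sketch computes each start time from the cumulative byte offset; `chunk_timestamps` is a hypothetical helper, not part of this release.

```python
from typing import List


def chunk_timestamps(chunk_sizes: List[int], rate: int = 22050,
                     width: int = 2, channels: int = 1) -> List[int]:
    """Return the start time in ms of each PCM chunk, continuing across sentences.

    chunk_sizes: byte length of each audio chunk, in playback order, regardless
    of which sentence produced it.
    """
    bytes_per_second = rate * width * channels
    timestamps: List[int] = []
    offset_bytes = 0
    for size in chunk_sizes:
        # Derive the timestamp from the running byte total, not from summed
        # per-chunk durations, so rounding never accumulates.
        timestamps.append(offset_bytes * 1000 // bytes_per_second)
        offset_bytes += size
    return timestamps
```

Deriving each timestamp from the running byte total, rather than adding up per-chunk durations that were already rounded to whole milliseconds, keeps rounding error from building up over a long response.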
Full Changelog: v0.3.6...v0.3.7