
Incremental TTS streaming release (0.3.7)


@roryeckel roryeckel released this 10 Sep 03:18
a42abac

This release adds incremental TTS streaming: incoming text is split at sentence boundaries as it arrives, and each complete sentence is synthesized immediately, so audio can start playing before the full response text is available.

New Features

  • Incremental TTS Streaming: Added intelligent sentence-boundary text chunking using pysbd for immediate audio delivery as text arrives
  • Streaming Configuration: New CLI options --tts-streaming-models, --tts-streaming-min-words, and --tts-streaming-max-chars for fine-tuned control
  • PR Docker Images: Automatic Docker image builds for pull requests with pr-{number} tagging for testing proposed changes
  • Automatic Cleanup: PR Docker images are automatically removed when pull requests are closed/merged
  • Thanks to #22 for the suggestion
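The chunking idea above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: a naive regular expression stands in for pysbd, the `SentenceChunker` class is hypothetical, and the meanings given to `min_words` and `max_chars` (modeled on `--tts-streaming-min-words` and `--tts-streaming-max-chars`) are assumptions about what those options control.

```python
import re
from typing import List

# Naive stand-in for pysbd: ".", "!" or "?" followed by whitespace ends a sentence.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class SentenceChunker:
    """Hypothetical helper: buffers streamed text and releases complete sentences."""

    def __init__(self, min_words: int = 3, max_chars: int = 200) -> None:
        self.min_words = min_words          # assumed: shortest chunk worth synthesizing
        self.max_chars = max(1, max_chars)  # assumed: hard cap on buffered text
        self._buffer = ""

    def feed(self, text: str) -> List[str]:
        """Add newly arrived text and return chunks that are ready for synthesis."""
        self._buffer += text
        # Everything before the last split point is a complete sentence;
        # the final piece may still be growing, so it stays buffered.
        *complete, rest = _SENTENCE_END.split(self._buffer)
        ready: List[str] = []
        pending = ""
        for sentence in complete:
            # Merge short sentences until the chunk reaches min_words.
            pending = f"{pending} {sentence}".strip()
            if len(pending.split()) >= self.min_words:
                ready.append(pending)
                pending = ""
        self._buffer = f"{pending} {rest}" if pending else rest
        # Force a cut at a word boundary if no sentence end shows up in time.
        while len(self._buffer) > self.max_chars:
            cut = self._buffer.rfind(" ", 0, self.max_chars)
            if cut <= 0:
                cut = self.max_chars
            chunk = self._buffer[:cut].strip()
            if chunk:
                ready.append(chunk)
            self._buffer = self._buffer[cut:].lstrip()
        return ready

    def flush(self) -> List[str]:
        """Return whatever text remains once the stream has ended."""
        rest = self._buffer.strip()
        self._buffer = ""
        return [rest] if rest else []
```

A caller would pass each piece of arriving text to `feed()`, synthesize every returned chunk right away, and call `flush()` once the text stream ends. A sentence is only released after trailing whitespace has been seen, which keeps text such as "3.14" from being split while it is still arriving.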

Enhancements

  • Enhanced Logging: Separate visibility into streaming vs non-streaming TTS voices in startup logs
  • Language Support: Intelligent language detection for sentence segmentation with fallback to English
  • Streaming Protocol: Full Wyoming streaming TTS protocol support with proper SynthesizeStart/Chunk/Stop event handling
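The English fallback for sentence segmentation could look something like the sketch below. The release notes do not say how the language is detected, so this assumes it is derived from a locale-style tag; the `pick_segmenter_language` function and the `SUPPORTED` set are hypothetical and do not reflect pysbd's actual language list.

```python
from typing import FrozenSet, Optional

# Illustrative subset only; the segmenter's real language list may differ.
SUPPORTED: FrozenSet[str] = frozenset({"en", "de", "es", "fr"})


def pick_segmenter_language(
    tag: Optional[str],
    supported: FrozenSet[str] = SUPPORTED,
    default: str = "en",
) -> str:
    """Map a tag such as "de-DE" to a segmenter language, else fall back to English."""
    if not tag:
        return default
    # Keep only the primary subtag: "de-DE" and "de_DE" both become "de".
    primary = tag.replace("_", "-").split("-")[0].lower()
    return primary if primary in supported else default
```

The fallback keeps streaming usable for voices whose language the segmenter does not cover, at the cost of less accurate sentence boundaries for those languages.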

Improvements

  • Dependencies: Updated OpenAI library to v1.107.0, added pysbd v0.3.4 for sentence segmentation
  • Documentation: Comprehensive README updates with streaming workflow diagrams and configuration examples

Technical Details

  • Implements real-time sentence detection and immediate TTS synthesis for each complete sentence
  • Maintains audio timing continuity across incremental chunks
  • Backward compatible: existing configurations continue to work unchanged
  • Streaming models advertise supports_synthesize_streaming=True capability
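One plausible way to keep audio timing continuous across per-sentence chunks is to derive each chunk's timestamp from a running sample count instead of restarting at zero for every sentence. The sketch below illustrates that arithmetic only; the `AudioClock` class is hypothetical, and the release notes do not describe how the project actually tracks timing.

```python
from dataclasses import dataclass, field


@dataclass
class AudioClock:
    """Hypothetical helper: running timestamp shared by all chunks of one response."""

    rate: int = 24000   # samples per second (assumed PCM format)
    width: int = 2      # bytes per sample
    channels: int = 1
    _samples: int = field(default=0, init=False, repr=False)

    def stamp(self, pcm: bytes) -> int:
        """Return the start time of this chunk in milliseconds, then advance the clock."""
        start_ms = self._samples * 1000 // self.rate
        self._samples += len(pcm) // (self.width * self.channels)
        return start_ms
```

With 16 kHz, 16-bit mono audio, a 32000-byte chunk covers one second, so the chunk that follows it is stamped at 1000 ms regardless of which sentence it came from.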

Full Changelog: v0.3.6...v0.3.7