🎤 Production-ready TTS API with voice cloning in one Docker command
High-quality text-to-speech with voice cloning, emotion control, and batch processing.
Run it now:
```bash
docker run -p 8000:8000 tsavo/chatterbox-tts-api
```
That's it! The API is now running at http://localhost:8000
Test it:
```bash
curl -X POST http://localhost:8000/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world!"}' \
  --output hello.wav
```
- 🎭 Voice Cloning - Clone a voice from a reference audio sample
- 🎛️ Emotion Control - Adjust intensity and expression
- 🔧 Multiple Formats - WAV, MP3, OGG output
- 🚀 Batch Processing - Synthesize multiple texts in a single request
- 📊 Job Tracking - Monitor processing status
- 🧩 Smart Chunking - Automatically splits texts longer than 40 seconds of audio
- 🐳 Docker Ready - No setup required
Basic TTS:
```python
import requests

response = requests.post("http://localhost:8000/tts", json={
    "text": "Hello, this is a test!",
    "output_format": "mp3"
})
response.raise_for_status()  # fail loudly on HTTP errors

with open("output.mp3", "wb") as f:
    f.write(response.content)
```
Voice Cloning:
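```python
# Multipart upload: "text" as a form field, the reference clip as "audio_file"
with open("reference_voice.wav", "rb") as audio_file:
    response = requests.post(
        "http://localhost:8000/voice-clone",
        data={"text": "Clone this voice!"},
        files={"audio_file": audio_file}
    )
response.raise_for_status()
```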
Batch Processing:
```python
response = requests.post("http://localhost:8000/batch-tts", json={
    "texts": ["First sentence", "Second sentence", "Third sentence"]
})
response.raise_for_status()
```
More examples: examples/ | Interactive docs: http://localhost:8000/docs
The API automatically handles texts whose audio would exceed the 40-second limit of a single TTS generation:
How it works:
- Estimates the spoken duration from the text length
- Splits the text at natural boundaries, in order of preference:
  - Paragraph breaks (double line breaks)
  - Sentence endings (periods, !, ?)
  - Clause breaks (commas, semicolons, colons)
  - Word boundaries (last resort)
- Generates audio for each chunk separately
- Concatenates the chunks with ffmpeg into a single seamless audio file
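To make the splitting order concrete, here is a minimal Python sketch of one way to implement it. It assumes a hypothetical speaking rate of 2.5 words per second for the duration estimate and packs pieces greedily; the service's actual estimator, regexes, and packing strategy are not documented here and may differ. `chunk_text` and `estimate_seconds` are illustrative names, not part of the API.

```python
import re

# Assumed speaking rate for the duration estimate (hypothetical value).
WORDS_PER_SECOND = 2.5
MAX_SECONDS = 40.0

# Boundaries from coarsest to finest; word boundaries are the last resort.
SPLITTERS = [
    re.compile(r"\n\s*\n"),        # paragraph breaks (blank line)
    re.compile(r"(?<=[.!?])\s+"),  # after sentence endings
    re.compile(r"(?<=[,;:])\s+"),  # after clause breaks
]


def estimate_seconds(text: str) -> float:
    """Estimate spoken duration from the word count."""
    return len(text.split()) / WORDS_PER_SECOND


def fits(text: str, max_seconds: float) -> bool:
    return estimate_seconds(text) <= max_seconds


def chunk_text(text: str, max_seconds: float = MAX_SECONDS, level: int = 0) -> list[str]:
    """Split text into chunks whose estimated duration stays within max_seconds."""
    text = text.strip()
    if not text:
        return []
    if fits(text, max_seconds):
        return [text]
    if level == len(SPLITTERS):
        # Last resort: cut on word boundaries into fixed-size groups.
        words = text.split()
        max_words = max(1, int(max_seconds * WORDS_PER_SECOND))
        return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

    chunks: list[str] = []
    current = ""
    for piece in SPLITTERS[level].split(text):
        piece = piece.strip()
        if not piece:
            continue
        candidate = f"{current} {piece}" if current else piece
        if fits(candidate, max_seconds):
            # Greedily pack pieces into the current chunk while it still fits.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if fits(piece, max_seconds):
            current = piece
        else:
            # This piece alone is too long: retry it at the next finer boundary.
            chunks.extend(chunk_text(piece, max_seconds, level + 1))
    if current:
        chunks.append(current)
    return chunks
```

Greedy packing keeps each chunk close to the limit, which means fewer generation calls and fewer seams to concatenate, while dropping to a finer boundary only for an oversized piece keeps whole sentences together whenever they fit.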
Example with long text:
```python
long_text = """
Very long article or document content here...
Multiple paragraphs with natural breaks...
The system will automatically chunk this.
"""

# Will automatically chunk, generate, and concatenate
response = requests.post("http://localhost:8000/tts", json={
    "text": long_text,
    "output_format": "mp3"
})
response.raise_for_status()

# Returns a single audio file containing the complete text
with open("long_output.mp3", "wb") as f:
    f.write(response.content)
```
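On the server side, the final concatenation step might look roughly like the sketch below. It assumes each chunk was first written to its own audio file with identical codec, sample rate, and channel layout, and uses ffmpeg's concat demuxer with stream copy; whether the service uses this demuxer, the concat filter, or something else is not stated here, and `build_concat_list`, `concat_command`, and `concatenate` are hypothetical names.

```python
import os
import subprocess
import tempfile


def build_concat_list(paths: list[str]) -> str:
    """Render an ffmpeg concat-demuxer list: one `file '<path>'` line per chunk."""
    lines = []
    for path in paths:
        # Inside single quotes, a literal ' is written as '\'' (close, escaped quote, reopen).
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: str, output_path: str) -> list[str]:
    # -safe 0 allows absolute paths in the list; -c copy avoids re-encoding,
    # which assumes every chunk shares the same codec and audio parameters.
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output_path]


def concatenate(chunk_paths: list[str], output_path: str) -> None:
    """Join per-chunk audio files into one file (requires ffmpeg on PATH)."""
    absolute = [os.path.abspath(p) for p in chunk_paths]
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(build_concat_list(absolute))
        list_file = f.name
    try:
        subprocess.run(concat_command(list_file, output_path), check=True)
    finally:
        os.remove(list_file)
```

Stream copy is fast and lossless for same-format WAV chunks; if chunks could differ in format, re-encoding (dropping `-c copy`) would be the safer choice.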
| Parameter | Description | Default |
|---|---|---|
| exaggeration | Emotional intensity (0.0-2.0) | 0.5 |
| cfg_weight | Generation guidance (0.0-1.0) | 0.5 |
| temperature | Randomness (0.1-2.0) | 1.0 |
| output_format | Audio format (wav, mp3, ogg) | wav |
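A small client-side helper can catch out-of-range values before a request is sent. This is a sketch that assumes `/tts` accepts these parameters as top-level JSON fields alongside `text` (consistent with the examples above, but not confirmed) and that omitted parameters fall back to the defaults listed in the table; `build_tts_payload` is a hypothetical helper, not part of the API.

```python
# Ranges and formats copied from the parameter table; server-side validation may differ.
PARAM_RANGES = {
    "exaggeration": (0.0, 2.0),
    "cfg_weight": (0.0, 1.0),
    "temperature": (0.1, 2.0),
}
OUTPUT_FORMATS = {"wav", "mp3", "ogg"}


def build_tts_payload(text: str, output_format: str = "wav", **params: float) -> dict:
    """Build a JSON body for POST /tts, rejecting out-of-range values before sending."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    payload = {"text": text, "output_format": output_format}
    for name, value in params.items():
        if name not in PARAM_RANGES:
            raise ValueError(f"unknown parameter: {name!r}")
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        # Only explicitly passed parameters are sent; the rest use server defaults.
        payload[name] = value
    return payload
```

For example, `requests.post("http://localhost:8000/tts", json=build_tts_payload("Hello!", output_format="mp3", exaggeration=0.8, cfg_weight=0.3))` would request a more emotionally intense MP3 while leaving temperature at its default.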
With GPU support (requires an NVIDIA GPU and the NVIDIA Container Toolkit on the host):
```bash
docker run --gpus all -p 8000:8000 tsavo/chatterbox-tts-api
```
Test the chunking feature:
```bash
# Test with long text (will automatically chunk and concatenate)
python test_chunking.py
```
Development/Custom builds:
```bash
git clone https://github.com/TSavo/chatterbox-tts-api.git
cd chatterbox-tts-api
docker-compose up
```
System Requirements:
- Docker (the image bundles ffmpeg for audio concatenation)
- 4GB+ RAM (8GB recommended)
- GPU optional but recommended
- 📖 Interactive API Docs - Try the API in your browser
- 🐛 Issues - Bug reports and feature requests
- 💬 Discussions - Community help
MIT License - see LICENSE for details.