Chatterbox TTS API

🎤 Production-ready TTS API with voice cloning in one Docker command

High-quality text-to-speech with voice cloning, emotion control, and batch processing. Built with FastAPI and powered by Chatterbox TTS.

🚀 Quick Start

Run it now:

docker run -p 8000:8000 tsavo/chatterbox-tts-api

That's it! The API is now running at http://localhost:8000

Test it:

curl -X POST http://localhost:8000/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world!"}' \
  --output hello.wav

✨ Features

  • 🎭 Voice Cloning - Clone any voice from audio samples
  • 🎛️ Emotion Control - Adjust intensity and expression
  • 🔧 Multiple Formats - WAV, MP3, OGG output
  • 🚀 Batch Processing - Handle multiple requests efficiently
  • 📊 Job Tracking - Monitor processing status
  • 🧩 Smart Chunking - Automatically handles long texts (40+ seconds)
  • 🐳 Docker Ready - No setup required

📖 Usage Examples

Basic TTS:

import requests

response = requests.post("http://localhost:8000/tts", json={
    "text": "Hello, this is a test!",
    "output_format": "mp3"
})

with open("output.mp3", "wb") as f:
    f.write(response.content)

Voice Cloning:

# Upload a short reference recording along with the text to synthesize
with open("reference_voice.wav", "rb") as audio_file:
    response = requests.post(
        "http://localhost:8000/voice-clone",
        data={"text": "Clone this voice!"},
        files={"audio_file": audio_file}
    )
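
If the /voice-clone endpoint returns the generated audio directly in the response body, as /tts does (an assumption here, not confirmed above), the clip can be saved the same way:

# Assumption: the response body is raw audio bytes, as with /tts
with open("cloned_voice.wav", "wb") as f:
    f.write(response.content)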

Batch Processing:

response = requests.post("http://localhost:8000/batch-tts", json={
    "texts": ["First sentence", "Second sentence", "Third sentence"]
})

More examples: examples/ | Interactive docs: http://localhost:8000/docs

🧩 Smart Text Chunking

The API automatically handles long texts that would exceed the 40-second TTS limit:

How it works:

  1. Estimates duration from text length
  2. Intelligently splits on natural boundaries:
    • Paragraph breaks (double line breaks)
    • Sentence endings (periods, !, ?)
    • Clause breaks (commas, semicolons, colons)
    • Word boundaries (last resort)
  3. Generates each chunk separately
  4. Concatenates with ffmpeg into seamless audio
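
For illustration, here is a minimal sketch of that splitting heuristic in Python. The characters-per-second rate, the 40-second ceiling, and the exact boundary patterns are assumptions for this sketch, not the API's actual implementation:

import re

# Rough duration estimate: ~15 characters of text per second of speech.
# The rate and the 40-second ceiling are assumptions for illustration.
CHARS_PER_SECOND = 15
MAX_SECONDS = 40
MAX_CHARS = CHARS_PER_SECOND * MAX_SECONDS

def split_text(text, max_chars=MAX_CHARS):
    """Split text on natural boundaries so each chunk stays under the limit."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text]
    # Try boundaries from coarsest to finest:
    # paragraph breaks, sentence endings, clause breaks, word boundaries.
    for pattern in (r"\n\s*\n", r"(?<=[.!?])\s+", r"(?<=[,;:])\s+", r"\s+"):
        parts = [p for p in re.split(pattern, text) if p]
        if len(parts) < 2:
            continue
        chunks, current = [], ""
        for part in parts:
            candidate = (current + " " + part).strip() if current else part
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = part
        if current:
            chunks.append(current)
        # Recurse on any piece that is still too long (splits at a finer level)
        return [c for chunk in chunks for c in split_text(chunk, max_chars)]
    return [text]  # nothing left to split on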

Example with long text:

long_text = """
Very long article or document content here...
Multiple paragraphs with natural breaks...
The system will automatically chunk this.
"""

# Will automatically chunk, generate, and concatenate
response = requests.post("http://localhost:8000/tts", json={
    "text": long_text,
    "output_format": "mp3"
})
# Returns single audio file with complete text

🎛️ Parameters

Parameter        Description                       Default
exaggeration     Emotional intensity (0.0-2.0)     0.5
cfg_weight       Generation guidance (0.0-1.0)     0.5
temperature      Randomness (0.1-2.0)              1.0
output_format    Audio format (wav, mp3, ogg)      wav
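
These can be combined in a single request. The sketch below assumes the tuning values are accepted as top-level JSON fields on the /tts endpoint, alongside text:

import requests

# Assumption: tuning parameters are passed as top-level JSON fields on /tts
response = requests.post("http://localhost:8000/tts", json={
    "text": "I can't believe we actually won!",
    "exaggeration": 1.2,   # more expressive delivery (0.0-2.0)
    "cfg_weight": 0.6,     # stronger guidance (0.0-1.0)
    "temperature": 0.8,    # lower values are more deterministic (0.1-2.0)
    "output_format": "mp3"
})

with open("excited.mp3", "wb") as f:
    f.write(response.content)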

🔧 Advanced Setup

With GPU support:

docker run --gpus all -p 8000:8000 tsavo/chatterbox-tts-api

Test the chunking feature:

# Test with long text (will automatically chunk and concatenate)
python test_chunking.py

Development/Custom builds:

git clone https://github.com/TSavo/chatterbox-tts-api.git
cd chatterbox-tts-api
docker-compose up

System Requirements:

  • Docker (the container image bundles ffmpeg for audio concatenation)
  • 4GB+ RAM (8GB recommended)
  • GPU optional but recommended

📞 Support

📜 License

MIT License - see LICENSE for details.

