 # Chatterbox TTS API
 
-[](https://www.python.org/downloads/)
-[](https://fastapi.tiangolo.com/)
 [](https://www.docker.com/)
 [](LICENSE)
+[](https://github.com/TSavo/chatterbox-tts-api/stargazers)
+[](https://hub.docker.com/r/tsavo/chatterbox-tts-api)
 
-A high-performance, production-ready Text-to-Speech (TTS) API service built with FastAPI and powered by Chatterbox TTS. Features advanced voice cloning, emotion control, and batch processing capabilities.
+> 🎤 **Production-ready TTS API with voice cloning in one Docker command**
 
-## ✨ Features
-
-- 🎯 **Advanced TTS Generation**: High-quality text-to-speech with emotion control
-- 🎭 **Voice Cloning**: Clone any voice from a reference audio sample
-- 🚀 **Batch Processing**: Process multiple texts simultaneously for efficiency
-- 🎛️ **Fine-grained Control**: Adjust exaggeration, guidance weight, and temperature
-- 🔧 **Multiple Output Formats**: Support for WAV, MP3, and OGG formats
-- 📦 **Base64 Encoding**: Optional base64 output for web applications
-- 🐳 **Docker Ready**: Easy deployment with Docker and Docker Compose
-- 🚀 **GPU Accelerated**: Automatic GPU detection and utilization
-- 📊 **Health Monitoring**: Built-in health checks and status endpoints
-- 🔒 **Production Ready**: Comprehensive error handling and logging
+High-quality text-to-speech with voice cloning, emotion control, and batch processing.
 
 ## 🚀 Quick Start
 
-### Option 1: Docker (Recommended)
-
-1. **Clone the repository**
-   ```bash
-   git clone https://github.com/TSavo/chatterbox-tts-api.git
-   cd chatterbox-tts-api
-   ```
-
-2. **Run with Docker Compose**
-   ```bash
-   # For CPU-only deployment
-   docker-compose up -d
-
-   # For GPU-accelerated deployment
-   docker-compose -f docker-compose.gpu.yml up -d
-   ```
-
-3. **Access the API**
-   - API: http://localhost:8000
-   - Interactive docs: http://localhost:8000/docs
-   - Health check: http://localhost:8000/health
-
-### Option 2: Local Installation
-
-1. **Install Python 3.12+**
-
-2. **Clone and setup**
-   ```bash
-   git clone https://github.com/TSavo/chatterbox-tts-api.git
-   cd chatterbox-tts-api
-
-   # For Windows (PowerShell)
-   .\setup-local.ps1
-
-   # For Unix/Linux/macOS
-   pip install -r requirements.txt
-   # Install PyTorch with CUDA support (optional)
-   pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
-   ```
-
-3. **Run the application**
-   ```bash
-   python -m uvicorn app:app --host 0.0.0.0 --port 8000
-   ```
-
-## 📖 API Usage
-
-### Basic TTS Generation
+**Run it now:**
+```bash
+docker run -p 8000:8000 tsavo/chatterbox-tts-api
+```
+
+That's it! API is now running at http://localhost:8000
+
+**Test it:**
+```bash
+curl -X POST http://localhost:8000/tts \
+  -H "Content-Type: application/json" \
+  -d '{"text": "Hello world!"}' \
+  --output hello.wav
+```
+
+## ✨ Features
+
+- 🎭 **Voice Cloning** - Clone any voice from audio samples
+- 🎛️ **Emotion Control** - Adjust intensity and expression
+- 🔧 **Multiple Formats** - WAV, MP3, OGG output
+- 🚀 **Batch Processing** - Handle multiple requests efficiently
+- 📊 **Job Tracking** - Monitor processing status
+- 🧩 **Smart Chunking** - Automatically handles long texts (40+ seconds)
+- 🐳 **Docker Ready** - No setup required
+
+## 📖 Usage Examples
 
+**Basic TTS:**
 ```python
 import requests
 
-# Simple text-to-speech
 response = requests.post("http://localhost:8000/tts", json={
-    "text": "Hello, world! This is a test of the Chatterbox TTS system.",
-    "exaggeration": 0.7,
-    "cfg_weight": 0.6,
-    "temperature": 1.0,
-    "return_base64": False
+    "text": "Hello, this is a test!",
+    "output_format": "mp3"
 })
 
-# Save the audio
-with open("output.wav", "wb") as f:
+with open("output.mp3", "wb") as f:
     f.write(response.content)
 ```
 
-### Voice Cloning
-
+**Voice Cloning:**
 ```python
-import requests
-
-# Clone a voice from reference audio
 with open("reference_voice.wav", "rb") as audio_file:
     response = requests.post(
         "http://localhost:8000/voice-clone",
-        data={
-            "text": "This will sound like the reference voice!",
-            "exaggeration": 0.5,
-            "return_base64": True
-        },
+        data={"text": "Clone this voice!"},
         files={"audio_file": audio_file}
     )
-
-result = response.json()
-# result["audio_base64"] contains the generated audio
 ```
 
-### Batch Processing
-
+**Batch Processing:**
 ```python
-import requests
-
-# Process multiple texts at once
 response = requests.post("http://localhost:8000/batch-tts", json={
-    "texts": [
-        "First sentence to convert.",
-        "Second sentence to convert.",
-        "Third sentence to convert."
-    ],
-    "exaggeration": 0.6,
-    "cfg_weight": 0.5
+    "texts": ["First sentence", "Second sentence", "Third sentence"]
 })
-
-results = response.json()
-for i, result in enumerate(results["results"]):
-    if result["success"]:
-        print(f"Text {i+1}: Generated {result['duration_seconds']:.2f}s of audio")
 ```
 
-## 🎛️ Configuration Parameters
-
-| Parameter | Description | Range | Default |
-|-----------|-------------|-------|---------|
-| `exaggeration` | Controls emotional intensity and expression | 0.0 - 2.0 | 0.5 |
-| `cfg_weight` | Controls generation guidance and pacing | 0.0 - 1.0 | 0.5 |
-| `temperature` | Controls randomness in generation | 0.1 - 2.0 | 1.0 |
-| `output_format` | Audio output format | wav, mp3, ogg | wav |
-| `return_base64` | Return audio as base64 string | boolean | false |
-
-## 📖 Examples and Usage
+More examples: [examples/](examples/) | Interactive docs: http://localhost:8000/docs
 
-Comprehensive examples are available for multiple programming languages:
+## 🧩 Smart Text Chunking
 
-### Quick Examples After `docker-compose up`
-
-- **Python**: See [examples/python/](examples/python/) for complete examples
-- **JavaScript/Node.js**: See [examples/javascript/](examples/javascript/) for both Node.js and browser examples
-- **cURL**: See [examples/curl/](examples/curl/) for command-line testing
-- **PHP**: See [examples/php/](examples/php/) for web integration examples
-
-### Quick Test
-
-**Option 1: Use the quickstart scripts**
-```bash
-# Linux/macOS
-./quickstart.sh
-
-# Windows PowerShell
-./quickstart.ps1
-```
-
-**Option 2: Manual testing**
-```bash
-# Test if API is running
-curl http://localhost:8000/health
+The API automatically handles long texts that would exceed the 40-second TTS limit:
 
-# Check queue status (NEW in v3.0)
-curl http://localhost:8000/queue/status
+**How it works:**
+1. **Estimates duration** from text length
+2. **Intelligently splits** on natural boundaries:
+   - Paragraph breaks (double line breaks)
+   - Sentence endings (periods, !, ?)
+   - Clause breaks (commas, semicolons, colons)
+   - Word boundaries (last resort)
+3. **Generates each chunk** separately
+4. **Concatenates with ffmpeg** into seamless audio
 
-# Generate speech with job tracking
-curl -X POST http://localhost:8000/tts \
-  -H "Content-Type: application/json" \
-  -d '{"text": "Hello world!", "exaggeration": 0.7, "output_format": "mp3"}' \
-  --output "hello.mp3"
+**Example with long text:**
+```python
+long_text = """
+Very long article or document content here...
+Multiple paragraphs with natural breaks...
+The system will automatically chunk this.
+"""
 
-# Generate with base64 response to get job ID
-curl -X POST http://localhost:8000/tts \
-  -H "Content-Type: application/json" \
-  -d '{"text": "Hello world!", "return_base64": true}' \
-  | jq '.job_id'
+# Will automatically chunk, generate, and concatenate
+response = requests.post("http://localhost:8000/tts", json={
+    "text": long_text,
+    "output_format": "mp3"
+})
+# Returns single audio file with complete text
 ```
 
-## 🔧 API Endpoints
-
-### Core Endpoints
+## 🎛️ Parameters
 
-- `POST /tts` - Generate speech from text with advanced controls
-- `POST /voice-clone` - Generate speech with voice cloning
-- `POST /batch-tts` - Process multiple texts simultaneously
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `exaggeration` | Emotional intensity (0.0-2.0) | 0.5 |
+| `cfg_weight` | Generation guidance (0.0-1.0) | 0.5 |
+| `temperature` | Randomness (0.1-2.0) | 1.0 |
+| `output_format` | Audio format (wav, mp3, ogg) | wav |
 
-### Monitoring
-
-- `GET /` - Basic health check with queue information
-- `GET /health` - Detailed health check with model status
-- `GET /queue/status` - Get current queue status
-- `GET /docs` - Interactive API documentation
-
-## 🐳 Docker Configuration
-
-### Environment Variables
+## 🔧 Advanced Setup
 
+**With GPU support:**
 ```bash
-# GPU Support (optional)
-NVIDIA_VISIBLE_DEVICES=all
-NVIDIA_DRIVER_CAPABILITIES=compute,utility
-
-# Model caching (optional)
-HF_HOME=/app/hf_cache
+docker run --gpus all -p 8000:8000 tsavo/chatterbox-tts-api
 ```
 
-### Volume Mounts
-
-- `./hf_cache:/root/.cache/huggingface` - Cache model downloads
-
-## 🔧 Development
-
-### Project Structure
-
-```
-chatterbox-tts-api/
-├── app.py # Main FastAPI application
-├── requirements.txt # Python dependencies
-├── requirements-docker.txt # Docker-specific dependencies
-├── Dockerfile # Optimized Docker image configuration
-├── docker-compose.yml # Docker deployment
-├── .github/workflows/ # CI/CD pipeline
-├── tests/ # Test suite
-├── examples/ # Usage examples
-├── test_mp3_sync.py # Synchronous MP3 test script
-└── .gitignore # Git ignore rules
+**Test the chunking feature:**
+```bash
+# Test with long text (will automatically chunk and concatenate)
+python test_chunking.py
 ```
 
-### Running Tests
-
+**Development/Custom builds:**
 ```bash
-# Run the test suite
-python -m pytest tests/
-
-# Run specific tests
-python chatterbox_test.py
-python test_gpu.py
+git clone https://github.com/TSavo/chatterbox-tts-api.git
+cd chatterbox-tts-api
+docker-compose up
 ```
 
-### Contributing
-
-We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
-
-## 📋 Requirements
-
-### System Requirements
-
-- **Python**: 3.12 or higher
-- **Memory**: 4GB RAM minimum, 8GB recommended
-- **GPU**: NVIDIA GPU with CUDA support (optional but recommended)
-- **Storage**: 2GB for model cache
+**System Requirements:**
+- Docker (includes ffmpeg for audio concatenation)
+- 4GB+ RAM (8GB recommended)
+- GPU optional but recommended
 
-### Dependencies
+## 📞 Support
 
-- FastAPI and Uvicorn for the web framework
-- PyTorch and TorchAudio for audio processing
-- Chatterbox TTS for the core TTS functionality
-
-## 🚨 Known Issues & Limitations
-
-- Initial model loading may take 1-2 minutes on first run
-- Large batch requests may timeout on slower hardware
-- Some audio formats may require additional system codecs
-- GPU memory usage scales with batch size and audio length
-
-## 📞 Support & Community
-
-- 📖 **[API Documentation](http://localhost:8000/docs)** - Interactive API documentation
-- 🐛 **[Report Issues](https://github.com/TSavo/chatterbox-tts-api/issues)** - Bug reports and feature requests
-- 💬 **[GitHub Discussions](https://github.com/TSavo/chatterbox-tts-api/discussions)** - Community discussions
-- 📧 **[Contact Author](mailto:listentomy@nefariousplan.com)** - Direct support
-
-## 🙏 Acknowledgments
-
-- **[Chatterbox TTS](https://github.com/JarodMica/chatterbox)** - For the amazing TTS model
-- **[FastAPI](https://fastapi.tiangolo.com/)** - For the excellent web framework
-- **[PyTorch](https://pytorch.org/)** - For the deep learning foundation
-- **All contributors** - Thank you for making this project better!
+- 📖 **[Interactive API Docs](http://localhost:8000/docs)** - Try the API in your browser
+- 🐛 **[Issues](https://github.com/TSavo/chatterbox-tts-api/issues)** - Bug reports and feature requests
+- 💬 **[Discussions](https://github.com/TSavo/chatterbox-tts-api/discussions)** - Community help
 
 ## 📜 License
 
-This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+MIT License - see [LICENSE](LICENSE) for details.
 
 ---
 
-**Created with ❤️ by [T Savo](mailto:listentomy@nefariousplan.com)**
-
-🌐 **[Horizon City](https://www.horizon-city.com)** - *Ushering in the AI revolution and hastening the extinction of humans*
-
-*Making high-quality TTS accessible to every developer - one API call closer to human obsolescence*
+**[T Savo](mailto:listentomy@nefariousplan.com)** • **[Horizon City](https://www.horizon-city.com)**
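
Review note on the Smart Text Chunking section added in this diff: the README lists the steps (estimate duration from text length, then split on paragraph, sentence, clause, and finally word boundaries) without showing how they fit together. The sketch below is one possible reading, not the project's actual implementation. The names `estimate_seconds` and `split_text` and the rate of 15 characters per second are assumptions made for illustration; only the 40-second limit and the boundary order come from the README text.

```python
import re

# Assumed speaking rate; the README only says duration is estimated from text length.
CHARS_PER_SECOND = 15.0
# The 40-second TTS limit named in the README.
MAX_SECONDS = 40.0

# Boundary patterns in the priority order the README lists.
BOUNDARIES = [
    r"\n\s*\n",        # paragraph breaks (blank line)
    r"(?<=[.!?])\s+",  # sentence endings
    r"(?<=[,;:])\s+",  # clause breaks
    r"\s+",            # word boundaries (last resort)
]


def estimate_seconds(text: str) -> float:
    """Rough duration estimate based only on character count."""
    return len(text) / CHARS_PER_SECOND


def split_text(text: str, max_seconds: float = MAX_SECONDS, level: int = 0) -> list[str]:
    """Split text into chunks whose estimated duration fits max_seconds.

    Tries the coarsest boundary first and only falls back to a finer one
    for pieces that are still too long. A single word longer than the
    limit is returned unsplit.
    """
    text = text.strip()
    if not text:
        return []
    if estimate_seconds(text) <= max_seconds or level >= len(BOUNDARIES):
        return [text]

    pieces = [p.strip() for p in re.split(BOUNDARIES[level], text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if estimate_seconds(candidate) <= max_seconds:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if estimate_seconds(piece) <= max_seconds:
            current = piece
        else:
            # Piece alone is too long: retry with the next, finer boundary.
            chunks.extend(split_text(piece, max_seconds, level + 1))
    if current:
        chunks.append(current)
    return chunks
```

Packing adjacent pieces greedily keeps the number of TTS calls low while still preferring the most natural break available; pieces are rejoined with a single space, so paragraph breaks inside a chunk are not preserved.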
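
Review note on step 4 of the chunking list ("Concatenates with ffmpeg"): the diff does not show how the chunks are joined. One common approach is ffmpeg's concat demuxer, sketched below under the assumption that every chunk file shares the same codec and sample rate (required for `-c copy`). The helper names `write_concat_list`, `build_concat_command`, and `concatenate` are hypothetical and are not taken from the repository.

```python
import subprocess
import tempfile
from pathlib import Path


def write_concat_list(chunk_paths, list_file):
    """Write the list file read by ffmpeg's concat demuxer.

    Absolute paths are used because relative entries are resolved against
    the list file's directory. Paths containing single quotes would need
    escaping, which this sketch does not handle.
    """
    lines = [f"file '{Path(p).resolve().as_posix()}'" for p in chunk_paths]
    Path(list_file).write_text("\n".join(lines) + "\n", encoding="utf-8")


def build_concat_command(list_file, output_path):
    """Build the ffmpeg argument list; -c copy joins without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", str(list_file),
        "-c", "copy",
        str(output_path),
    ]


def concatenate(chunk_paths, output_path):
    """Join chunk audio files in order into one output file (needs ffmpeg on PATH)."""
    with tempfile.TemporaryDirectory() as tmp:
        list_file = Path(tmp) / "chunks.txt"
        write_concat_list(chunk_paths, list_file)
        subprocess.run(build_concat_command(list_file, output_path), check=True)
```

If the chunks differ in format, dropping `-c copy` lets ffmpeg re-encode to the output container instead.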