Commit 05c0e80

Author: T Savo
feat: Add intelligent text chunking for long TTS requests

- Implement smart text chunking with 3-tier strategy:
  * Under 25s: No chunking (optimal quality)
  * 25-40s: Gentle chunking at natural boundaries
  * Over 40s: Aggressive but intelligent chunking
- Respect natural language boundaries (paragraphs, sentences, clauses)
- Seamless audio concatenation using ffmpeg
- Voice cloning consistency across chunks
- Comprehensive test suite for chunking logic
- Production-ready error handling and cleanup

This enables processing of arbitrarily long texts while maintaining high audio quality and natural speech flow.
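
As a rough illustration of the 3-tier strategy named in the commit message, the tier choice might be sketched as below. This is not the committed implementation: the speaking-rate constant, the names `estimate_duration_seconds` and `choose_tier`, and the treatment of the exact 25 s and 40 s boundaries are all assumptions made for the example.

```python
from enum import Enum

# ASSUMPTION: a fixed characters-per-second rate stands in for whatever
# duration heuristic the real code uses; 15.0 is a placeholder value.
CHARS_PER_SECOND = 15.0


class ChunkingTier(Enum):
    NONE = "none"              # under 25 s: synthesize in one pass
    GENTLE = "gentle"          # 25-40 s: split at natural boundaries
    AGGRESSIVE = "aggressive"  # over 40 s: split into more, smaller chunks


def estimate_duration_seconds(text: str,
                              chars_per_second: float = CHARS_PER_SECOND) -> float:
    """Estimate spoken duration from text length (hypothetical heuristic)."""
    return len(text.strip()) / chars_per_second


def choose_tier(text: str) -> ChunkingTier:
    """Map the estimated duration onto the three tiers from the commit message."""
    seconds = estimate_duration_seconds(text)
    if seconds < 25:
        return ChunkingTier.NONE
    if seconds <= 40:
        return ChunkingTier.GENTLE
    return ChunkingTier.AGGRESSIVE
```

A caller would check `choose_tier(text)` before synthesis and only invoke the splitting step when the result is not `ChunkingTier.NONE`.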
1 parent 4a26d48 commit 05c0e80

8 files changed: 1924 additions and 254 deletions

README.md
Lines changed: 91 additions & 240 deletions

@@ -1,298 +1,149 @@
 # Chatterbox TTS API
 
-[![Python](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
-[![FastAPI](https://img.shields.io/badge/FastAPI-0.100+-green.svg)](https://fastapi.tiangolo.com/)
 [![Docker](https://img.shields.io/badge/docker-%230db7ed.svg)](https://www.docker.com/)
 [![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
+[![GitHub stars](https://img.shields.io/github/stars/TSavo/chatterbox-tts-api?style=social)](https://github.com/TSavo/chatterbox-tts-api/stargazers)
+[![Docker Pulls](https://img.shields.io/docker/pulls/tsavo/chatterbox-tts-api)](https://hub.docker.com/r/tsavo/chatterbox-tts-api)
 
-A high-performance, production-ready Text-to-Speech (TTS) API service built with FastAPI and powered by Chatterbox TTS. Features advanced voice cloning, emotion control, and batch processing capabilities.
+> 🎤 **Production-ready TTS API with voice cloning in one Docker command**
 
-## ✨ Features
-
-- 🎯 **Advanced TTS Generation**: High-quality text-to-speech with emotion control
-- 🎭 **Voice Cloning**: Clone any voice from a reference audio sample
-- 🚀 **Batch Processing**: Process multiple texts simultaneously for efficiency
-- 🎛️ **Fine-grained Control**: Adjust exaggeration, guidance weight, and temperature
-- 🔧 **Multiple Output Formats**: Support for WAV, MP3, and OGG formats
-- 📦 **Base64 Encoding**: Optional base64 output for web applications
-- 🐳 **Docker Ready**: Easy deployment with Docker and Docker Compose
-- 🚀 **GPU Accelerated**: Automatic GPU detection and utilization
-- 📊 **Health Monitoring**: Built-in health checks and status endpoints
-- 🔒 **Production Ready**: Comprehensive error handling and logging
+High-quality text-to-speech with voice cloning, emotion control, and batch processing.
 
 ## 🚀 Quick Start
 
-### Option 1: Docker (Recommended)
-
-1. **Clone the repository**
-   ```bash
-   git clone https://github.com/TSavo/chatterbox-tts-api.git
-   cd chatterbox-tts-api
-   ```
-
-2. **Run with Docker Compose**
-   ```bash
-   # For CPU-only deployment
-   docker-compose up -d
-
-   # For GPU-accelerated deployment
-   docker-compose -f docker-compose.gpu.yml up -d
-   ```
-
-3. **Access the API**
-   - API: http://localhost:8000
-   - Interactive docs: http://localhost:8000/docs
-   - Health check: http://localhost:8000/health
-
-### Option 2: Local Installation
-
-1. **Install Python 3.12+**
-
-2. **Clone and setup**
-   ```bash
-   git clone https://github.com/TSavo/chatterbox-tts-api.git
-   cd chatterbox-tts-api
-
-   # For Windows (PowerShell)
-   .\setup-local.ps1
-
-   # For Unix/Linux/macOS
-   pip install -r requirements.txt
-   # Install PyTorch with CUDA support (optional)
-   pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
-   ```
-
-3. **Run the application**
-   ```bash
-   python -m uvicorn app:app --host 0.0.0.0 --port 8000
-   ```
-
-## 📖 API Usage
-
-### Basic TTS Generation
+**Run it now:**
+```bash
+docker run -p 8000:8000 tsavo/chatterbox-tts-api
+```
+
+That's it! API is now running at http://localhost:8000
+
+**Test it:**
+```bash
+curl -X POST http://localhost:8000/tts \
+  -H "Content-Type: application/json" \
+  -d '{"text": "Hello world!"}' \
+  --output hello.wav
+```
+
+## ✨ Features
+
+- 🎭 **Voice Cloning** - Clone any voice from audio samples
+- 🎛️ **Emotion Control** - Adjust intensity and expression
+- 🔧 **Multiple Formats** - WAV, MP3, OGG output
+- 🚀 **Batch Processing** - Handle multiple requests efficiently
+- 📊 **Job Tracking** - Monitor processing status
+- 🧩 **Smart Chunking** - Automatically handles long texts (40+ seconds)
+- 🐳 **Docker Ready** - No setup required
+
+## 📖 Usage Examples
 
+**Basic TTS:**
 ```python
 import requests
 
-# Simple text-to-speech
 response = requests.post("http://localhost:8000/tts", json={
-    "text": "Hello, world! This is a test of the Chatterbox TTS system.",
-    "exaggeration": 0.7,
-    "cfg_weight": 0.6,
-    "temperature": 1.0,
-    "return_base64": False
+    "text": "Hello, this is a test!",
+    "output_format": "mp3"
 })
 
-# Save the audio
-with open("output.wav", "wb") as f:
+with open("output.mp3", "wb") as f:
     f.write(response.content)
 ```
 
-### Voice Cloning
-
+**Voice Cloning:**
 ```python
-import requests
-
-# Clone a voice from reference audio
 with open("reference_voice.wav", "rb") as audio_file:
     response = requests.post(
         "http://localhost:8000/voice-clone",
-        data={
-            "text": "This will sound like the reference voice!",
-            "exaggeration": 0.5,
-            "return_base64": True
-        },
+        data={"text": "Clone this voice!"},
         files={"audio_file": audio_file}
     )
-
-result = response.json()
-# result["audio_base64"] contains the generated audio
 ```
 
-### Batch Processing
-
+**Batch Processing:**
 ```python
-import requests
-
-# Process multiple texts at once
 response = requests.post("http://localhost:8000/batch-tts", json={
-    "texts": [
-        "First sentence to convert.",
-        "Second sentence to convert.",
-        "Third sentence to convert."
-    ],
-    "exaggeration": 0.6,
-    "cfg_weight": 0.5
+    "texts": ["First sentence", "Second sentence", "Third sentence"]
 })
-
-results = response.json()
-for i, result in enumerate(results["results"]):
-    if result["success"]:
-        print(f"Text {i+1}: Generated {result['duration_seconds']:.2f}s of audio")
 ```
 
-## 🎛️ Configuration Parameters
-
-| Parameter | Description | Range | Default |
-|-----------|-------------|-------|---------|
-| `exaggeration` | Controls emotional intensity and expression | 0.0 - 2.0 | 0.5 |
-| `cfg_weight` | Controls generation guidance and pacing | 0.0 - 1.0 | 0.5 |
-| `temperature` | Controls randomness in generation | 0.1 - 2.0 | 1.0 |
-| `output_format` | Audio output format | wav, mp3, ogg | wav |
-| `return_base64` | Return audio as base64 string | boolean | false |
-
-## 📖 Examples and Usage
+More examples: [examples/](examples/) | Interactive docs: http://localhost:8000/docs
 
-Comprehensive examples are available for multiple programming languages:
+## 🧩 Smart Text Chunking
 
-### Quick Examples After `docker-compose up`
-
-- **Python**: See [examples/python/](examples/python/) for complete examples
-- **JavaScript/Node.js**: See [examples/javascript/](examples/javascript/) for both Node.js and browser examples
-- **cURL**: See [examples/curl/](examples/curl/) for command-line testing
-- **PHP**: See [examples/php/](examples/php/) for web integration examples
-
-### Quick Test
-
-**Option 1: Use the quickstart scripts**
-```bash
-# Linux/macOS
-./quickstart.sh
-
-# Windows PowerShell
-./quickstart.ps1
-```
-
-**Option 2: Manual testing**
-```bash
-# Test if API is running
-curl http://localhost:8000/health
+The API automatically handles long texts that would exceed the 40-second TTS limit:
 
-# Check queue status (NEW in v3.0)
-curl http://localhost:8000/queue/status
+**How it works:**
+1. **Estimates duration** from text length
+2. **Intelligently splits** on natural boundaries:
+   - Paragraph breaks (double line breaks)
+   - Sentence endings (periods, !, ?)
+   - Clause breaks (commas, semicolons, colons)
+   - Word boundaries (last resort)
+3. **Generates each chunk** separately
+4. **Concatenates with ffmpeg** into seamless audio
 
-# Generate speech with job tracking
-curl -X POST http://localhost:8000/tts \
-  -H "Content-Type: application/json" \
-  -d '{"text": "Hello world!", "exaggeration": 0.7, "output_format": "mp3"}' \
-  --output "hello.mp3"
+**Example with long text:**
+```python
+long_text = """
+Very long article or document content here...
+Multiple paragraphs with natural breaks...
+The system will automatically chunk this.
+"""
 
-# Generate with base64 response to get job ID
-curl -X POST http://localhost:8000/tts \
-  -H "Content-Type: application/json" \
-  -d '{"text": "Hello world!", "return_base64": true}' \
-  | jq '.job_id'
+# Will automatically chunk, generate, and concatenate
+response = requests.post("http://localhost:8000/tts", json={
+    "text": long_text,
+    "output_format": "mp3"
+})
+# Returns single audio file with complete text
 ```
 
-## 🔧 API Endpoints
-
-### Core Endpoints
+## 🎛️ Parameters
 
-- `POST /tts` - Generate speech from text with advanced controls
-- `POST /voice-clone` - Generate speech with voice cloning
-- `POST /batch-tts` - Process multiple texts simultaneously
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `exaggeration` | Emotional intensity (0.0-2.0) | 0.5 |
+| `cfg_weight` | Generation guidance (0.0-1.0) | 0.5 |
+| `temperature` | Randomness (0.1-2.0) | 1.0 |
+| `output_format` | Audio format (wav, mp3, ogg) | wav |
 
-### Monitoring
-
-- `GET /` - Basic health check with queue information
-- `GET /health` - Detailed health check with model status
-- `GET /queue/status` - Get current queue status
-- `GET /docs` - Interactive API documentation
-
-## 🐳 Docker Configuration
-
-### Environment Variables
+## 🔧 Advanced Setup
 
+**With GPU support:**
 ```bash
-# GPU Support (optional)
-NVIDIA_VISIBLE_DEVICES=all
-NVIDIA_DRIVER_CAPABILITIES=compute,utility
-
-# Model caching (optional)
-HF_HOME=/app/hf_cache
+docker run --gpus all -p 8000:8000 tsavo/chatterbox-tts-api
 ```
 
-### Volume Mounts
-
-- `./hf_cache:/root/.cache/huggingface` - Cache model downloads
-
-## 🔧 Development
-
-### Project Structure
-
-```
-chatterbox-tts-api/
-├── app.py # Main FastAPI application
-├── requirements.txt # Python dependencies
-├── requirements-docker.txt # Docker-specific dependencies
-├── Dockerfile # Optimized Docker image configuration
-├── docker-compose.yml # Docker deployment
-├── .github/workflows/ # CI/CD pipeline
-├── tests/ # Test suite
-├── examples/ # Usage examples
-├── test_mp3_sync.py # Synchronous MP3 test script
-└── .gitignore # Git ignore rules
+**Test the chunking feature:**
+```bash
+# Test with long text (will automatically chunk and concatenate)
+python test_chunking.py
 ```
 
-### Running Tests
-
+**Development/Custom builds:**
 ```bash
-# Run the test suite
-python -m pytest tests/
-
-# Run specific tests
-python chatterbox_test.py
-python test_gpu.py
+git clone https://github.com/TSavo/chatterbox-tts-api.git
+cd chatterbox-tts-api
+docker-compose up
 ```
 
-### Contributing
-
-We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
-
-## 📋 Requirements
-
-### System Requirements
-
-- **Python**: 3.12 or higher
-- **Memory**: 4GB RAM minimum, 8GB recommended
-- **GPU**: NVIDIA GPU with CUDA support (optional but recommended)
-- **Storage**: 2GB for model cache
+**System Requirements:**
+- Docker (includes ffmpeg for audio concatenation)
+- 4GB+ RAM (8GB recommended)
+- GPU optional but recommended
 
-### Dependencies
+## 📞 Support
 
-- FastAPI and Uvicorn for the web framework
-- PyTorch and TorchAudio for audio processing
-- Chatterbox TTS for the core TTS functionality
-
-## 🚨 Known Issues & Limitations
-
-- Initial model loading may take 1-2 minutes on first run
-- Large batch requests may timeout on slower hardware
-- Some audio formats may require additional system codecs
-- GPU memory usage scales with batch size and audio length
-
-## 📞 Support & Community
-
-- 📖 **[API Documentation](http://localhost:8000/docs)** - Interactive API documentation
-- 🐛 **[Report Issues](https://github.com/TSavo/chatterbox-tts-api/issues)** - Bug reports and feature requests
-- 💬 **[GitHub Discussions](https://github.com/TSavo/chatterbox-tts-api/discussions)** - Community discussions
-- 📧 **[Contact Author](mailto:listentomy@nefariousplan.com)** - Direct support
-
-## 🙏 Acknowledgments
-
-- **[Chatterbox TTS](https://github.com/JarodMica/chatterbox)** - For the amazing TTS model
-- **[FastAPI](https://fastapi.tiangolo.com/)** - For the excellent web framework
-- **[PyTorch](https://pytorch.org/)** - For the deep learning foundation
-- **All contributors** - Thank you for making this project better!
+- 📖 **[Interactive API Docs](http://localhost:8000/docs)** - Try the API in your browser
+- 🐛 **[Issues](https://github.com/TSavo/chatterbox-tts-api/issues)** - Bug reports and feature requests
+- 💬 **[Discussions](https://github.com/TSavo/chatterbox-tts-api/discussions)** - Community help
 
 ## 📜 License
 
-This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+MIT License - see [LICENSE](LICENSE) for details.
 
 ---
 
-**Created with ❤️ by [T Savo](mailto:listentomy@nefariousplan.com)**
-
-🌐 **[Horizon City](https://www.horizon-city.com)** - *Ushering in the AI revolution and hastening the extinction of humans*
-
-*Making high-quality TTS accessible to every developer - one API call closer to human obsolescence*
+**[T Savo](mailto:listentomy@nefariousplan.com)** | **[Horizon City](https://www.horizon-city.com)**
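
The boundary order listed under "How it works" in the README diff above (paragraphs, then sentences, then clauses, then words) can be sketched as a coarse-to-fine splitter. This is only an illustration and not the code from this commit: the function name `split_text`, the regex patterns, and the 375-character budget are assumptions, and a character budget stands in for the duration estimate.

```python
import re

# Boundary patterns tried from coarsest to finest, mirroring the README list.
# ASSUMPTION: these regexes are illustrative, not the project's own.
BOUNDARY_PATTERNS = [
    r"\n\s*\n",        # paragraph breaks (double line breaks)
    r"(?<=[.!?])\s+",  # sentence endings
    r"(?<=[,;:])\s+",  # clause breaks
    r"\s+",            # word boundaries (last resort)
]


def split_text(text: str, max_chars: int = 375, level: int = 0) -> list[str]:
    """Split text into chunks of at most max_chars, preferring coarse boundaries.

    A single word longer than max_chars is returned unsplit.
    """
    text = text.strip()
    if len(text) <= max_chars or level >= len(BOUNDARY_PATTERNS):
        return [text] if text else []

    pieces = [p.strip() for p in re.split(BOUNDARY_PATTERNS[level], text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > max_chars:
            # Piece is still too long: flush what we have, then retry the
            # piece with the next, finer boundary.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(split_text(piece, max_chars, level + 1))
        elif current and len(current) + 1 + len(piece) > max_chars:
            # Adding this piece would overflow the budget: start a new chunk.
            chunks.append(current)
            current = piece
        else:
            # Greedily pack adjacent pieces into the current chunk.
            current = f"{current} {piece}" if current else piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be synthesized separately and the resulting audio files joined, which the README says is done with ffmpeg.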
