A FastAPI service for text-to-speech synthesis using the F5-TTS model.

Features:

- Text-to-speech synthesis with voice profile support
- JWT authentication
- Docker containerization
- GPU support
- Voice profile management
- Health monitoring

Prerequisites:

- Python 3.11 or higher
- Docker with NVIDIA Container Toolkit
- NVIDIA GPU with CUDA support
- F5-TTS model weights
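
Before installing, you can check the basic prerequisites from Python. This is a minimal sketch with a hypothetical helper, `check_prerequisites`. It only confirms the interpreter version and that the `docker` and `nvidia-smi` executables are on `PATH`, so it cannot tell whether the NVIDIA Container Toolkit or CUDA actually work.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 11)):
    """Return {prerequisite name: True if the check passed}.

    Hypothetical helper: these are shallow heuristics. They confirm the
    interpreter version and that CLIs are on PATH, nothing more.
    """
    return {
        f"python>={min_python[0]}.{min_python[1]}": sys.version_info[:2] >= min_python,
        "docker": shutil.which("docker") is not None,
        # nvidia-smi ships with the NVIDIA driver; its presence only suggests a GPU setup.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:14} {'ok' if ok else 'MISSING'}")
```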

Project structure:

```
├── app/
│   ├── api/
│   │   ├── models/
│   │   └── routes/
│   ├── core/
│   ├── services/
│   └── voice_profiles/
├── scripts/
├── tests/
├── voice_profiles/
├── weights/
└── docker-compose.yml
```

Installation:

- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Download the model files from https://huggingface.co/SWivid/F5-TTS and place them in `weights/`:
  - `model_1200000.pt`
  - `F5TTS_Base_vocab.txt`
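
To confirm the files landed in the right place, a small `pathlib` check is enough. This is an independent sketch, not the project's own check (that is `scripts/verify_setup.py`). The helper `missing_weight_files` is hypothetical, it assumes the default `weights/` directory (pass another path if `MODEL_DIR` points elsewhere), and it only tests that the files exist, not that they are valid.

```python
from pathlib import Path

# File names from the setup step above; adjust if your checkpoint differs.
EXPECTED_WEIGHTS = ("model_1200000.pt", "F5TTS_Base_vocab.txt")


def missing_weight_files(weights_dir="weights"):
    """Return the expected weight files that are absent from weights_dir (hypothetical helper)."""
    root = Path(weights_dir)
    return [name for name in EXPECTED_WEIGHTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weight_files()
    if missing:
        raise SystemExit(f"Missing in weights/: {', '.join(missing)}")
    print("All expected weight files are present.")
```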

Running the service:

- Set up voice profiles:
  ```bash
  python scripts/setup_test_voice.py
  ```
- Verify the setup:
  ```bash
  python scripts/verify_setup.py
  ```
- Start the service:
  ```bash
  docker-compose up --build
  ```
- Generate an authentication token:
  ```bash
  python scripts/generate_token.py
  ```
- Test the API:
  ```bash
  python scripts/test_api.py
  ```

API endpoints:

- `GET /health` - Health check
- `GET /api/v1/voices/list` - List available voice profiles
- `POST /api/v1/tts/synthesize` - Generate speech from text
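
`scripts/generate_token.py` is the supported way to obtain a token. To illustrate what an HS256-signed JWT consists of, here is a standard-library sketch. It assumes the service verifies HMAC-SHA256 tokens signed with `SECRET_KEY`; the helper `make_hs256_token` and the claim names (`sub`, `iat`, `exp`) are hypothetical and may not match what the service expects.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_hs256_token(secret: str, subject: str, ttl_seconds: int = 3600, now=None) -> str:
    """Build a compact JWT signed with HMAC-SHA256 (hypothetical claim set)."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued, "exp": issued + ttl_seconds}
    # Signing input is base64url(header) + "." + base64url(payload).
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

The token would then typically be sent as `Authorization: Bearer <token>`; that header scheme is also an assumption here. Hand-rolling the encoding is only for illustration; in practice a maintained library such as PyJWT is the safer choice.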
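
Example request body for `POST /api/v1/tts/synthesize`:

```json
{
  "text": "Text to convert to speech",
  "voice_profile": "bane"
}
```

Environment variables:

- `PORT` - Server port (default: 8081)
- `MODEL_DIR` - Directory containing the model files
- `VOICE_PROFILES_DIR` - Directory containing voice profiles
- `SECRET_KEY` - Secret key used for JWT authentication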
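
Putting the endpoint, the request body, and the `PORT` default together, a client call can be prepared with `urllib.request` from the standard library. This sketch assumes the service runs on `localhost` and accepts a Bearer token; `build_synthesize_request` is a hypothetical helper. The response format is not documented here, so the example only prints the status and content type.

```python
import json
import os
import urllib.request


def build_synthesize_request(text, voice_profile, token, host="localhost", port=None):
    """Prepare (but do not send) a POST to /api/v1/tts/synthesize.

    The port falls back to the PORT environment variable, then to 8081.
    The Bearer auth scheme is an assumption.
    """
    if port is None:
        port = int(os.environ.get("PORT", "8081"))
    body = json.dumps({"text": text, "voice_profile": voice_profile}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/api/v1/tts/synthesize",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    req = build_synthesize_request("Text to convert to speech", "bane", token="<paste token>")
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.headers.get("Content-Type"))
```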

Development:

- Create a Python virtual environment:
  ```bash
  python -m venv venv
  source venv/bin/activate  # Linux/Mac
  venv\Scripts\activate     # Windows
  ```
- Install development dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Run the tests:
  ```bash
  python -m pytest tests/
  ```

License:

This project uses the F5-TTS model. Please ensure compliance with the model's license terms.

Contributing:

- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request