Fish Speech API Webserver in Docker

OpenAPI-like voice generation server based on fish-speech-1.5.

Supports text-to-speech and voice style transfer via reference audio samples.

Requirements

  • NVIDIA GPU
  • For Docker-way
    • NVIDIA Docker runtime
    • Docker
    • Docker Compose
  • For Manual Setup
    • Python 3.12
    • Python venv module

🔧 Quick Start

Clone the repo first:

git clone --recurse-submodules git@github.com:EvilFreelancer/docker-fish-speech-server.git
cd docker-fish-speech-server

Docker-way

Copy the example Compose file and start the services:

cp docker-compose.dist.yml docker-compose.yml
docker compose up -d

Enter the container:

docker compose exec api bash

Download the model:

huggingface-cli download fishaudio/fish-speech-1.5 --local-dir models/fish-speech-1.5/

Manual Setup

Install the system packages (Debian/Ubuntu):

apt install cmake portaudio19-dev

Set up a virtual environment and install dependencies:

python3.12 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Download the model:

huggingface-cli download fishaudio/fish-speech-1.5 --local-dir models/fish-speech-1.5/

Run the API server:

python main.py
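The server may need some time after startup before it accepts connections. A minimal sketch of a readiness check follows, assuming the API listens on localhost:8000 as in the curl examples in this README. The helper name `wait_for_port` is hypothetical and not part of the project.

```python
import socket
import time


def wait_for_port(host="localhost", port=8000, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Assumption: the API server listens on port 8000, as in the curl examples.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_port()` before sending the first request avoids connection errors while the server is still starting. An open port only shows that the process is listening, not that the model has finished loading.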

🧪 Testing the API

Generate speech with default voice

curl http://localhost:8000/audio/speech \
  -X POST \
  -F model="fish-speech-1.5" \
  -F input="Hello, this is a test of Fish Speech API" \
  --output "speech.wav"

In JSON format:

curl http://localhost:8000/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
      "model": "fish-speech-1.5",
      "input": "Hello, this is a test of Fish Speech API"
  }' \
  --output "speech.wav"
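The same JSON request can be issued from Python. This is a minimal sketch using only the standard library. It assumes the server listens on http://localhost:8000 and returns the audio bytes in the response body, as the curl examples suggest. The helper names `build_speech_request` and `synthesize` are illustrative and not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: same host and port as the curl examples


def build_speech_request(text, model="fish-speech-1.5", voice=None, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for /audio/speech."""
    payload = {"model": model, "input": text}
    if voice is not None:
        # Optional named voice, e.g. "english-nice"
        payload["voice"] = voice
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.wav", **kwargs):
    """Send the request and write the response body to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
    return out_path
```

With the server running, `synthesize("Hello, this is a test of Fish Speech API")` should write the generated audio to `speech.wav`.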

Generate speech with example voice

curl http://localhost:8000/audio/speech \
  -X POST \
  -F model="fish-speech-1.5" \
  -F voice="english-nice" \
  -F input="Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets." \
  --output "speech.wav"

In JSON format:

curl http://localhost:8000/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
      "model": "fish-speech-1.5",
      "voice": "english-nice",
      "input": "Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets."
  }' \
  --output "speech.wav"

Generate speech with reference voice

curl http://localhost:8000/audio/speech \
  -X POST \
  -F model="fish-speech-1.5" \
  -F input="Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets." \
  -F reference_audio="@voice-viola.wav" \
  --output "speech.wav"

In JSON format:

curl http://localhost:8000/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
      "model": "fish-speech-1.5",
      "input": "Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets.",
      "reference_audio": "=base64..."
  }' \
  --output "speech.wav"
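The `reference_audio` value in the JSON request is shown only as a placeholder. The sketch below shows one way to produce such a value in Python. It assumes the server expects standard base64 of the raw file bytes; the exact format, including any prefix, is not specified here and should be checked against the server code. The helper names are hypothetical.

```python
import base64
from pathlib import Path


def encode_reference_audio(path):
    """Return the file's raw bytes as an ASCII base64 string.

    Assumption: the server accepts plain standard base64 of the audio file.
    """
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def build_reference_payload(text, audio_path, model="fish-speech-1.5"):
    """Build the JSON body for a request that uses a reference voice."""
    return {
        "model": model,
        "input": text,
        "reference_audio": encode_reference_audio(audio_path),
    }
```

The resulting dictionary can be serialized with `json.dumps` and sent as the request body with `Content-Type: application/json`.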

Advanced settings

curl http://localhost:8000/audio/speech \
  -X POST \
  -F model="fish-speech-1.5" \
  -F input="Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets." \
  -F top_p="0.1" \
  -F repetition_penalty="1.3" \
  -F temperature="0.75" \
  -F chunk_length="150" \
  -F max_new_tokens="768" \
  -F seed="42" \
  -F reference_audio="@voice-viola.wav" \
  --output "speech.wav"

In JSON format:

curl http://localhost:8000/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
      "model": "fish-speech-1.5",
      "input": "Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets.",
      "top_p": "0.1",
      "repetition_penalty": "1.3",
      "temperature": "0.75",
      "chunk_length": "150",
      "max_new_tokens": "768",
      "seed": "42",
      "reference_audio": "=base64..."
  }' \
  --output "speech.wav"
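When building these requests programmatically, the optional generation settings can be grouped in one place. The following is a rough sketch, not the project's own client: the `SamplingOptions` class and `build_payload` function are hypothetical, and only the field names come from the examples above. Unset options are omitted on the assumption that the server then applies its defaults, and values are serialized as strings to mirror the curl JSON example.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SamplingOptions:
    """Optional generation settings; None means "leave it to the server"."""
    top_p: Optional[float] = None
    repetition_penalty: Optional[float] = None
    temperature: Optional[float] = None
    chunk_length: Optional[int] = None
    max_new_tokens: Optional[int] = None
    seed: Optional[int] = None


def build_payload(text, options, model="fish-speech-1.5"):
    """Merge the required fields with every option that was explicitly set."""
    payload = {"model": model, "input": text}
    for name, value in asdict(options).items():
        if value is not None:
            # Strings mirror the curl example; whether the server also accepts
            # JSON numbers is not confirmed by this README.
            payload[name] = str(value)
    return payload
```

For example, `build_payload("Hello", SamplingOptions(temperature=0.75, seed=42))` yields a body with only `model`, `input`, `temperature`, and `seed`.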
