Fish Speech API Webserver in Docker

OpenAI-like voice generation server based on fish-speech-1.5.

Supports text-to-speech and voice style transfer via reference audio samples.

Requirements

- NVIDIA GPU

For the Docker way:
- NVIDIA Docker runtime
- Docker
- Docker Compose

For the manual setup:
- Python 3.12
- Python venv

🔧 Quick Start

Clone the repo first:

git clone --recurse-submodules git@github.com:EvilFreelancer/fish-speech-api.git docker-fish-speech-server
cd docker-fish-speech-server

Docker-way

cp docker-compose.dist.yml docker-compose.yml
docker compose up -d

Enter the container:

docker compose exec api bash

Download the model:

huggingface-cli download fishaudio/fish-speech-1.5 --local-dir models/fish-speech-1.5/

Manual Setup

Install the system packages (Debian/Ubuntu):

apt install cmake portaudio19-dev

Set up a virtual environment and install dependencies:

python3.12 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Download the model:

huggingface-cli download fishaudio/fish-speech-1.5 --local-dir models/fish-speech-1.5/

Run the API server:

python main.py
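
The requests in the testing section will fail until the server is listening. A minimal readiness check might look like the sketch below, assuming the API is reachable on localhost port 8000 as in the curl examples. `wait_for_port` is a hypothetical helper, not part of this repository, and an open port does not guarantee that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only proves that something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and retry.
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 8000)` blocks for up to a minute and returns `False` if nothing answers in that time.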

🧪 Testing the API

Generate speech with default voice

curl http://localhost:8000/audio/speech \
  -X POST \
  -F model="fish-speech-1.5" \
  -F input="Hello, this is a test of Fish Speech API" \
  --output "speech.wav"

In JSON format:

curl http://localhost:8000/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
      "model": "fish-speech-1.5",
      "input": "Hello, this is a test of Fish Speech API"
  }' \
  --output "speech.wav"
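
The same JSON request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. The function names `build_speech_request` and `synthesize` are illustrative and not part of this project.

```python
import json
import urllib.request


def build_speech_request(text: str,
                         base_url: str = "http://localhost:8000",
                         model: str = "fish-speech-1.5") -> urllib.request.Request:
    """Build the same JSON POST that the curl example sends."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "speech.wav") -> None:
    """Send the request and write the response body to out_path."""
    with urllib.request.urlopen(build_speech_request(text)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
```

Unlike `curl --output`, `urllib.request.urlopen` raises `HTTPError` on a 4xx or 5xx response, so an error body is not silently written to `speech.wav`.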

Generate speech with example voice

curl http://localhost:8000/audio/speech \
  -X POST \
  -F model="fish-speech-1.5" \
  -F voice="english-nice" \
  -F input="Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets." \
  --output "speech.wav"

In JSON format:

curl http://localhost:8000/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
      "model": "fish-speech-1.5",
      "voice": "english-nice",
      "input": "Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets."
  }' \
  --output "speech.wav"
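
Because `curl --output` saves whatever body the server returns, a failed request can leave an error message inside `speech.wav`. The sketch below is a quick sanity check, assuming the server returns a standard RIFF/WAVE file as the `.wav` filename suggests. `looks_like_wav` is a hypothetical helper.

```python
import struct


def looks_like_wav(data: bytes) -> bool:
    """Check the 12-byte RIFF/WAVE header at the start of data."""
    if len(data) < 12:
        return False
    # Layout: 4-byte "RIFF" tag, 4-byte little-endian size, 4-byte "WAVE" tag.
    riff, _size, wave_tag = struct.unpack("<4sI4s", data[:12])
    return riff == b"RIFF" and wave_tag == b"WAVE"
```

To check a saved file, read its first 12 bytes in binary mode and pass them to `looks_like_wav`.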

Generate speech with reference voice

curl http://localhost:8000/audio/speech \
  -X POST \
  -H 'Content-Type: multipart/form-data' \
  -F model="fish-speech-1.5" \
  -F input="Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets." \
  -F reference_audio="@voice-viola.wav" \
  --output "speech.wav"

In JSON format:

curl http://localhost:8000/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
      "model": "fish-speech-1.5",
      "input": "Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets.",
      "reference_audio": "=base64..."
  }' \
  --output "speech.wav"
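
The JSON variant needs the reference audio as text. The sketch below shows one way to build such a body, under the assumption that `reference_audio` takes the standard base64 encoding of the raw file bytes. The exact string format is elided in the example above (`"=base64..."`), so any prefix the server expects is not reproduced here. `reference_payload` is a hypothetical helper.

```python
import base64
import json


def reference_payload(text: str, audio_bytes: bytes,
                      model: str = "fish-speech-1.5") -> str:
    """Return a JSON body carrying the reference audio as a base64 string."""
    # Assumption: plain standard base64 with no prefix or data-URI wrapper.
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return json.dumps({"model": model, "input": text, "reference_audio": encoded})
```

A typical call would read `voice-viola.wav` in binary mode and pass its bytes as `audio_bytes`.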

Advanced settings

curl http://localhost:8000/audio/speech \
  -X POST \
  -H 'Content-Type: multipart/form-data' \
  -F model="fish-speech-1.5" \
  -F input="Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets." \
  -F top_p="0.1" \
  -F repetition_penalty="1.3" \
  -F temperature="0.75" \
  -F chunk_length="150" \
  -F max_new_tokens="768" \
  -F seed="42" \
  -F reference_audio="@voice-viola.wav" \
  --output "speech.wav"

In JSON format:

curl http://localhost:8000/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
      "model": "fish-speech-1.5",
      "input": "Dr. Eleanor Whitaker, a quantum physicist from Edinburgh, surreptitiously analyzed the enigmatic hieroglyphs while humming Für Elise —her quizzical expression mirrored the cryptic symbols perplexing arrangement, yet she remained determined to decipher their archaic secrets.",
      "top_p": "0.1",
      "repetition_penalty": "1.3",
      "temperature": "0.75",
      "chunk_length": "150",
      "max_new_tokens": "768",
      "seed": "42",
      "reference_audio": "=base64..."
  }' \
  --output "speech.wav"
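
When only some of these settings are needed, it can help to assemble the body programmatically. The sketch below covers the six sampling fields shown above and leaves out any that are unset. Values are serialized as strings to mirror the curl example; whether the server also accepts JSON numbers is not stated here. `SamplingOptions` and `advanced_payload` are illustrative names, not part of this project.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SamplingOptions:
    """Optional fields from the advanced example; None means the field is omitted."""
    top_p: Optional[float] = None
    repetition_penalty: Optional[float] = None
    temperature: Optional[float] = None
    chunk_length: Optional[int] = None
    max_new_tokens: Optional[int] = None
    seed: Optional[int] = None


def advanced_payload(text: str, options: SamplingOptions,
                     model: str = "fish-speech-1.5") -> str:
    """Return a JSON body containing only the options that were set."""
    payload = {"model": model, "input": text}
    # Stringify values to match the README's JSON example.
    payload.update({k: str(v) for k, v in asdict(options).items() if v is not None})
    return json.dumps(payload)
```

For instance, `advanced_payload("Hello", SamplingOptions(temperature=0.75, seed=42))` yields a body with just `model`, `input`, `temperature` and `seed`.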
