Racks-Labs/whisper-asr-webservice


🎉 Join our Discord Community! Connect with other users, get help, and stay updated on the latest features: https://discord.gg/4Q5YVrePzZ

Whisper ASR Box

Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper models are trained on a large dataset of diverse audio and are multitask models that can perform multilingual speech recognition, speech translation, and language identification.

Features

The current release (v1.9.1) supports multiple Whisper engines and models; see Environment Variables below for the available options.

Quick Usage

CPU

docker run -d -p 9000:9000 \
  -e ASR_MODEL=base \
  -e ASR_ENGINE=openai_whisper \
  onerahmet/openai-whisper-asr-webservice:latest

GPU

docker run -d --gpus all -p 9000:9000 \
  -e ASR_MODEL=base \
  -e ASR_ENGINE=openai_whisper \
  onerahmet/openai-whisper-asr-webservice:latest-gpu

Cache

To reduce container startup time by avoiding repeated downloads, you can persist the cache directory:

docker run -d -p 9000:9000 \
  -v $PWD/cache:/root/.cache/ \
  onerahmet/openai-whisper-asr-webservice:latest

Key Features

  • Multiple ASR engines support (OpenAI Whisper, Faster Whisper, WhisperX)
  • Multiple output formats (text, JSON, VTT, SRT, TSV)
  • Word-level timestamps support
  • Voice activity detection (VAD) filtering
  • Speaker diarization (with WhisperX)
  • FFmpeg integration for broad audio/video format support
  • GPU acceleration support
  • Configurable model loading/unloading
  • REST API with Swagger documentation
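
For example, the output formats listed above can be selected per request via a query parameter. The following curl sketch assumes the `/asr` endpoint, an `output` query parameter, and an `audio_file` form field as shown in the Swagger UI; confirm the exact names there before use.

```shell
# Sketch: request SubRip subtitles from a running instance.
# The "output" query parameter and "audio_file" form field are assumptions
# taken from the service's Swagger documentation.
curl -F "audio_file=@speech.wav" "http://localhost:9000/asr?output=srt"
```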

Environment Variables

Key configuration options:

  • ASR_ENGINE: Engine selection (openai_whisper, faster_whisper, whisperx)
  • ASR_MODEL: Model selection (tiny, base, small, medium, large-v3, etc.)
  • ASR_MODEL_PATH: Custom path to store/load models
  • ASR_DEVICE: Device selection (cuda, cpu)
  • MODEL_IDLE_TIMEOUT: Timeout for model unloading

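As a sketch, these variables can be combined in a single docker run. The engine, model, and device values below come from the lists above; the timeout value is illustrative, and its unit is assumed to be seconds.

```shell
# Sketch: GPU image with Faster Whisper, a persisted model cache,
# and idle model unloading.
# MODEL_IDLE_TIMEOUT is assumed to be in seconds; check the docs for the unit.
docker run -d --gpus all -p 9000:9000 \
  -e ASR_ENGINE=faster_whisper \
  -e ASR_MODEL=large-v3 \
  -e ASR_DEVICE=cuda \
  -e MODEL_IDLE_TIMEOUT=300 \
  -v $PWD/cache:/root/.cache/ \
  onerahmet/openai-whisper-asr-webservice:latest-gpu
```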
Documentation

For complete documentation, visit: https://ahmetoner.github.io/whisper-asr-webservice

Development

# Install poetry v2.X
pip3 install poetry

# Install dependencies for cpu
poetry install --extras cpu

# Install dependencies for cuda
poetry install --extras cuda

# Run service
poetry run whisper-asr-webservice --host 0.0.0.0 --port 9000

After starting the service, visit http://localhost:9000 or http://0.0.0.0:9000 in your browser to access the Swagger UI documentation and try out the API endpoints.
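
The API can also be called programmatically. Below is a minimal Python sketch for building a transcription request URL; the `/asr` path and the `output`, `encode`, and `language` query parameters are assumptions based on the Swagger documentation, so verify them in the UI first.

```python
# Sketch of a client helper for the webservice's /asr endpoint.
# The endpoint path and query-parameter names ("output", "encode",
# "language") are assumptions taken from the Swagger docs.
from urllib.parse import urlencode

def asr_url(base="http://localhost:9000", output="json",
            encode=True, language=None):
    """Build the request URL for a transcription call."""
    params = {"output": output, "encode": str(encode).lower()}
    if language:
        params["language"] = language
    return f"{base}/asr?{urlencode(params)}"

# The audio itself is sent as multipart form data, e.g. with curl:
#   curl -F "audio_file=@speech.wav" "http://localhost:9000/asr?output=json"
```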

Credits

  • This software uses libraries from the FFmpeg project under the LGPLv2.1
