A background speech-to-text dictation tool written in Clojure using Babashka. It records audio in the background and transcribes it using OpenAI’s Whisper API.
Dictate now supports continuous recording with automatic silence detection, providing a seamless hands-free dictation experience.
- Background Service: Runs as a background process for continuous dictation
- Toggle Control: Easy on/off switching for recording mode
- OpenAI Whisper Integration: High-quality speech-to-text transcription
- Configurable Audio Input: Support for different audio devices
- System Integration: Works with xbindkeys and i3status
- Visual Feedback: State indicator for active/inactive modes
- Babashka - Clojure interpreter for scripting
- sox - Swiss Army Knife of sound processing utilities
- xdotool - X11 automation tool for typing text
- curl - HTTP client for API requests
- OpenAI API key for Whisper transcription
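If you want to confirm these dependencies are in place before running Dictate, a quick Babashka check along these lines can help. This is only an illustration, not part of the tool:

```clojure
#!/usr/bin/env bb
;; Illustrative check (not part of Dictate): are the required binaries on PATH?
(require '[babashka.fs :as fs])

(doseq [tool ["bb" "sox" "xdotool" "curl"]]
  (println tool (if (fs/which tool) "found" "MISSING")))
```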
Easily install with bbin:

```bash
bbin install io.github.200ok-ch/dictate
```
```bash
# Start the background service
dictate --service

# Toggle recording on/off
dictate --toggle

# Show help
dictate --help
```
```
-a, --device=DEVICE    - Audio input device [default: default]
-d, --delay=MS         - Typing delay in milliseconds [default: 25]
-m, --model=MODEL      - Whisper model [default: whisper-1]
-p, --api-path=PATH    - API endpoint path [default: /v1/audio/transcriptions]
-r, --api-root=URL     - API root URL [default: https://api.openai.com]
```
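These options map directly onto the external tools Dictate relies on. Purely as an illustration (not the actual implementation), one record-transcribe-type round trip in Babashka might look like the sketch below; the sox `silence` arguments, the temporary file path, and the helper names are assumptions made for the example:

```clojure
#!/usr/bin/env bb
;; Illustrative sketch only: one record -> transcribe -> type round trip.
(require '[babashka.process :refer [shell]]
         '[cheshire.core :as json])

(def opts {:device   "default"                   ; -a/--device
           :delay    25                          ; -d/--delay
           :model    "whisper-1"                 ; -m/--model
           :api-path "/v1/audio/transcriptions"  ; -p/--api-path
           :api-root "https://api.openai.com"})  ; -r/--api-root

(defn record! [wav]
  ;; Record from the configured ALSA device until ~1.5s of silence
  ;; (sox `silence` effect; the exact arguments are an assumption).
  (shell "sox" "-t" "alsa" (:device opts) wav
         "silence" "1" "0.1" "2%" "1" "1.5" "2%"))

(defn transcribe [wav]
  ;; POST the audio to the Whisper endpoint with curl and extract the text.
  (-> (shell {:out :string}
             "curl" "-s" (str (:api-root opts) (:api-path opts))
             "-H" (str "Authorization: Bearer " (System/getenv "OPENAI_API_KEY"))
             "-F" (str "file=@" wav)
             "-F" (str "model=" (:model opts)))
      :out
      (json/parse-string true)
      :text))

(defn type! [text]
  ;; Type the transcription into the focused window with xdotool.
  (shell "xdotool" "type" "--delay" (str (:delay opts)) text))

(let [wav "/tmp/dictate-example.wav"]
  (record! wav)
  (type! (transcribe wav)))
```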
```bash
# Start service with default settings
dictate --service

# Start service with specific audio device
dictate --service --device=hw:1,0

# Toggle recording mode
dictate --toggle
```
Add this to your ~/.xbindkeysrc for keyboard shortcuts:

```
"dictate --toggle"
  Pause
```
Add this to your i3status config for status bar integration:
order += "read_file dictate" read_file dictate { path = "~/.dictate.state" format = "%content" }
The tool uses a simple state file (~/.dictate.state) to track whether recording is active or inactive:
- Active state: Contains the 🔴 indicator
- Inactive state: Empty file
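A toggle built on that contract only needs to check whether the file is empty and flip it. The following Babashka sketch is illustrative rather than Dictate's actual code; i3status itself refreshes on SIGUSR1, but whether Dictate signals it this way is an assumption:

```clojure
#!/usr/bin/env bb
;; Illustrative toggle: flips ~/.dictate.state between 🔴 (active) and empty (inactive).
(require '[babashka.fs :as fs]
         '[babashka.process :refer [shell]]
         '[clojure.string :as str])

(def state-file (str (fs/expand-home "~/.dictate.state")))

(defn active? []
  (and (fs/exists? state-file)
       (not (str/blank? (slurp state-file)))))

;; Active state contains the indicator; inactive state is an empty file.
(spit state-file (if (active?) "" "🔴"))

;; Optionally ask i3status to refresh so the status bar picks up the change
;; (assumes i3status is running; it re-reads its modules on SIGUSR1).
(shell {:continue true} "killall" "-USR1" "i3status")
```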
This project is maintained by 200ok GmbH.
You can customize Dictate’s behavior by creating a dictate.yml configuration file in the same directory where you call dictate.
The values in this example are also the defaults.
Example:
```yaml
# audio
device: "default" # Audio input device (e.g., "default", "hw:1,0")

# silence
volume: 2 # Maximum volume of silence, in percent
duration: 1.5 # Minimum duration of silence, in seconds

# transcription
api-root: "https://api.openai.com" # API root URL
api-path: "/v1/audio/transcriptions" # API endpoint path
api-key: "sk-..." # Your OpenAI API key
model: "gpt-4o-transcribe" # Whisper model to use

# typing
delay: 25 # Typing delay in milliseconds

# misc
i3status: false # Whether to reload i3status on toggle
emojis: false # Enable or disable the emoji feature
```
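As a rough sketch of how such a file could be consumed (an assumption about the mechanics, not Dictate's actual code), Babashka ships with clj-yaml, so the example above can be read and merged over the defaults, with `volume` and `duration` feeding sox's silence detection:

```clojure
;; Illustrative sketch: read ./dictate.yml if present, merge it over the
;; defaults shown above, and derive sox silence-detection arguments from it.
(require '[babashka.fs :as fs]
         '[clj-yaml.core :as yaml])

(def defaults
  {:device   "default"
   :volume   2        ; max volume of silence, in percent
   :duration 1.5      ; min duration of silence, in seconds
   :api-root "https://api.openai.com"
   :api-path "/v1/audio/transcriptions"
   :model    "gpt-4o-transcribe"
   :delay    25
   :i3status false
   :emojis   false})

(def config
  (merge defaults
         (when (fs/exists? "dictate.yml")
           (yaml/parse-string (slurp "dictate.yml")))))

;; `volume` and `duration` would translate to sox's `silence` effect
;; roughly like this (the exact mapping is an assumption):
(def silence-args
  ["silence" "1" "0.1" (str (:volume config) "%")
   "1" (str (:duration config)) (str (:volume config) "%")])
```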