Dictate 🔴

Dictate is a background speech-to-text dictation tool written in Clojure and run with Babashka. It records audio in the background and transcribes it using OpenAI’s Whisper API.

Dictate supports continuous recording with automatic silence detection, so dictation can run hands-free.

Features

  • Background Service: Runs as a background process for continuous dictation
  • Toggle Control: Easy on/off switching for recording mode
  • OpenAI Whisper Integration: High-quality speech-to-text transcription
  • Configurable Audio Input: Support for different audio devices
  • System Integration: Works with xbindkeys and i3status
  • Visual Feedback: State indicator for active/inactive modes

Requirements

  • Babashka - Clojure interpreter for scripting
  • sox - Swiss Army Knife of sound processing utilities
  • xdotool - X11 automation tool for typing text
  • curl - HTTP client for API requests
  • OpenAI API key for Whisper transcription
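
Before installing, it can help to confirm that the command-line tools listed above are on your PATH. The following is a minimal sketch: missing_tools is a hypothetical helper that is not part of Dictate, and bb is the Babashka executable.

```shell
#!/bin/sh
# Print the name of every given command that is not found on PATH.
# missing_tools is a hypothetical helper, not part of Dictate.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# bb is the Babashka executable; the other names match the list above.
missing=$(missing_tools bb sox xdotool curl)
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing"
else
  echo "All required tools found."
fi
```

The check only covers executables; the OpenAI API key still has to be supplied separately.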

Installation

Install with bbin:

bbin install io.github.200ok-ch/dictate

Usage

Basic Commands

# Start the background service
dictate --service

# Toggle recording on/off
dictate --toggle

# Show help
dictate --help

Configuration Options

  • -a, --device=DEVICE - Audio input device [default: default]
  • -d, --delay=MS - Typing delay in milliseconds [default: 25]
  • -m, --model=MODEL - Whisper model [default: whisper-1]
  • -p, --api-path=PATH - API endpoint path [default: /v1/audio/transcriptions]
  • -r, --api-root=URL - API root URL [default: https://api.openai.com]
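
To illustrate how the API options might fit together, the sketch below joins the API root and path into one endpoint URL and sends a transcription request with curl. It is an illustration under stated assumptions, not necessarily how Dictate builds its request: the helper names, the OPENAI_API_KEY environment variable, and the sample.wav file are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; defaults mirror the option list above.
API_ROOT="${API_ROOT:-https://api.openai.com}"
API_PATH="${API_PATH:-/v1/audio/transcriptions}"
MODEL="${MODEL:-whisper-1}"

# Join the API root and the endpoint path into the full URL.
transcription_url() {
  printf '%s%s\n' "$1" "$2"
}

# Upload an audio file for transcription and print the JSON response.
# Assumes the key is stored in the OPENAI_API_KEY environment variable.
transcribe() {
  curl -sS "$(transcription_url "$API_ROOT" "$API_PATH")" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -F "file=@$1" \
    -F "model=$MODEL"
}

# Usage (sample.wav is a placeholder file name):
#   transcribe sample.wav
```

Pointing API_ROOT at another server with a compatible endpoint corresponds to what --api-root does on the command line.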

Examples

# Start service with default settings
dictate --service

# Start service with specific audio device
dictate --service --device=hw:1,0

# Toggle recording mode
dictate --toggle

System Integration

xbindkeys Configuration

Add this to your ~/.xbindkeysrc for keyboard shortcuts:

"dictate --toggle"
  Pause

i3status Configuration

Add this to your i3status config for status bar integration:

order += "read_file dictate"

read_file dictate {
    path = "~/.dictate.state"
    format = "%content"
}

State Management

The tool uses a simple state file (~/.dictate.state) to track whether recording is active or inactive:

  • Active state: Contains the 🔴 indicator
  • Inactive state: Empty file
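
Because the state file is either empty or non-empty, other scripts can read the recording state with a plain file test. A minimal sketch, assuming the file layout described above; dictate_status is a hypothetical helper, not part of Dictate.

```shell
#!/bin/sh
# Report whether recording is active, based on the state file.
# dictate_status is a hypothetical helper, not part of Dictate.
dictate_status() {
  state_file="${1:-$HOME/.dictate.state}"
  # -s is true only if the file exists and is non-empty.
  if [ -s "$state_file" ]; then
    echo "active"
  else
    echo "inactive"
  fi
}

dictate_status
```

In this sketch a missing state file is reported as inactive, the same as an empty one.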

Similar Projects & Acknowledgements

License

This project is maintained by 200ok GmbH.

Configuration

You can customize Dictate’s behavior by creating a dictate.yml configuration file in the directory from which you run dictate (the current working directory).

The values in this example are also the defaults.

Example:

# audio
device: "default"                    # Audio input device (e.g., "default", "hw:1,0")
# silence
volume: 2                            # Maximum volume still treated as silence, in percent
duration: 1.5                        # Minimum duration of silence, in seconds
# transcription
api-root: "https://api.openai.com"   # API root URL
api-path: "/v1/audio/transcriptions" # API endpoint path
api-key: "sk-..."                    # Your OpenAI API key
model: "gpt-4o-transcribe"           # Transcription model to use
# typing
delay: 25                            # Typing delay in milliseconds
# misc
i3status: false                      # Whether to reload i3status on toggle
emojis: false                        # Enable or disable the emoji feature
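
The volume and duration settings read like the parameters of the silence effect in SoX, which is listed as a requirement. Whether Dictate invokes SoX exactly this way is an assumption. The sketch below only shows how such values could map onto a rec command line; silence_args is a hypothetical helper, and the 0.1 second start duration and chunk.wav file name are assumptions as well.

```shell
#!/bin/sh
# Build arguments for the SoX "silence" effect from the two settings.
# silence_args is a hypothetical helper; the 0.1 s start duration is an assumption.
silence_args() {
  volume="$1"    # silence threshold in percent, e.g. 2
  duration="$2"  # seconds of silence that end the recording, e.g. 1.5
  printf 'silence 1 0.1 %s%% 1 %s %s%%\n' "$volume" "$duration" "$volume"
}

# Print the resulting command instead of running it.
# Remove the echo and the quotes to actually record from the default device.
echo "rec -q chunk.wav $(silence_args 2 1.5)"
```

With the default values, recording would start once the input rises above 2% and stop after 1.5 seconds below that level.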

