A global CLI tool to transcribe and format audio from YouTube videos, podcasts, and other media using yt-dlp, Groq's Whisper API, and OpenRouter LLMs for intelligent formatting.
Install QuickWhisper globally using uv:
git clone <repo-url>
cd quickwhisper
uv tool install .
Now quickwhisper is available globally from any directory!
Create a .env file with your API keys in one of these locations:
- Your home directory: ~/.env (recommended for global use)
- Current working directory: ./.env (project-specific, overrides global)
# Add to ~/.env or ./.env
GROQ_API_KEY=your_groq_api_key_here
OPENROUTER_API_KEY=your_openrouter_api_key_here
Get your API keys:
- Groq API Key (for Whisper transcription)
- OpenRouter API Key (for text formatting)
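The lookup order described above (a global ~/.env that a project-level ./.env can override) can be sketched with the standard library alone. This is a minimal illustration that assumes simple KEY=value lines; the helper names `parse_env_file` and `load_api_keys` are hypothetical and not taken from QuickWhisper's source, and the actual tool may load these files differently.

```python
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_api_keys(home: Path, cwd: Path) -> dict[str, str]:
    """Merge ~/.env with ./.env; entries from the working directory win."""
    merged = parse_env_file(home / ".env")
    merged.update(parse_env_file(cwd / ".env"))
    return merged
```

Calling `load_api_keys(Path.home(), Path.cwd())` would then return the effective keys for the current directory, with any project-specific value replacing the global one.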
quickwhisper <input> [--output OUTPUT_DIR]
Where <input> can be:
- A URL (YouTube, podcast, etc.)
- A local audio/video file path
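One way to tell the two input kinds apart is sketched below. It is an assumption about how such a check could work, not QuickWhisper's actual logic: `is_url` and `classify_input` are hypothetical names, and this version only accepts URLs with an explicit http or https scheme, so a bare `youtube.com/...` string would need extra handling.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_url(value: str) -> bool:
    """Treat the input as a URL only if it has an http(s) scheme and a host."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def classify_input(value: str) -> str:
    """Return 'url' or 'file'; raise if the input is neither."""
    if is_url(value):
        return "url"
    if Path(value).expanduser().is_file():
        return "file"
    raise ValueError(f"Not a URL or an existing file: {value!r}")
```

Checking the scheme first keeps Windows-style paths such as `C:\audio.wav` from being mistaken for URLs, since their "scheme" is a drive letter rather than http or https.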
# Basic usage - saves to current directory
quickwhisper "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
# Local file
quickwhisper "/path/to/audio.wav"
# Specify output directory
quickwhisper "https://podcasts.apple.com/podcast/id123456789" --output ~/Documents/Transcripts
# Short flag
quickwhisper "https://example.com/audio.mp3" -o ./transcripts
# Works from any directory
cd ~/Documents/MyProject
quickwhisper "youtube.com/videoid" # outputs here
flowchart TD
Start([User Input:<br/>URL or Local File]) --> Check{Is it a URL?}
Check -->|Yes| Download[Download Audio<br/>using yt-dlp]
Check -->|No| LocalFile[Load Local File]
Download --> AudioFile[Audio File<br/>flac/mp3/mp4/wav/etc.]
LocalFile --> CheckFormat{Supported<br/>Format?}
CheckFormat -->|Yes| AudioFile
CheckFormat -->|No| Convert[Convert to MP3<br/>using pydub]
Convert --> AudioFile
AudioFile --> CheckSize{File Size<br/>> 20MB?}
CheckSize -->|No| Transcribe[Transcribe with<br/>Groq Whisper API]
CheckSize -->|Yes| Chunk[Split into Chunks<br/>with 10s overlap]
Chunk --> TranscribeChunks[Transcribe Each<br/>Chunk Separately]
TranscribeChunks --> Merge[Intelligently Merge<br/>Transcriptions]
Transcribe --> RawText[Raw Transcription<br/>Text]
Merge --> RawText
RawText --> SaveRaw[Save Raw<br/>Transcription]
SaveRaw --> RawOutput([Output:<br/>title_raw.md])
RawText --> Format[Format with LLM<br/>via OpenRouter:<br/>• Add paragraphs<br/>• Fix errors<br/>• Add markdown]
Format --> SaveFormatted[Save Formatted<br/>Transcription]
SaveFormatted --> FormattedOutput([Output:<br/>title.md])
style Start fill:#e1f5e1
style RawOutput fill:#e1f5e1
style FormattedOutput fill:#e1f5e1
style Download fill:#e3f2fd
style LocalFile fill:#e3f2fd
style Transcribe fill:#fff3e0
style TranscribeChunks fill:#fff3e0
style Format fill:#f3e5f5
Input:
- URL (YouTube, podcast, etc.) OR
- Local audio/video file (wav, mp3, mp4, etc.)
Processing Steps:
- Source Detection: Determines if input is URL or local file
- Audio Acquisition: Downloads from URL or loads local file
- Format Check: Ensures audio is in Groq-supported format (converts if needed)
- Size Check: Files >20MB are chunked with overlap for better handling
- Transcription: Uses Groq's Whisper Large v3 model
- Smart Merging: For chunked files, intelligently merges overlapping transcriptions
- Formatting: Uses AI to add structure, fix errors, and apply markdown
- Output: Saves both the raw and the formatted transcription as markdown files
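The chunking step above can be illustrated by computing the time ranges each chunk would cover. This is a sketch under stated assumptions: the 10-second overlap comes from the flowchart, while the function name `chunk_bounds` and the chunk length passed in are hypothetical, since the README does not say how long each chunk is.

```python
def chunk_bounds(
    total_ms: int, chunk_ms: int, overlap_ms: int = 10_000
) -> list[tuple[int, int]]:
    """Return (start, end) millisecond ranges; each chunk re-covers the
    last overlap_ms of the previous one."""
    if chunk_ms <= overlap_ms:
        raise ValueError("chunk_ms must be larger than overlap_ms")
    bounds: list[tuple[int, int]] = []
    start = 0
    while start < total_ms:
        end = min(start + chunk_ms, total_ms)
        bounds.append((start, end))
        if end == total_ms:
            break
        # Step back so the next chunk repeats the tail of this one.
        start = end - overlap_ms
    return bounds
```

For a 150-second recording split into 60-second chunks, `chunk_bounds(150_000, 60_000)` yields three ranges that each share 10 seconds with their neighbour. With pydub, which slices `AudioSegment` objects by milliseconds, each pair could be applied as `audio[start:end]`.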
Output:
- Raw transcription: title_raw.md - Direct transcription from Groq Whisper
- Formatted transcription: title.md - AI-enhanced with structure and markdown
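The naming scheme for the two output files could be derived from the media title roughly as follows. The function name `output_paths` and the sanitising rule (replacing runs of unsafe characters with underscores) are assumptions for illustration; the README only states the `title_raw.md` and `title.md` pattern.

```python
import re
from pathlib import Path


def output_paths(title: str, output_dir: Path) -> tuple[Path, Path]:
    """Return (raw_path, formatted_path) for a given media title."""
    # Replace characters that are unsafe in file names (assumed rule).
    safe = re.sub(r"[^\w\-]+", "_", title).strip("_") or "transcript"
    return output_dir / f"{safe}_raw.md", output_dir / f"{safe}.md"
```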
- Python 3.12+
- uv (Python package manager)
- ffmpeg (for audio conversion when needed)
- Groq API key (for Whisper transcription)
- OpenRouter API key (for text formatting)
- Global CLI tool - Available from any directory without virtual environment activation
- Smart audio handling - Downloads optimal format, only converts when necessary
- Large file support - Automatic chunking for files >20MB with intelligent merging
- Transcribes using Groq's Whisper Large v3 model
- Formats transcription using Claude 3.5 Sonnet via OpenRouter
- Intelligent formatting - Adds logical paragraph structure and markdown formatting
- Error correction - Fixes obvious transcription errors
- Dual output - Saves both raw and formatted transcriptions as markdown files
- Flexible configuration - Global or project-specific API key configuration
- Automatic cleanup - Removes temporary files after processing
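To give a sense of what merging overlapping chunk transcriptions involves, here is a deliberately simple sketch. It assumes the overlap shows up as identical words (ignoring case) at the end of one chunk and the start of the next; `merge_transcripts` is a hypothetical name, and QuickWhisper's own merging may use a different, more tolerant strategy, since real Whisper output at chunk edges often differs slightly.

```python
def merge_transcripts(left: str, right: str, max_overlap_words: int = 50) -> str:
    """Join two chunk transcripts, dropping words duplicated at the seam."""
    left_words, right_words = left.split(), right.split()
    limit = min(max_overlap_words, len(left_words), len(right_words))
    # Look for the longest run where the end of `left` equals the start of `right`.
    for size in range(limit, 0, -1):
        tail = [w.lower() for w in left_words[-size:]]
        head = [w.lower() for w in right_words[:size]]
        if tail == head:
            return " ".join(left_words + right_words[size:])
    # No shared words found: fall back to plain concatenation.
    return " ".join(left_words + right_words)
```

Searching from the longest candidate downward means the largest duplicated run is removed first, which avoids leaving a partial repeat at the seam.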
Any source supported by yt-dlp including:
- YouTube videos
- Podcasts (Apple Podcasts, Spotify, etc.)
- Audio/video files from hundreds of sites
- Direct media URLs
Groq accepts: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
QuickWhisper intelligently:
- Uses original format if supported (faster)
- Only converts when format is unsupported
- Chunks files larger than 20MB to stay within API size limits
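The "only convert when unsupported" rule can be sketched as a check against the accepted-format list above. Deciding by file extension is an assumption made for brevity, and `needs_conversion` is a hypothetical helper name rather than part of QuickWhisper.

```python
from pathlib import Path

# Formats listed above as accepted by Groq.
GROQ_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def needs_conversion(path: str) -> bool:
    """True when the file extension is not in the accepted-format list."""
    return Path(path).suffix.lower().lstrip(".") not in GROQ_FORMATS
```

When this returns `True`, a conversion to MP3 would follow, as in the flowchart. With pydub that could look like `AudioSegment.from_file(path).export(out_path, format="mp3")`, which relies on ffmpeg being installed.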