QuickWhisper

A global CLI tool to transcribe and format audio from YouTube videos, podcasts, and other media using yt-dlp, Groq's Whisper API, and OpenRouter LLMs for intelligent formatting.

Installation

Install QuickWhisper globally using uv:

git clone <repo-url>
cd quickwhisper
uv tool install .

Now quickwhisper is available globally from any directory!

Configuration

Create a .env file with your API keys in one of these locations:

Your home directory: ~/.env (recommended for global use)
Current working directory: ./.env (project-specific, overrides global)

# Add to ~/.env or ./.env
GROQ_API_KEY=your_groq_api_key_here
OPENROUTER_API_KEY=your_openrouter_api_key_here
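The lookup order described above (a ./.env in the working directory overriding ~/.env) can be illustrated with a small Python sketch. It is not QuickWhisper's actual loader: the helpers parse_env_file and load_config are hypothetical, and the parser only understands plain KEY=value lines.

```python
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Hypothetical helper: parse KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. KEY="abc" becomes abc.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_config(home: Path, cwd: Path) -> dict[str, str]:
    """Hypothetical loader: read ~/.env first, then let ./.env override it."""
    config: dict[str, str] = {}
    for env_path in (home / ".env", cwd / ".env"):
        config.update(parse_env_file(env_path))
    return config
```

Calling load_config(Path.home(), Path.cwd()) returns the merged keys; a real implementation might rely on a library such as python-dotenv instead.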

Get your API keys:

Groq API Key (for Whisper transcription)
OpenRouter API Key (for text formatting)

Usage

quickwhisper <input> [--output OUTPUT_DIR]

Where <input> can be:

A URL (YouTube, podcast, etc.)
A local audio/video file path
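The README does not say how QuickWhisper tells the two input kinds apart. One plausible rule, sketched below with a hypothetical classify_input helper, is to treat anything that exists on disk as a file and pass everything else to yt-dlp as a URL, including scheme-less inputs such as youtube.com/videoid.

```python
import os
from urllib.parse import urlparse


def classify_input(value: str) -> str:
    """Hypothetical helper: return "file" or "url" for an <input> argument."""
    # An existing path on disk always wins, so local files are never
    # mistaken for URLs.
    if os.path.exists(os.path.expanduser(value)):
        return "file"
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Clearly meant as a path, but nothing exists there.
    if value.startswith(("/", "./", "../", "~")):
        raise FileNotFoundError(value)
    # Anything else (e.g. "youtube.com/videoid") is handed to yt-dlp as a URL.
    return "url"
```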

Examples

# Basic usage - saves to current directory
quickwhisper "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Local file
quickwhisper "/path/to/audio.wav"

# Specify output directory
quickwhisper "https://podcasts.apple.com/podcast/id123456789" --output ~/Documents/Transcripts

# Short flag
quickwhisper "https://example.com/audio.mp3" -o ./transcripts

# Works from any directory
cd ~/Documents/MyProject
quickwhisper "youtube.com/videoid"  # output is saved in ~/Documents/MyProject
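The documented command line (a positional <input> plus an optional --output/-o directory that defaults to the current directory) maps naturally onto Python's argparse. This is an illustrative sketch of that interface, not the tool's real entry point; build_parser and resolve_output_dir are hypothetical names.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for: quickwhisper <input> [--output OUTPUT_DIR]."""
    parser = argparse.ArgumentParser(
        prog="quickwhisper",
        description="Transcribe and format audio from a URL or local file.",
    )
    parser.add_argument("input", help="URL (YouTube, podcast, ...) or local audio/video path")
    parser.add_argument(
        "-o",
        "--output",
        type=Path,
        default=Path("."),  # current directory, as in the basic example
        help="directory for the .md output files",
    )
    return parser


def resolve_output_dir(output: Path) -> Path:
    """Expand ~ (left alone by the shell inside quotes) and create the directory."""
    out_dir = output.expanduser().resolve()
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

resolve_output_dir matters mainly for quoted paths such as "~/Documents/Transcripts", which the shell does not expand.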

How It Works - Data Flow

flowchart TD
    Start([User Input:<br/>URL or Local File]) --> Check{Is it a URL?}
    
    Check -->|Yes| Download[Download Audio<br/>using yt-dlp]
    Check -->|No| LocalFile[Load Local File]
    
    Download --> AudioFile[Audio File<br/>flac/mp3/mp4/wav/etc.]
    LocalFile --> CheckFormat{Supported<br/>Format?}
    
    CheckFormat -->|Yes| AudioFile
    CheckFormat -->|No| Convert[Convert to MP3<br/>using pydub]
    Convert --> AudioFile
    
    AudioFile --> CheckSize{File Size<br/>> 20MB?}
    
    CheckSize -->|No| Transcribe[Transcribe with<br/>Groq Whisper API]
    CheckSize -->|Yes| Chunk[Split into Chunks<br/>with 10s overlap]
    
    Chunk --> TranscribeChunks[Transcribe Each<br/>Chunk Separately]
    TranscribeChunks --> Merge[Intelligently Merge<br/>Transcriptions]
    
    Transcribe --> RawText[Raw Transcription<br/>Text]
    Merge --> RawText
    
    RawText --> SaveRaw[Save Raw<br/>Transcription]
    SaveRaw --> RawOutput([Output:<br/>title_raw.md])
    
    RawText --> Format[Format with LLM<br/>via OpenRouter:<br/>• Add paragraphs<br/>• Fix errors<br/>• Add markdown]
    
    Format --> SaveFormatted[Save Formatted<br/>Transcription]
    
    SaveFormatted --> FormattedOutput([Output:<br/>title.md])
    
    style Start fill:#e1f5e1
    style RawOutput fill:#e1f5e1
    style FormattedOutput fill:#e1f5e1
    style Download fill:#e3f2fd
    style LocalFile fill:#e3f2fd
    style Transcribe fill:#fff3e0
    style TranscribeChunks fill:#fff3e0
    style Format fill:#f3e5f5
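The chunking branch of the diagram (files over 20 MB, 10 s overlap) could be planned as below. This sketch assumes a roughly constant bitrate so that file size scales with duration, treats 20 MB as 20 MiB, and uses hypothetical helper names; only the pydub calls (AudioSegment.from_file, millisecond slicing, export) are real library API.

```python
import os

MAX_BYTES = 20 * 1024 * 1024  # the README's 20 MB threshold (MiB assumed)
OVERLAP_MS = 10_000           # 10 s overlap between consecutive chunks


def plan_chunks(file_size: int, duration_ms: int,
                max_bytes: int = MAX_BYTES,
                overlap_ms: int = OVERLAP_MS) -> list[tuple[int, int]]:
    """Hypothetical planner: (start_ms, end_ms) windows under a constant-bitrate assumption."""
    if file_size <= max_bytes:
        return [(0, duration_ms)]  # small enough: one request, no chunking
    # Target 90% of the limit because the bytes-per-millisecond estimate is rough.
    chunk_ms = (max_bytes * 9 // 10) * duration_ms // file_size
    if chunk_ms <= overlap_ms:
        raise ValueError("chunk length must be longer than the overlap")
    windows = []
    start = 0
    while True:
        end = min(start + chunk_ms, duration_ms)
        windows.append((start, end))
        if end >= duration_ms:
            return windows
        start = end - overlap_ms  # next chunk re-covers the previous 10 s


def export_chunks(path: str, out_prefix: str) -> list[str]:
    """Hypothetical exporter: slice with pydub (indexed in ms) and write MP3 chunks."""
    from pydub import AudioSegment  # needs ffmpeg for most input formats

    audio = AudioSegment.from_file(path)
    paths = []
    for i, (start, end) in enumerate(plan_chunks(os.path.getsize(path), len(audio))):
        out = f"{out_prefix}_{i:03d}.mp3"
        # Re-encoding to MP3 changes the bitrate, so real chunk sizes can
        # differ from the estimate used in plan_chunks.
        audio[start:end].export(out, format="mp3")
        paths.append(out)
    return paths
```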

Data Flow Summary

Input:

URL (YouTube, podcast, etc.) OR
Local audio/video file (wav, mp3, mp4, etc.)

Processing Steps:

Source Detection: Determines if input is URL or local file
Audio Acquisition: Downloads from URL or loads local file
Format Check: Ensures the audio is in a Groq-supported format (converting to MP3 if needed)
Size Check: Files larger than 20 MB are split into chunks with a 10-second overlap so each upload stays within the API's file size limit
Transcription: Uses Groq's Whisper Large v3 model
Smart Merging: For chunked files, merges the chunk transcriptions and removes text duplicated in the overlapping regions
Formatting: Uses an LLM via OpenRouter to add paragraph structure, fix obvious transcription errors, and apply markdown
Output: Saves both the raw and the formatted transcription as markdown files
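The README says chunk transcriptions are merged "intelligently" without describing how. One simple technique, swapped in here for illustration and not necessarily what QuickWhisper does, is to align the end of one chunk's text with the start of the next using the longest common run of words found by Python's difflib, then stitch at that point so the overlap appears only once. merge_pair and merge_all are hypothetical names.

```python
from difflib import SequenceMatcher


def _norm(word: str) -> str:
    # Compare words case- and punctuation-insensitively.
    return word.strip(".,!?;:\"'").lower()


def merge_pair(prev: str, nxt: str, window: int = 60, min_match: int = 3) -> str:
    """Hypothetical merge of two chunk transcripts whose audio overlapped.

    Finds the longest run of identical words between the end of prev and the
    start of nxt, keeps prev up to that run, and continues with nxt from it.
    """
    a, b = prev.split(), nxt.split()
    tail = [_norm(w) for w in a[-window:]]
    head = [_norm(w) for w in b[:window]]
    match = SequenceMatcher(None, tail, head, autojunk=False).find_longest_match(
        0, len(tail), 0, len(head)
    )
    if match.size < min_match:
        return " ".join(a + b)  # no confident overlap: plain concatenation
    cut = len(a) - len(tail) + match.a  # where the shared run starts in prev
    return " ".join(a[:cut] + b[match.b:])


def merge_all(chunks: list[str]) -> str:
    """Fold merge_pair over the chunk transcripts in order."""
    merged = chunks[0] if chunks else ""
    for text in chunks[1:]:
        merged = merge_pair(merged, text)
    return merged
```

Taking the shared words from the later chunk discards any words garbled at the cut-off end of the earlier chunk.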

Output:

Raw transcription: title_raw.md - Direct transcription from Groq Whisper
Formatted transcription: title.md - AI-enhanced with structure and markdown
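The output names follow a <title>_raw.md / <title>.md pattern. Because media titles often contain characters that are invalid in filenames, some sanitization is needed; the rules in this hypothetical output_paths helper are assumptions.

```python
import re
from pathlib import Path


def output_paths(title: str, out_dir: Path) -> tuple[Path, Path]:
    """Hypothetical helper: return (<title>_raw.md, <title>.md) inside out_dir."""
    # Replace characters that common filesystems reject, collapse whitespace,
    # cap the length, and fall back to a fixed name if nothing is left.
    safe = re.sub(r'[\\/:*?"<>|]+', "_", title)
    safe = re.sub(r"\s+", " ", safe)[:150].strip(" ._") or "transcript"
    return out_dir / f"{safe}_raw.md", out_dir / f"{safe}.md"
```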

Requirements

Python 3.12+
uv (Python package manager)
ffmpeg (for audio conversion when needed)
Groq API key (for Whisper transcription)
OpenRouter API key (for text formatting)
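A startup check for the requirements listed above might look like the following sketch (hypothetical missing_requirements helper). It only reports what is missing; uv is not checked because it is needed to install the tool, not to run it.

```python
import os
import shutil
import sys
from typing import Mapping, Optional


def missing_requirements(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Hypothetical preflight: list the requirements this machine does not meet."""
    env = os.environ if env is None else env
    missing = []
    if sys.version_info < (3, 12):
        missing.append("Python 3.12+")
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg on PATH")
    for key in ("GROQ_API_KEY", "OPENROUTER_API_KEY"):
        if not env.get(key):
            missing.append(key)
    return missing
```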

Features

Global CLI tool - Available from any directory without virtual environment activation
Smart audio handling - Downloads optimal format, only converts when necessary
Large file support - Automatic chunking for files >20MB with intelligent merging
Whisper transcription - Transcribes using Groq's Whisper Large v3 model
LLM formatting - Formats transcriptions using Claude 3.5 Sonnet via OpenRouter
Intelligent formatting - Adds logical paragraph structure and markdown formatting
Error correction - Fixes obvious transcription errors
Dual output - Saves both raw and formatted transcriptions as markdown files
Flexible configuration - Global or project-specific API key configuration
Automatic cleanup - Removes temporary files after processing
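For the automatic cleanup feature, one common pattern is to keep downloads, converted files, and chunks in a temporary directory that is deleted even when processing fails. The scratch_dir context manager below is a hypothetical illustration; tempfile.TemporaryDirectory provides the same behavior out of the box.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def scratch_dir(prefix: str = "quickwhisper-") -> Iterator[Path]:
    """Hypothetical helper: yield a temp directory that is always removed."""
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        # Runs on success and on exceptions alike.
        shutil.rmtree(path, ignore_errors=True)
```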

Supported Sources

Any source supported by yt-dlp, including:

YouTube videos
Podcasts (Apple Podcasts, Spotify, etc.)
Audio/video files from hundreds of sites
Direct media URLs
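Because sources are handled by yt-dlp, downloading reduces to a call into its Python API. The options below (best audio-only stream, title-based filename) are assumptions about sensible settings rather than QuickWhisper's actual configuration, and the helper names are hypothetical; YoutubeDL, extract_info, and prepare_filename are real yt-dlp API.

```python
import os


def build_ytdlp_options(out_dir: str) -> dict:
    """Hypothetical yt-dlp options that fetch the best audio-only stream."""
    return {
        "format": "bestaudio/best",  # prefer audio-only, fall back to best
        "outtmpl": os.path.join(out_dir, "%(title)s.%(ext)s"),
        "noplaylist": True,  # for URLs with a video and a playlist id, fetch only the video
        "quiet": True,
    }


def download_audio(url: str, out_dir: str) -> str:
    """Hypothetical helper: download url with yt-dlp and return the local path."""
    import yt_dlp  # imported lazily so build_ytdlp_options has no dependency

    with yt_dlp.YoutubeDL(build_ytdlp_options(out_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
        return ydl.prepare_filename(info)
```

On YouTube the best audio-only stream is typically m4a or webm, both of which are on Groq's accepted list, so no conversion is needed in that case.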

Supported Audio Formats

Groq accepts: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm

QuickWhisper intelligently:

Uses original format if supported (faster)
Only converts when format is unsupported
Optimizes file size for API limits
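The "use the original format if supported, convert otherwise" rule can be sketched as an extension check against Groq's list followed by an MP3 export through pydub, matching the flowchart's conversion step. Checking the extension rather than probing the file's contents is a simplifying assumption, and needs_conversion and ensure_supported are hypothetical names.

```python
from pathlib import Path

# Extensions from the "Groq accepts" list above.
GROQ_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def needs_conversion(path: str) -> bool:
    """Hypothetical check: True if the extension is not on Groq's list."""
    return Path(path).suffix.lower().lstrip(".") not in GROQ_FORMATS


def ensure_supported(path: str, work_dir: str) -> str:
    """Return path as-is if supported, else an MP3 copy written to work_dir."""
    if not needs_conversion(path):
        return path  # original format is fine: skip re-encoding
    from pydub import AudioSegment  # uses ffmpeg to decode most formats

    out = str(Path(work_dir) / (Path(path).stem + ".mp3"))
    AudioSegment.from_file(path).export(out, format="mp3")
    return out
```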

patrickmcrawley/quickwhisper