patrickmcrawley/quickwhisper

QuickWhisper

A global CLI tool to transcribe and format audio from YouTube videos, podcasts, and other media using yt-dlp, Groq's Whisper API, and OpenRouter LLMs for intelligent formatting.

Installation

Install QuickWhisper globally using uv:

git clone <repo-url>
cd quickwhisper
uv tool install .

Now quickwhisper is available globally from any directory!

Configuration

Create a .env file with your API keys in one of these locations:

  • Your home directory: ~/.env (recommended for global use)
  • Current working directory: ./.env (project-specific, overrides global)

# Add to ~/.env or ./.env
GROQ_API_KEY=your_groq_api_key_here
OPENROUTER_API_KEY=your_openrouter_api_key_here

Get your API keys from your Groq and OpenRouter accounts.

Usage

quickwhisper <input> [--output OUTPUT_DIR]

Where <input> can be:

  • A URL (YouTube, podcast, etc.)
  • A local audio/video file path

Examples

# Basic usage - saves to current directory
quickwhisper "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Local file
quickwhisper "/path/to/audio.wav"

# Specify output directory
quickwhisper "https://podcasts.apple.com/podcast/id123456789" --output ~/Documents/Transcripts

# Short flag
quickwhisper "https://example.com/audio.mp3" -o ./transcripts

# Works from any directory
cd ~/Documents/MyProject
quickwhisper "youtube.com/videoid"  # outputs here
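
For readers curious how a command line of this shape could be parsed, here is a minimal `argparse` sketch. It mirrors the documented positional `<input>` and the `--output` / `-o` flag; the function name `build_parser` and the help strings are assumptions, not the tool's actual source.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching: quickwhisper <input> [--output OUTPUT_DIR]."""
    parser = argparse.ArgumentParser(
        prog="quickwhisper",
        description="Transcribe and format audio from a URL or local file.",
    )
    parser.add_argument("input", help="URL or path to a local audio/video file")
    parser.add_argument(
        "-o",
        "--output",
        type=Path,
        default=Path("."),  # assumed default: the current directory
        help="directory for the transcript files (default: current directory)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"input={args.input} output={args.output}")
```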

How It Works - Data Flow

flowchart TD
    Start([User Input:<br/>URL or Local File]) --> Check{Is it a URL?}
    
    Check -->|Yes| Download[Download Audio<br/>using yt-dlp]
    Check -->|No| LocalFile[Load Local File]
    
    Download --> AudioFile[Audio File<br/>flac/mp3/mp4/wav/etc.]
    LocalFile --> CheckFormat{Supported<br/>Format?}
    
    CheckFormat -->|Yes| AudioFile
    CheckFormat -->|No| Convert[Convert to MP3<br/>using pydub]
    Convert --> AudioFile
    
    AudioFile --> CheckSize{File Size<br/>> 20MB?}
    
    CheckSize -->|No| Transcribe[Transcribe with<br/>Groq Whisper API]
    CheckSize -->|Yes| Chunk[Split into Chunks<br/>with 10s overlap]
    
    Chunk --> TranscribeChunks[Transcribe Each<br/>Chunk Separately]
    TranscribeChunks --> Merge[Intelligently Merge<br/>Transcriptions]
    
    Transcribe --> RawText[Raw Transcription<br/>Text]
    Merge --> RawText
    
    RawText --> SaveRaw[Save Raw<br/>Transcription]
    SaveRaw --> RawOutput([Output:<br/>title_raw.md])
    
    RawText --> Format[Format with LLM<br/>via OpenRouter:<br/>• Add paragraphs<br/>• Fix errors<br/>• Add markdown]
    
    Format --> SaveFormatted[Save Formatted<br/>Transcription]
    
    SaveFormatted --> FormattedOutput([Output:<br/>title.md])
    
    style Start fill:#e1f5e1
    style RawOutput fill:#e1f5e1
    style FormattedOutput fill:#e1f5e1
    style Download fill:#e3f2fd
    style LocalFile fill:#e3f2fd
    style Transcribe fill:#fff3e0
    style TranscribeChunks fill:#fff3e0
    style Format fill:#f3e5f5

Data Flow Summary

Input:

  • URL (YouTube, podcast, etc.) OR
  • Local audio/video file (wav, mp3, mp4, etc.)

Processing Steps:

  1. Source Detection: Determines if input is URL or local file
  2. Audio Acquisition: Downloads from URL or loads local file
  3. Format Check: Ensures audio is in Groq-supported format (converts if needed)
  4. Size Check: Files larger than 20MB are split into chunks with a 10-second overlap
  5. Transcription: Uses Groq's Whisper Large v3 model
  6. Smart Merging: For chunked files, intelligently merges overlapping transcriptions
  7. Formatting: Uses AI to add structure, fix errors, and apply markdown
  8. Output: Saves both the raw and formatted transcriptions as markdown files
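
The size check and chunking steps can be illustrated with a small planning function. This is a sketch under stated assumptions: the 20MB threshold and the 10-second overlap come from the flowchart, while the 600-second chunk length, the binary interpretation of "MB", and the names `needs_chunking` and `plan_chunks` are hypothetical.

```python
SIZE_LIMIT_BYTES = 20 * 1024 * 1024  # assumes "20MB" means 20 MiB


def needs_chunking(size_bytes: int) -> bool:
    """Return True when a file exceeds the 20MB threshold."""
    return size_bytes > SIZE_LIMIT_BYTES


def plan_chunks(
    duration_s: float, chunk_s: float = 600.0, overlap_s: float = 10.0
) -> list[tuple[float, float]]:
    """Return (start, end) windows in seconds; neighbours share overlap_s seconds."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be greater than overlap_s")
    windows: list[tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        start = end - overlap_s  # step back so adjacent chunks overlap
    return windows
```

For a 25-minute recording, `plan_chunks(1500)` yields three windows, each beginning 10 seconds before the previous one ends. The shared audio gives the merge step repeated words to align on.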

Output:

  • Raw transcription: title_raw.md - Direct transcription from Groq Whisper
  • Formatted transcription: title.md - AI-enhanced with structure and markdown
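
The two output files could be written along these lines. The `title_raw.md` and `title.md` naming follows the list above; the filename sanitising rule and the helper names `safe_stem` and `save_transcripts` are assumptions made for illustration.

```python
import re
from pathlib import Path


def safe_stem(title: str) -> str:
    """Turn a media title into a filename stem (hypothetical sanitising rule)."""
    stem = re.sub(r"[^\w\- ]+", "", title).strip()  # drop punctuation like : and /
    stem = re.sub(r"\s+", "_", stem)  # collapse whitespace into underscores
    return stem or "transcript"


def save_transcripts(
    title: str, raw: str, formatted: str, output_dir: Path
) -> tuple[Path, Path]:
    """Write <title>_raw.md and <title>.md into output_dir and return both paths."""
    output_dir.mkdir(parents=True, exist_ok=True)
    stem = safe_stem(title)
    raw_path = output_dir / f"{stem}_raw.md"
    formatted_path = output_dir / f"{stem}.md"
    raw_path.write_text(raw, encoding="utf-8")
    formatted_path.write_text(formatted, encoding="utf-8")
    return raw_path, formatted_path
```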

Requirements

  • Python 3.12+
  • uv (Python package manager)
  • ffmpeg (for audio conversion when needed)
  • Groq API key (for Whisper transcription)
  • OpenRouter API key (for text formatting)

Features

  • Global CLI tool - Available from any directory without virtual environment activation
  • Smart audio handling - Downloads optimal format, only converts when necessary
  • Large file support - Automatic chunking for files >20MB with intelligent merging
  • Transcribes using Groq's Whisper Large v3 model
  • Formats transcription using Claude 3.5 Sonnet via OpenRouter
  • Intelligent formatting - Adds logical paragraph structure and markdown formatting
  • Error correction - Fixes obvious transcription errors
  • Dual output - Saves both raw and formatted transcriptions as markdown files
  • Flexible configuration - Global or project-specific API key configuration
  • Automatic cleanup - Removes temporary files after processing
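
The README does not spell out how overlapping chunk transcriptions are merged. One simple approach, shown here purely as an assumption, is to find the longest run of words that ends one chunk and begins the next, then keep that run only once. The names `merge_transcripts` and `max_overlap_words` are hypothetical.

```python
import string


def _normalize(word: str) -> str:
    """Compare words case-insensitively and without surrounding punctuation."""
    return word.lower().strip(string.punctuation)


def merge_transcripts(left: str, right: str, max_overlap_words: int = 50) -> str:
    """Join two chunk transcripts, dropping words duplicated by the audio overlap."""
    left_words, right_words = left.split(), right.split()
    limit = min(len(left_words), len(right_words), max_overlap_words)
    # Try the longest candidate overlap first, then shrink.
    for n in range(limit, 0, -1):
        tail = [_normalize(w) for w in left_words[-n:]]
        head = [_normalize(w) for w in right_words[:n]]
        if tail == head:
            return " ".join(left_words + right_words[n:])
    return " ".join(left_words + right_words)  # no overlap found
```

Exact word matching is brittle when the two chunks transcribe the shared audio differently, so a real implementation might prefer fuzzy matching or timestamps.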

Supported Sources

Any source supported by yt-dlp including:

  • YouTube videos
  • Podcasts (Apple Podcasts, Spotify, etc.)
  • Audio/video files from hundreds of sites
  • Direct media URLs
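
Deciding whether an input is a URL or a local file can be sketched as follows. The heuristic is an assumption, not the tool's documented logic: an existing path is treated as a local file, an http(s) address as a URL, and a schemeless input such as `youtube.com/videoid` as a URL when its first segment looks like a hostname. The function name `is_url` is hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_url(value: str) -> bool:
    """Heuristically classify an input string as a URL (True) or a local path (False)."""
    if Path(value).expanduser().exists():
        return False  # an existing file always wins
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https"):
        return bool(parsed.netloc)
    if value.startswith((".", "/", "~")) or "/" not in value:
        return False  # looks like a relative, absolute, or bare file name
    host = value.split("/", 1)[0]
    # Schemeless input: assume a URL if the first segment resembles a hostname.
    return "." in host and " " not in host
```

A relative path whose first directory contains a dot (for example `v1.2/audio.wav`) would be misread as a URL unless the file exists, which is one reason to treat this as a starting point only.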

Supported Audio Formats

Groq accepts: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm

QuickWhisper intelligently:

  • Uses original format if supported (faster)
  • Only converts when format is unsupported
  • Optimizes file size for API limits
