A cross-platform CLI tool for transcribing audio from MP4 files locally using Whisper AI. Built with Rust for performance and reliability.
Current Phase: Planning & Design
Target: Alpha Release in ~2 weeks
Priority: #1 Active Project
This repository currently contains planning documents and technical specifications. Implementation begins with Phase 1 of our roadmap.
- Local Processing: Completely offline transcription using Whisper AI
- Long File Support: Handle 45+ minute MP4 files efficiently through chunking
- Cross-Platform: Native binaries for macOS and Windows
- CLI-First Design: Simple command-line interface for rapid development
- Memory Efficient: Smart chunking strategy for large files
- Progress Tracking: Real-time progress indication during transcription
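The chunking idea behind long-file support can be sketched as a small planning function. This is a minimal illustration, not the project's implementation: the names `Chunk` and `plan_chunks` are hypothetical, chunk sizes are assumed to be in seconds, and chunks are assumed not to overlap.

```rust
/// A half-open time range [start_secs, end_secs) within the audio track.
/// (Hypothetical type, used only for this sketch.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Chunk {
    pub start_secs: u64,
    pub end_secs: u64,
}

/// Split `total_secs` of audio into consecutive chunks of at most
/// `chunk_secs` each. The final chunk may be shorter than the rest.
pub fn plan_chunks(total_secs: u64, chunk_secs: u64) -> Vec<Chunk> {
    assert!(chunk_secs > 0, "chunk size must be positive");
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < total_secs {
        // Clamp the end so the last chunk never runs past the audio.
        let end = (start + chunk_secs).min(total_secs);
        chunks.push(Chunk { start_secs: start, end_secs: end });
        start = end;
    }
    chunks
}
```

With these assumptions, a 45-minute file (2700 seconds) and a 300-second chunk size yields nine chunks, so only one chunk of audio needs to be held in memory at a time.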
# Basic transcription
./hearth input.mp4
# With custom output
./hearth input.mp4 --output transcript.txt
# With options
./hearth input.mp4 --model small --chunk-size 300 --verbose
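To make the planned flags concrete, here is a dependency-free sketch of how the commands above could be parsed. The project intends to use clap, so this hand-rolled parser is only a stand-in. The `Options` struct, the `parse_args` function, the default model (`base`), the default chunk size (300), and the reading of `--chunk-size` as seconds are all assumptions made for illustration.

```rust
use std::path::PathBuf;

/// Parsed command-line options (hypothetical shape for this sketch).
#[derive(Debug, PartialEq)]
pub struct Options {
    pub input: PathBuf,
    pub output: Option<PathBuf>,
    pub model: String,
    pub chunk_size_secs: u64,
    pub verbose: bool,
}

/// Parse arguments (excluding the program name) into `Options`.
pub fn parse_args<I>(args: I) -> Result<Options, String>
where
    I: IntoIterator<Item = String>,
{
    let mut input: Option<PathBuf> = None;
    let mut output: Option<PathBuf> = None;
    let mut model = String::from("base"); // assumed default
    let mut chunk_size_secs: u64 = 300; // assumed default, in seconds
    let mut verbose = false;

    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        match arg.as_str() {
            "--output" => {
                let value = iter.next().ok_or("--output requires a path")?;
                output = Some(PathBuf::from(value));
            }
            "--model" => {
                model = iter.next().ok_or("--model requires a name")?;
            }
            "--chunk-size" => {
                let value = iter.next().ok_or("--chunk-size requires a number")?;
                chunk_size_secs = value
                    .parse::<u64>()
                    .map_err(|e| format!("invalid --chunk-size '{}': {}", value, e))?;
            }
            "--verbose" => verbose = true,
            flag if flag.starts_with("--") => {
                return Err(format!("unknown option: {}", flag));
            }
            // The first positional argument is the input file.
            _ if input.is_none() => input = Some(PathBuf::from(arg.as_str())),
            _ => return Err(format!("unexpected extra argument: {}", arg)),
        }
    }

    let input = input.ok_or("missing input file")?;
    Ok(Options { input, output, model, chunk_size_secs, verbose })
}
```

In a binary, this would typically be called as `parse_args(std::env::args().skip(1))`, with any error printed before exiting with a non-zero status.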
- Rust 1.70+ (for development)
- FFmpeg (for audio extraction)
- ~2GB disk space (for Whisper models)
# macOS: install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# macOS: install FFmpeg (via Homebrew)
brew install ffmpeg
# Windows: install Rust from https://rustup.rs/
# Windows: install FFmpeg from https://ffmpeg.org/download.html
├── ROADMAP_TO_ALPHA.md # Detailed development plan
├── CLAUDE.md # AI assistant guidance
├── tauri_transcription_memo.md # CLI-first technical spec
├── tauri_transcription_memo-v-1.md # Original GUI approach
└── claude-web-conversation-summary.md # Design decisions
| Phase | Duration | Goal |
|---|---|---|
| Phase 1 | 2 days | Foundation (Rust project + audio extraction) |
| Phase 2 | 2 days | Whisper integration (short file transcription) |
| Phase 3 | 2 days | Long file handling (45-minute support) |
| Phase 4 | 1 day | macOS polish and testing |
| Phase 5 | 2 days | Windows cross-compilation |
Total Timeline: 9 development days (~1.5-2 weeks)
See ROADMAP_TO_ALPHA.md for detailed daily tasks and milestones.
We're building a command-line tool first for rapid prototyping, then extracting the core logic for a future GUI wrapper.
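One way to keep the core logic extractable is to hide the speech engine behind a trait and report progress through a callback, so that neither the CLI nor a later GUI is baked into the core. The sketch below is an assumption about how that boundary might look, not the planned design. `Transcriber`, `transcribe_all`, and `EchoTranscriber` are hypothetical names, and the stub engine stands in for a real Whisper backend.

```rust
/// Anything that can turn a time range of the audio into text.
/// (Hypothetical trait; a real engine would wrap a Whisper backend.)
pub trait Transcriber {
    fn transcribe(&self, start_secs: u64, end_secs: u64) -> Result<String, String>;
}

/// Transcribe each range in order and join the results with spaces.
/// `on_progress` receives (chunks_done, chunks_total) after every chunk,
/// so a CLI can print a progress bar and a GUI can update a widget.
pub fn transcribe_all(
    engine: &dyn Transcriber,
    ranges: &[(u64, u64)],
    on_progress: &mut dyn FnMut(usize, usize),
) -> Result<String, String> {
    let mut parts = Vec::with_capacity(ranges.len());
    for (i, &(start, end)) in ranges.iter().enumerate() {
        let text = engine.transcribe(start, end)?;
        parts.push(text.trim().to_string());
        on_progress(i + 1, ranges.len());
    }
    Ok(parts.join(" "))
}

/// Stub engine for tests: returns a label for the range instead of
/// running any speech recognition.
pub struct EchoTranscriber;

impl Transcriber for EchoTranscriber {
    fn transcribe(&self, start_secs: u64, end_secs: u64) -> Result<String, String> {
        Ok(format!("[{}s-{}s]", start_secs, end_secs))
    }
}
```

Because the front end only supplies an engine and a callback, the same function could later be called from a Tauri command without changes.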
- Language: Rust
- CLI Framework: clap
- Speech Recognition: Local Whisper (candle-whisper or whisper-rs)
- Audio Processing: ffmpeg-rs or system FFmpeg
- Async Runtime: tokio
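If the system FFmpeg route is chosen, audio extraction could amount to spawning `ffmpeg` with the right flags. The sketch below only builds the command and does not run it. The function name `ffmpeg_extract_command` is hypothetical, and the choice of 16 kHz mono 16-bit PCM WAV is an assumption based on the input format Whisper models generally expect.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) an `ffmpeg` invocation that extracts the audio
/// track of `input` into a 16 kHz mono 16-bit PCM WAV file.
pub fn ffmpeg_extract_command(input: &Path, output_wav: &Path) -> Command {
    let mut cmd = Command::new("ffmpeg");
    cmd.arg("-y") // overwrite the output file if it already exists
        .arg("-i")
        .arg(input) // source MP4
        .arg("-vn") // drop the video stream
        .args(&["-ac", "1"]) // downmix to one channel
        .args(&["-ar", "16000"]) // resample to 16 kHz
        .args(&["-c:a", "pcm_s16le"]) // 16-bit little-endian PCM
        .arg(output_wav);
    cmd
}
```

Calling `.status()` on the returned command would run the extraction; checking that the exit status is successful before handing the WAV file to the transcription step avoids feeding it a partial file.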
hearth/
├── src/
│ ├── main.rs # CLI interface
│ ├── audio.rs # MP4 → audio extraction
│ ├── transcription.rs # Whisper integration
│ ├── chunking.rs # Audio chunking logic
│ └── progress.rs # Progress display
├── models/ # Whisper models
└── tests/ # Integration tests
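As an illustration of what `progress.rs` might contain, here is a small text progress bar keyed to the number of chunks finished. The function name `render_progress` and the bar format are assumptions made for this sketch.

```rust
/// Render a one-line progress bar such as
/// `[#####-----]  50% (5/10 chunks)`.
/// `width` is the number of characters between the brackets.
pub fn render_progress(done: usize, total: usize, width: usize) -> String {
    // Clamp so that an over-count can never overflow the bar.
    let done = done.min(total);
    let fraction = if total == 0 {
        1.0 // nothing to do counts as complete
    } else {
        done as f64 / total as f64
    };
    let filled = (fraction * width as f64).round() as usize;
    let percent = (fraction * 100.0).round() as u32;
    format!(
        "[{}{}] {:>3}% ({}/{} chunks)",
        "#".repeat(filled),
        "-".repeat(width - filled),
        percent,
        done,
        total
    )
}
```

Printing the result with a leading `\r` and no trailing newline would redraw the bar in place on most terminals.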
- ✅ Transcribe MP4 files up to 45 minutes
- ✅ Generate accurate TXT output
- ✅ Work completely offline
- ✅ Native macOS and Windows binaries
- ✅ Reasonable performance (< 2x real-time, i.e. processing takes less than twice the audio's duration)
- ✅ Handle common MP4 formats
- ✅ Memory-efficient processing
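The performance criterion can be checked with a simple real-time factor: processing time divided by audio duration. This sketch reads "< 2x real-time" as "processing takes less than twice the audio's length", which is an interpretation rather than something the criteria spell out. The function names are hypothetical.

```rust
/// Real-time factor: seconds of processing per second of audio.
/// Returns `None` when the audio duration is not positive.
pub fn real_time_factor(processing_secs: f64, audio_secs: f64) -> Option<f64> {
    if audio_secs > 0.0 {
        Some(processing_secs / audio_secs)
    } else {
        None
    }
}

/// True when the run meets the assumed Alpha target of a factor below 2.0.
pub fn meets_alpha_target(processing_secs: f64, audio_secs: f64) -> bool {
    match real_time_factor(processing_secs, audio_secs) {
        Some(rtf) => rtf < 2.0,
        None => false,
    }
}
```

Under this reading, a 45-minute file (2700 seconds) must finish in under 5400 seconds, so a run that takes 4000 seconds passes and one that takes 6000 seconds does not.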
- Short files (2-3 minutes) for rapid development
- Medium files (15 minutes) for chunking validation
- Long files (45 minutes) for stress testing
- Cross-platform testing on macOS and Windows
After Alpha release:
- Beta: Add SRT/VTT export, batch processing
- GUI Phase: Extract core to library, build Tauri frontend
- Advanced Features: Multiple languages, speaker detection
- CLAUDE.md - AI assistant guidance for development
- ROADMAP_TO_ALPHA.md - Detailed development roadmap
- Technical Specs - See memo files for architectural decisions
This is currently a solo project in active development. The codebase will be ready for contributions after the Alpha release.
For now, you can:
- Review the technical specifications
- Suggest improvements to the roadmap
- Test Alpha builds when available
[License TBD - will be added before first release]
This is an indie development project. Issues and discussions welcome once implementation begins.
Note: This project is in active development. Star and watch for updates as we progress through the roadmap! 🚀