AI-Powered Screenshot, Audio Transcription, and Text Processing for macOS
Transform your workflow with intelligent audio transcription, screenshot analysis, and AI-powered text processing. Hush brings cutting-edge AI capabilities directly to your macOS desktop with lightning-fast performance and seamless integration.
- Lightning Fast: Real-time audio transcription with < 100ms latency
- AI-Powered: Google Gemini integration for intelligent content processing
- Precision Accurate: 95%+ transcription accuracy under standard recording conditions (98%+ for clear speech)
- Instant Processing: Screenshot analysis in under 2 seconds
- Real-Time: Live transcript viewer with automatic updates
- Memory Persistent: AI remembers important context across sessions
- Highly Configurable: 20+ keyboard shortcuts and customizable settings
- Resource Efficient: < 50MB RAM usage, minimal CPU impact
- Native macOS: SwiftUI-based with perfect system integration
Hush is a powerful macOS application that supercharges your productivity by providing intelligent audio transcription, screenshot analysis, and text processing capabilities. Whether you're transcribing meetings, analyzing screenshots, or processing text with AI, Hush delivers professional-grade results with consumer-friendly simplicity.
- Content Creators: Transcribe podcasts, videos, and audio content
- Developers: Analyze code screenshots and documentation
- Students: Convert lectures and study materials to text
- Professionals: Process meeting recordings and presentations
- Researchers: Analyze and process large amounts of content
- Writers: Generate ideas and process text with AI assistance
- Real-time transcription from microphone input
- Per-resource system audio capture for transcribing computer audio
- Multiple audio sources with instant switching (⌘⌃L)
- Live transcript viewer for real-time monitoring
- High accuracy speech-to-text conversion
- Background processing without interrupting workflow
Note: Full system audio capture is currently under development. Current implementation supports per-resource audio capture.
- Instant screenshot capture with AI analysis
- Text extraction from images and documents
- Content understanding and context analysis
- Batch processing capabilities
- Smart cropping and image optimization
- Multiple format support (PNG, JPEG, PDF, etc.)
- Google Gemini integration for advanced AI processing
- Model selection between Gemini 2.5 Flash and 2.0 Flash
- Natural language understanding and generation
- Content summarization and analysis
- Smart suggestions and recommendations
- Context-aware responses based on input type
- Custom prompts system for personalized AI processing
- Memory persistence for preserving important context across sessions
- Rich markdown rendering with swift-markdown-ui for beautiful formatting
- Syntax highlighting powered by Highlightr with 170+ languages and customizable themes
- 20+ keyboard shortcuts for lightning-fast workflow
- Session management for organizing work
- Auto-scroll results for long content
- Window controls (opacity, always-on-top)
- Customizable interface with multiple themes
- Export options for sharing and saving results
- Local chat history: Save and manage your chat sessions locally
- Customizable settings: Control chat saving preferences and more
Feature | Performance | Details |
---|---|---|
Audio Transcription | < 100ms latency | Real-time processing with minimal delay |
Screenshot Analysis | < 2 seconds | Complete AI analysis including text extraction |
Text Processing | < 1 second | AI-powered analysis and generation |
App Launch | < 3 seconds | From click to ready-to-use |
Memory Usage | < 50MB | Efficient resource utilization |
CPU Impact | < 5% | Background processing without slowdown |
Input Type | Accuracy | Conditions |
---|---|---|
Clear Speech | 98%+ | Quiet environment, native speakers |
Normal Speech | 95%+ | Standard recording conditions |
Per-Resource System Audio | 90%+ | Music and background noise filtered |
Screenshot Text | 99%+ | High-resolution, clear text |
Handwritten Text | 85%+ | Legible handwriting in good lighting |
- macOS: 14.4 (Sonoma) or later
- Architecture: Apple Silicon (M1/M2/M3) or Intel x64
- RAM: 4GB minimum, 8GB recommended
- Storage: 100MB free space
- Internet: Required for AI processing
- Visit Hush Releases
- Download the latest Hush.zip file
- Unzip and move Hush.app to your Applications folder
- Launch Hush and grant necessary permissions
- Configure your Google Gemini API key in Settings
If you want to build Hush from source:
# Clone the repository
git clone https://github.com/KaizoKonpaku/Hush.git
# Open in Xcode
cd Hush
open Hush.xcodeproj
# Build and run (⌘R)
- Open the project in Xcode
- Select Product → Archive from the menu
- When the archive completes, click Distribute App, then choose Custom
- Select Copy App and choose a location to save the app (e.g. Downloads)
- Drag the exported Hush.app to your Applications folder
- Launch Hush from Applications and grant necessary permissions
- Configure your Google Gemini API key in Settings
# Get your free Gemini API key
open https://aistudio.google.com/app/apikey
- Open Hush Settings (⌘,)
- Navigate to API Configuration tab
- Enter your Gemini API key
- Select your preferred Gemini model (2.5 Flash or 2.0 Flash)
- Click Save to activate AI features
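For readers curious what the saved key and model selection are used for, here is a minimal sketch of building a Gemini `generateContent` request with Foundation. It is not taken from Hush's source: the names `makeGeminiRequest`, `GeminiRequestBody`, and `InvalidModelName` are hypothetical, and the 30-second timeout is an assumption that mirrors the `timeoutSeconds` default listed in this README. The endpoint shape and the `x-goog-api-key` header follow Google's public REST API.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// Thrown when the model name cannot be placed into a valid URL (hypothetical error type).
struct InvalidModelName: Error {}

/// JSON body for the Gemini `generateContent` REST endpoint.
struct GeminiRequestBody: Codable {
    struct Part: Codable { let text: String }
    struct Content: Codable { let parts: [Part] }
    let contents: [Content]
}

/// Builds, but does not send, a text-only request for the chosen model.
func makeGeminiRequest(model: String, apiKey: String, prompt: String) throws -> URLRequest {
    let endpoint = "https://generativelanguage.googleapis.com/v1beta/models/\(model):generateContent"
    guard let url = URL(string: endpoint) else { throw InvalidModelName() }

    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.timeoutInterval = 30  // assumption: mirrors the README's timeoutSeconds default
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.setValue(apiKey, forHTTPHeaderField: "x-goog-api-key")

    let body = GeminiRequestBody(contents: [.init(parts: [.init(text: prompt)])])
    request.httpBody = try JSONEncoder().encode(body)
    return request
}
```

A caller would pass the resulting request to `URLSession` and decode the JSON response. Switching between the two models would only change the `model` argument, for example `"gemini-2.5-flash"` or `"gemini-2.0-flash"`.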
Hush requires microphone and per-resource system audio permissions:
- System Settings → Privacy & Security
- Grant access to:
  - Microphone (for voice transcription)
  - Screen Recording (for per-resource system audio capture)
  - Accessibility (for advanced features)
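As an illustration of how a macOS app can check these three permissions before starting a capture, the sketch below maps each one to a system query. This is an assumption about a reasonable approach, not Hush's actual code: the `Permission` enum and `isGranted` function are hypothetical names, while `AVCaptureDevice.authorizationStatus(for:)`, `CGPreflightScreenCaptureAccess()`, and `AXIsProcessTrusted()` are the system calls being queried.

```swift
import Foundation
#if os(macOS)
import AVFoundation         // AVCaptureDevice (microphone status)
import CoreGraphics         // CGPreflightScreenCaptureAccess
import ApplicationServices  // AXIsProcessTrusted
#endif

/// The permissions listed above, with the reason each is requested.
enum Permission: String, CaseIterable {
    case microphone = "Microphone"
    case screenRecording = "Screen Recording"
    case accessibility = "Accessibility"

    var purpose: String {
        switch self {
        case .microphone: return "voice transcription"
        case .screenRecording: return "per-resource system audio capture"
        case .accessibility: return "advanced features"
        }
    }
}

#if os(macOS)
/// Reports whether the current process already holds the permission.
/// This only reads the current status; it does not prompt the user.
func isGranted(_ permission: Permission) -> Bool {
    switch permission {
    case .microphone:
        return AVCaptureDevice.authorizationStatus(for: .audio) == .authorized
    case .screenRecording:
        return CGPreflightScreenCaptureAccess()
    case .accessibility:
        return AXIsProcessTrusted()
    }
}
#endif
```

An app could loop over `Permission.allCases` at launch and show the user which entries still need to be enabled in System Settings.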
- Auto-launch: Enable in Settings → General → Start at login
- Window behavior: Configure opacity and always-on-top
- Default audio source: Choose between microphone or per-resource system audio
For quick access to Hush from anywhere on your system, you can create a global keyboard shortcut using macOS Automator:
- Open Automator → Create new Quick Action
- Set workflow to receive: "no input" in "any application"
- Add action: Search for "Launch Application" and drag it to the workflow
- Select Hush from the application dropdown
- Save the Quick Action (e.g., "Launch Hush")
- System Settings → Keyboard → Keyboard Shortcuts → Services
- Find your Quick Action and assign a keyboard shortcut (e.g., ⌘⌃H)
- Check for conflicts: Verify your chosen shortcut isn't already used by other apps or system functions
- Test thoroughly: Some shortcuts may fail if they conflict with browser extensions, other applications, or system shortcuts
- Recommended shortcuts: ⌘⌃H, ⌘⌃Space, ⌘⌃T (check availability first)
- Avoid common shortcuts: Don't use shortcuts already assigned to Mission Control, Spotlight, or frequently used apps
- If the shortcut doesn't work, check System Settings → Keyboard → Keyboard Shortcuts for conflicts
- Some applications (especially browsers) may override global shortcuts
- Try alternative key combinations if your first choice doesn't work consistently
- Consider using function keys (F1-F12) combined with modifiers for more reliable shortcuts
- Start recording: Press ⌘L or click the microphone button
- Switch sources: Use ⌘⌃L to toggle between mic/per-resource system audio
- View live transcript: Enable with ⌘⇧L
- Stop recording: Press ⌘L again or click stop button
- Process with AI: Hit ⌘↩ to analyze transcript
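The recording workflow above can be pictured as a small state model. The sketch below is a hypothetical illustration, not Hush's implementation: `RecorderState` and `AudioSource` are invented names, and it assumes that switching sources keeps the recording running, in line with the "instant switching" feature described earlier.

```swift
/// The two inputs the README describes.
enum AudioSource {
    case microphone
    case systemAudio  // per-resource system audio
}

/// Minimal model of the recording controls (hypothetical type).
struct RecorderState {
    private(set) var source: AudioSource = .microphone
    private(set) var isRecording = false
    private(set) var transcript = ""

    /// ⌘L: start recording, or stop if already recording.
    mutating func toggleRecording() {
        isRecording.toggle()
    }

    /// ⌘⌃L: flip between microphone and system audio without stopping.
    mutating func switchSource() {
        source = (source == .microphone) ? .systemAudio : .microphone
    }

    /// Appends newly recognized text; ignored while not recording.
    mutating func append(_ text: String) {
        guard isRecording else { return }
        transcript += transcript.isEmpty ? text : " " + text
    }
}
```

When recording stops, the accumulated `transcript` is what would be sent to the AI with ⌘↩.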
- Capture screenshot: Press ⌘C or use the camera button
- Review capture: Preview appears in the interface
- Process with AI: Press ⌘↩ for analysis
- Delete if needed: Use ⌘D to remove screenshot
- Copy results: ⌘⇧C to copy processed text
- Enter text mode: Press ⌘T
- Type or paste content: Use the text input area
- Process with AI: Hit ⌘↩ for analysis
- Review results: View AI-generated response
- Copy or save: Use ⌘⇧C to copy results
- New session: ⌘N to start fresh workspace
- Session history: Access previous work sessions
- Export sessions: Save sessions for later reference
- Opacity control: Adjust transparency with slider
- Always on top: Keep Hush above other windows
- Window positioning: Use ⌘⇧↑/↓/←/→ to move window
- Reset position: ⌘R to center window
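For context, window controls like these map onto a few `NSWindow` properties in AppKit. The sketch below is an assumption about how such controls could be wired, not Hush's source: `clampedOpacity` and `applyWindowSettings` are hypothetical names, and the 50% to 100% range is taken from the interface notes in this README.

```swift
import Foundation
#if os(macOS)
import AppKit
#endif

/// Opacity limits described in this README: 50% to 100%.
let opacityRange: ClosedRange<Double> = 0.5...1.0

/// Keeps a requested opacity inside the supported range.
func clampedOpacity(_ requested: Double) -> Double {
    min(max(requested, opacityRange.lowerBound), opacityRange.upperBound)
}

#if os(macOS)
/// Applies opacity and always-on-top to a window (hypothetical helper).
@MainActor
func applyWindowSettings(opacity: Double, alwaysOnTop: Bool, to window: NSWindow) {
    window.alphaValue = CGFloat(clampedOpacity(opacity))
    // .floating keeps the window above normal windows; .normal restores default stacking.
    window.level = alwaysOnTop ? .floating : .normal
}
#endif
```

Clamping in one place means a slider, a keyboard shortcut, and a saved preference would all respect the same limits.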
- Toggle auto-scroll: ⌘A to enable/disable
- Scroll manually: ⌘↑/↓ for manual scrolling
- Adjust speed: ⌘⇧+/- to change scroll speed
- Access prompts: Go to Settings (⌘,) → Prompts tab
- Add new prompt: Copy desired prompt text, then click "Paste as Prompt"
- Select prompt: Click "Select" next to any prompt to use it for AI processing
- Delete prompt: Click "Delete" to remove unwanted prompts
- How it works: Selected prompts are automatically combined with your input for enhanced AI analysis
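One simple way to picture how a selected prompt is combined with input is plain concatenation. The sketch below is a hypothetical illustration; the function name `composePrompt` and the blank-line separator are assumptions, and Hush may combine the two differently.

```swift
import Foundation

/// Prepends the selected custom prompt to the user's input.
/// Falls back to the input alone when no prompt is selected or it is blank.
func composePrompt(selectedPrompt: String?, input: String) -> String {
    let trimmedInput = input.trimmingCharacters(in: .whitespacesAndNewlines)
    guard let prompt = selectedPrompt?.trimmingCharacters(in: .whitespacesAndNewlines),
          !prompt.isEmpty else {
        return trimmedInput
    }
    // Assumption: a blank line separates the instruction from the content.
    return prompt + "\n\n" + trimmedInput
}
```

With this shape, the same transcript, screenshot text, or typed input can be reprocessed under a different prompt without editing the input itself.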
Example Custom Prompts:
- "Analyze this content and provide 3 key insights"
- "Summarize this in simple terms for a beginner"
- "Review this code and suggest improvements"
- "Extract the main action items from this text"
- Access memories: Go to Settings (⌘,) → Memory tab
- Add memory: Copy text to clipboard, then click "Paste as Memory"
- Auto-naming: Memories are automatically named sequentially (Memory 1, Memory 2, etc.)
- Enable/disable: Toggle individual memory entries on/off
- View details: Click "Show" to expand and see full memory content
- Delete: Use the "Delete" button to remove memories
- How it works: Enabled memories are included in every AI interaction to provide persistent context
Example Memory Uses:
- Save project details the AI should always remember
- Store personal preferences for content generation
- Keep technical specifications for code assistance
- Maintain contextual information across different sessions
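The memory behavior described above (sequential names, per-entry toggles, enabled entries included with every request) can be sketched as a small value type. This is a hypothetical model, not Hush's storage code: `Memory`, `MemoryStore`, and `contextBlock` are invented names, and the choice not to reuse a number after deletion is an assumption.

```swift
/// One saved memory entry.
struct Memory {
    let name: String
    let content: String
    var isEnabled = true
}

/// In-memory store with sequential auto-naming (hypothetical type).
struct MemoryStore {
    private(set) var memories: [Memory] = []
    private var nextNumber = 1

    init() {}

    /// Adds pasted text as "Memory 1", "Memory 2", and so on.
    mutating func add(_ content: String) {
        memories.append(Memory(name: "Memory \(nextNumber)", content: content))
        nextNumber += 1  // assumption: numbers are never reused after deletion
    }

    /// Toggles a single entry on or off; out-of-range indexes are ignored.
    mutating func setEnabled(_ enabled: Bool, at index: Int) {
        guard memories.indices.contains(index) else { return }
        memories[index].isEnabled = enabled
    }

    /// Removes an entry; out-of-range indexes are ignored.
    mutating func delete(at index: Int) {
        guard memories.indices.contains(index) else { return }
        memories.remove(at: index)
    }

    /// Text that would accompany each AI request: enabled entries only.
    func contextBlock() -> String {
        memories
            .filter { $0.isEnabled }
            .map { "\($0.name): \($0.content)" }
            .joined(separator: "\n")
    }
}
```

Disabling an entry keeps it in the list but leaves it out of `contextBlock()`, which matches the on/off toggle described above.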
Shortcut | Action | Description |
---|---|---|
⌘N | New Session | Create a fresh workspace |
⌘T | Text Mode | Switch to text input mode |
⌘C | Screenshot | Capture and analyze screenshot |
⌘L | Audio Recording | Start/stop audio transcription |
⌘↩ | Process | Send content to AI for analysis |
⌘, | Settings | Open settings window |
Shortcut | Action | Description |
---|---|---|
⌘L | Toggle Recording | Start/stop audio transcription |
⌘⇧L | Transcript Viewer | Show/hide live transcript |
⌘⌃L | Switch Audio Source | Toggle between mic/per-resource system audio |
⌘⇧A | Per-Resource System Audio | Direct per-resource system audio recording |
Shortcut | Action | Description |
---|---|---|
⌘D | Delete Screenshot | Remove current screenshot |
⌘⇧C | Copy Results | Copy processed content |
⌘A | Auto-scroll | Toggle automatic scrolling |
⌘↑/↓ | Manual Scroll | Scroll through content |
Shortcut | Action | Description |
---|---|---|
⌘O | Toggle Opacity | Switch between opacity levels |
⌘R | Reset Position | Center window on screen |
⌘H | Show/Hide | Toggle app visibility |
⌘Q | Quit | Exit application |
⌘⇧↑/↓/←/→ | Move Window | Reposition window |
Shortcut | Action | Description |
---|---|---|
⌘⇧+ | Increase Speed | Faster auto-scroll |
⌘⇧- | Decrease Speed | Slower auto-scroll |
- SwiftUI: Modern, declarative UI framework
- Combine: Reactive programming for data flow
- Core Audio: Low-latency audio processing
- Vision Framework: OCR and image analysis
- Speech Framework: High-accuracy speech recognition
- AVFoundation: Audio/video capture and processing
- Google Gemini (2.5 Flash or 2.0 Flash): Advanced language models
- Streaming Responses: Real-time AI output
- Context Awareness: Intelligent prompt engineering
- Error Handling: Robust API failure management
- Rate Limiting: Efficient API usage
- Lazy Loading: UI components load on demand
- Background Processing: Non-blocking operations
- Memory Management: Automatic cleanup and optimization
- Caching System: Smart content caching
- Efficient Rendering: 60fps smooth animations
- Keychain Storage: Secure API key management
- Local Processing: Audio processing happens locally
- Privacy First: No data stored without consent
- Sandboxed: Full macOS app sandboxing
- Encrypted Transit: All API calls use HTTPS
- Native macOS: Follows Apple Human Interface Guidelines
- Minimal & Clean: Distraction-free interface
- Keyboard-First: Optimized for power users
- Window Opacity: Adjustable from 50% to 100%
- Always on Top: Keep window above others
- Compact Mode: Minimal interface for focused work
- Dark/Light Mode: Automatic system appearance
- Live Indicators: Real-time status updates
- Progress Bars: Visual feedback for processing
- Smooth Animations: 60fps interface transitions
- Color Coding: Intuitive status and mode indicators
- Modern Icons: SF Symbols throughout interface
// Default audio configuration
audioSampleRate: 44100 Hz
audioChannels: Mono/Stereo auto-detection
bufferSize: 1024 samples
latency: < 100ms
// AI processing parameters
maxTokens: 4096
temperature: 0.7
topP: 0.9
timeoutSeconds: 30
// Memory and CPU limits
maxMemoryUsage: 128MB
maxCPUUsage: 25%
backgroundProcessing: true
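To make the defaults above concrete, the sketch below models two of the groups as Swift types. It is illustrative only: `AudioDefaults` and `GenerationConfig` are hypothetical names, and mapping `maxTokens` to the Gemini REST field `maxOutputTokens` is an assumption about how the setting would be sent. The arithmetic shows that one 1024-sample buffer at 44100 Hz holds about 23 ms of audio, which leaves headroom inside the stated 100 ms latency target.

```swift
import Foundation

/// Audio defaults from the configuration listing.
struct AudioDefaults {
    let sampleRate = 44_100.0  // Hz
    let bufferSize = 1_024     // samples per buffer

    /// Duration of audio held by a single buffer, in milliseconds.
    var bufferDurationMs: Double {
        Double(bufferSize) / sampleRate * 1_000
    }
}

/// AI parameters shaped like the `generationConfig` object of the Gemini REST API.
/// Assumption: the README's `maxTokens` corresponds to `maxOutputTokens`.
struct GenerationConfig: Codable {
    var maxOutputTokens = 4096
    var temperature = 0.7
    var topP = 0.9
}

/// Encodes the configuration as JSON with stable key order.
func encodedGenerationConfig(_ config: GenerationConfig) throws -> Data {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    return try encoder.encode(config)
}
```

A smaller buffer would lower the per-buffer delay at the cost of more frequent callbacks, which is the usual trade-off when tuning `bufferSize`.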
- Initial Release: Complete feature set
- Audio Transcription: Real-time mic and per-resource system audio
- Screenshot Processing: AI-powered image analysis
- AI Integration: Google Gemini support
- Custom Prompts: Paste-based prompt creation system for personalized AI processing
- Memory Persistence: Save and reuse important context across AI sessions
- Keyboard Shortcuts: 20+ productivity shortcuts
- Native Interface: SwiftUI-based macOS design
- Settings System: Comprehensive configuration options
Note: Full system audio capture is planned for a future release. Current version supports per-resource system audio capture.
This project is licensed under the MIT License - see the LICENSE file for details.
- Apple: For the amazing macOS platform and development tools
- Google: For the powerful Gemini AI API
- Cursor: For AI assistance with Claude and Gemini
- insidegui: For Core Audio System Capture
- swift-markdown-ui: Rich markdown rendering and formatting
- Highlightr: Powerful syntax highlighting with 170+ languages and theme support
If you find Hush useful, please consider:
- Star this repository
- Follow me on X: @KaizoKonpakuu
- Share with others: Spread the word about Hush
- Report bugs: Help us improve the app
- Suggest features: Share your ideas for new capabilities