Free, Open-Source Speech-to-Text Converter - CLI & Web Interface
Transform your audio files into accurate text transcriptions using OpenAI's cutting-edge Whisper AI model. Available as both a powerful CLI tool for developers and a user-friendly web interface.
Web Interface - Try Now
Perfect for quick transcriptions and non-technical users
- No Installation Required - Use directly in your browser
- Drag & Drop Interface - Simple and intuitive
- Instant Results - Get transcriptions in seconds
- Mobile Friendly - Works on any device
Setting up the web interface? See DEPLOYMENT_GUIDE.md for GitHub Pages setup instructions.
Advanced features for batch processing and automation
- Batch Processing - Handle hundreds of files
- Advanced Configuration - Full control over parameters
- Scriptable - Integrate into your workflows
- All Features - Access to every Whisper model and option
- Built for Everyone - Web interface for ease, CLI for power
- AI-Powered Accuracy - Uses OpenAI's state-of-the-art Whisper model
- Lightning Fast - Process files in seconds, not minutes
- Multilingual Support - Transcribe and translate 99+ languages
- Zero Configuration - Web version works instantly
- Completely Free - No API costs, subscriptions, or hidden fees
Support for 8 major audio formats including MP3, WAV, M4A, AAC, FLAC, and more. No need to convert files - just drag and drop!
Choose from 7 Whisper model sizes (39 MB to 1.5 GB) to balance speed vs. accuracy for your specific needs.
Automatic language detection or manually specify from 99+ supported languages. Built-in translation to English.
Process hundreds of files simultaneously with intelligent error handling and progress tracking.
Get clean text files plus detailed JSON with timestamps, confidence scores, and metadata for advanced use cases.
Python CLI tool with comprehensive logging, custom output directories, and configurable parameters.
transcript-ai/
├── cli/                         # CLI Tool (Python)
│   ├── src/
│   │   └── audio_transcriber.py # Main CLI script
│   ├── input/                   # Sample audio files
│   ├── outputs/                 # Sample transcriptions
│   ├── requirements.txt         # Python dependencies
│   └── setup.py                 # CLI setup script
├── web/                         # Web Interface
│   ├── src/
│   │   ├── index.html           # Main web page
│   │   ├── styles.css           # Web styling
│   │   └── app.js               # Web functionality
│   └── dist/                    # Built files
├── docs/                        # GitHub Pages (auto-deployed)
├── shared/                      # Common resources
│   └── examples/                # Usage examples
├── .github/
│   └── workflows/               # CI/CD automation
├── README.md                    # This file
└── LICENSE                      # MIT License
- Visit ombharatiya.github.io/transcript-ai
- Upload your audio file
- Get instant transcription results
- No installation required!
- Python 3.8 or higher
- macOS, Linux, or Windows
# Clone the repository
git clone https://github.com/ombharatiya/transcript-ai.git
cd transcript-ai/cli
# Run the setup script (handles everything automatically)
python setup.py
The setup script will:
- Create Python virtual environment
- Install Python dependencies
- Install ffmpeg (system-wide or locally)
- Configure everything for immediate use
If you prefer manual installation:
1. Navigate to the CLI directory:

   cd transcript-ai/cli

2. Create and activate a virtual environment:

   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install Python dependencies:

   pip install -r requirements.txt

4. Install FFmpeg:

   # macOS
   brew install ffmpeg
   # Ubuntu/Debian
   sudo apt update && sudo apt install ffmpeg
   # Windows: download from https://ffmpeg.org/download.html and add to PATH
- Visit the web app: ombharatiya.github.io/transcript-ai
- Upload audio: Drag & drop or click to select your audio file
- Choose options: Select AI model, language, and translation preferences
- Get results: Download your transcription as text file
# Navigate to CLI directory and activate environment
cd transcript-ai/cli
source venv/bin/activate # On Windows: venv\Scripts\activate
# Transcribe a single file
python src/audio_transcriber.py input/audio.mp3
# Use a different model
python src/audio_transcriber.py input/audio.wav --model large
# Specify language
python src/audio_transcriber.py input/audio.aac --language en
# Translate to English
python src/audio_transcriber.py input/foreign_audio.mp3 --task translate
# Adjust sampling temperature (0.0 = deterministic; higher values add variation)
python src/audio_transcriber.py input/audio.wav --temperature 0.2
# Custom output directory
python src/audio_transcriber.py input/audio.mp3 --output-dir /custom/path
# Skip detailed JSON output
python src/audio_transcriber.py input/audio.mp3 --no-json
# Process multiple files
python src/audio_transcriber.py input/*.mp3 --batch
# Process specific files
python src/audio_transcriber.py input/file1.wav input/file2.aac input/file3.mp3
# Batch with custom settings
python src/audio_transcriber.py input/*.wav --batch --model medium --language es
# Check audio file details
python src/audio_transcriber.py input/audio.mp3 --info
| Model | Size | Speed | Quality | Use Case |
|---|---|---|---|---|
| tiny | 39 MB | Fastest | Basic | Quick tests, low-resource |
| base | 74 MB | Fast | Good | General use (default) |
| small | 244 MB | Moderate | Better | Good balance |
| medium | 769 MB | Slower | High | Professional transcription |
| large | 1550 MB | Slowest | Best | Highest accuracy needed |
| large-v2 | 1550 MB | Slowest | Best | Latest improvements |
| large-v3 | 1550 MB | Slowest | Best | Most recent version |
- AAC (.aac) - Advanced Audio Coding
- FLAC (.flac) - Free Lossless Audio Codec
- MP3 (.mp3) - MPEG Audio Layer III
- MP4 (.mp4) - MPEG-4 Audio
- M4A (.m4a) - MPEG-4 Audio
- OGG (.ogg) - Ogg Vorbis
- WAV (.wav) - Waveform Audio File Format
- WebM (.webm) - WebM Audio
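If you are scripting around the CLI, it can help to pre-filter files against this list before submitting a batch. A minimal sketch (the helper name is illustrative, not part of the tool):

```python
from pathlib import Path

# Extensions from the supported-formats list above.
SUPPORTED_EXTENSIONS = {".aac", ".flac", ".mp3", ".mp4", ".m4a", ".ogg", ".wav", ".webm"}

def is_supported(path: str) -> bool:
    """Return True if the file extension is one the transcriber accepts."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS

print(is_supported("input/audio.MP3"))  # extension check is case-insensitive
print(is_supported("input/video.avi"))
```

Unsupported files can then be converted with ffmpeg first (see Troubleshooting below).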
Text output:
- Filename: {original_name}_transcription.txt
- Content: Clean transcribed text

JSON output:
- Filename: {original_name}_detailed.json
- Content:
  - Full transcription text
  - Word-level timestamps
  - Segment information
  - File metadata
  - Model and processing details
{
"text": "Your transcribed text here...",
"segments": [
{
"id": 0,
"seek": 0,
"start": 0.0,
"end": 5.0,
"text": " Your transcribed text here...",
"tokens": [50364, 2396, ...],
"temperature": 0.0,
"avg_logprob": -0.5,
"compression_ratio": 1.2,
"no_speech_prob": 0.1
}
],
"language": "en",
"metadata": {
"file_info": {
"filename": "audio.mp3",
"size_mb": 5.2,
"format": ".mp3",
"supported": true
},
"model_used": "base",
"transcription_time": "0:00:30.123456",
"timestamp": "2024-01-01T12:00:00.000000"
}
}
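Because each segment carries its own start and end time, the detailed JSON converts readily into subtitle formats. A minimal sketch (not part of the tool, just one way to consume the file) that renders segments as SRT captions:

```python
import json

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(detailed: dict) -> str:
    """Turn the 'segments' array of a *_detailed.json file into SRT caption blocks."""
    blocks = []
    for i, seg in enumerate(detailed["segments"], start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Example using the structure shown above:
detailed = {"segments": [{"start": 0.0, "end": 5.0, "text": " Your transcribed text here..."}]}
print(segments_to_srt(detailed))
```

To subtitle a real file, replace the inline dict with `json.load(open("outputs/audio_detailed.json"))` and write the result to a `.srt` file.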
Edit config/default_config.json to customize default settings:
{
"model_settings": {
"default_model": "base",
"default_language": null,
"default_task": "transcribe",
"default_temperature": 0.0
},
"output_settings": {
"output_directory": "outputs",
"save_json_details": true,
"log_directory": "logs"
}
}
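When a key is missing from the config file (or the file is absent), the tool can fall back to built-in defaults. One way to implement that merge, section by section (a sketch under that assumption, not necessarily how audio_transcriber.py does it):

```python
import json
from pathlib import Path

# Built-in defaults mirroring the structure shown above.
DEFAULTS = {
    "model_settings": {
        "default_model": "base",
        "default_language": None,
        "default_task": "transcribe",
        "default_temperature": 0.0,
    },
    "output_settings": {
        "output_directory": "outputs",
        "save_json_details": True,
        "log_directory": "logs",
    },
}

def load_config(path: str = "config/default_config.json") -> dict:
    """Overlay the on-disk config over built-in defaults, section by section."""
    config = {section: dict(values) for section, values in DEFAULTS.items()}
    p = Path(path)
    if p.exists():
        for section, values in json.loads(p.read_text()).items():
            config.setdefault(section, {}).update(values)
    return config

cfg = load_config()
print(cfg["model_settings"]["default_model"])
```

This way a user config only needs to list the keys it actually changes.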
python src/audio_transcriber.py input/voice_memo.m4a
python src/audio_transcriber.py input/meeting.wav --model medium --output-dir meetings/
python src/audio_transcriber.py input/podcast_*.mp3 --batch --model large
python src/audio_transcriber.py input/spanish_audio.mp3 --language es --task translate
1. FFmpeg not found
# Install ffmpeg using the setup script:
python setup.py
# Or install manually:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
# Windows: Download from https://ffmpeg.org/download.html
2. Out of memory errors
# Use a smaller model
python src/audio_transcriber.py large_file.mp3 --model tiny
3. Slow transcription
# Use GPU acceleration if available (requires CUDA setup)
# Or use smaller model for faster processing
python src/audio_transcriber.py file.mp3 --model small
4. Unsupported file format
# Check supported formats
python src/audio_transcriber.py input/file.xyz --info
# Convert using ffmpeg
ffmpeg -i input/file.xyz -c:a aac input/output.aac
- Choose appropriate model size based on your needs
- Use batch processing for multiple files
- Specify language when known (saves detection time)
- Use SSD storage for better I/O performance
- Close other applications for memory-intensive models
Logs are automatically saved to logs/transcription_YYYYMMDD.log and include:
- Processing start/end times
- File information
- Error messages
- Model loading status
- Transcription progress
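A date-stamped log file like this is straightforward to set up with the standard library. A minimal sketch of the pattern (the function name is illustrative, not the tool's actual API):

```python
import logging
from datetime import datetime
from pathlib import Path

def make_logger(log_dir: str = "logs") -> logging.Logger:
    """Create a logger that appends to <log_dir>/transcription_YYYYMMDD.log."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    log_file = Path(log_dir) / f"transcription_{datetime.now():%Y%m%d}.log"
    logger = logging.getLogger("transcriber")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_file)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger

log = make_logger()
log.info("Model loaded: base")
log.info("Transcription started: input/audio.mp3")
```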
- Python 3.8+
- ~2GB RAM (for base model)
- ~8GB RAM (for large models)
- FFmpeg (auto-configured)
- Internet connection (first run to download models)
This project uses OpenAI's Whisper model. Please refer to Whisper's license for usage terms.
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
- Meeting transcriptions - Convert recorded meetings to searchable text
- Interview documentation - Professional recruitment and research interviews
- Customer support - Analyze call recordings for quality assurance
- Lecture notes - Transform recorded lectures into study materials
- Research interviews - Academic and qualitative research transcription
- Language learning - Practice pronunciation with AI feedback
- Podcast transcriptions - Create show notes and SEO-friendly content
- Video subtitles - Generate captions for YouTube and social media
- Voice memo organization - Convert ideas into searchable text
- Hearing accessibility - Make audio content accessible to the deaf and hard-of-hearing community
- Voice-to-text tools - Assistive technology for speech disabilities
AI transcription, OpenAI Whisper, speech to text, audio converter, voice recognition, Python CLI tool, batch audio processing, multilingual transcription, free transcription software, developer tools
- OpenAI Whisper for the transcription model
- FFmpeg for audio processing
Transform Voice to Text with AI - Start Transcribing Today!