# AudioMedia Checker

> Automatic audio track language detection and tagging for video files using OpenAI Whisper

## Overview

AudioMedia Checker is a Docker-based CLI tool that detects the language of audio tracks in video files and corrects their language tags using OpenAI's Whisper model. It is designed to run as a disposable container (`docker run --rm`), so it can be called from automation scripts without any local installation beyond Docker.

By default, the tool analyzes audio tracks that have no language tag (or an undefined one) and updates the MKV file metadata accordingly. Non-MKV formats are analyzed read-only, as in dry-run mode, so they are never modified.

## Features

- **AI-Powered Detection** - Uses OpenAI Whisper for accurate language identification
- **Automatic Tagging** - Updates language metadata in MKV files
- **Flexible Analysis** - Single file or recursive folder processing
- **Confidence Control** - Adjustable threshold (default: 65%)
- **Force Override** - Manual language assignment when detection fails
- **GPU Acceleration** - Optional CUDA support for faster processing
- **Docker-Native** - No local dependencies, run-and-forget design
- **Dry-Run Mode** - Safe testing without file modifications
- **Selective Analysis** - Process only untagged tracks or all tracks

## Quick Start

### Analyze a Single File (CPU)
```bash
docker run --rm \
  -v /path/to/movies:/data \
  chryses/audiomedia-checker:latest \
  --file "/data/Movie.mkv"
```

### Analyze Folder Recursively (GPU)
```bash
docker run --rm --gpus all \
  -v /path/to/movies:/data \
  chryses/audiomedia-checker:latest \
  --gpu \
  --folder "/data" \
  --recursive
```

### Dry-Run Test (Safe Mode)
```bash
docker run --rm \
  -v /path/to/movies:/data \
  chryses/audiomedia-checker:latest \
  --dry-run \
  --folder "/data/Movies" \
  --verbose
```

## Command-Line Arguments

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--file` | string | - | Path (inside the container) to a single file to analyze |
| `--folder` | string | - | Directory path (inside the container) to process |
| `--recursive` | int (optional) | - | Recurse into subdirectories; optional depth (omitted or 0 = unlimited, N > 0 = at most N levels) |
| `--check-all-tracks` | flag | false | Analyze all tracks, not just untagged ones |
| `--verbose` | flag | false | Enable detailed logging |
| `--dry-run` | flag | false | Simulate operations without modifying files |
| `--force-language` | string | - | Force a specific language (ISO 639-2, 3 letters) |
| `--confidence` | int | 65 | Detection confidence threshold (0-100) |
| `--model` | string | base | Whisper model size (see below) |
| `--gpu` | flag | false | Use GPU acceleration (requires NVIDIA GPU) |
| `--help-languages` | flag | false | Show available language codes |

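
When the container is launched from a script, it can help to check argument values before starting it. The following is a minimal sketch and not part of the tool: `valid_confidence` and `valid_language` are hypothetical helper names, and they check only the range and format stated in the table above.

```bash
# Hypothetical pre-flight checks for wrapper scripts (not part of the tool).

# Succeeds if $1 is an integer between 0 and 100.
valid_confidence() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
  esac
  [ "$1" -le 100 ]
}

# Succeeds if $1 looks like a 3-letter lowercase code. This is a format
# check only; it does not verify that the code exists in ISO 639-2.
valid_language() {
  case "$1" in
    [a-z][a-z][a-z]) return 0 ;;
    *) return 1 ;;
  esac
}
```

A wrapper could then call `valid_confidence "$CONF" || exit 1` before `docker run`, so a typo fails fast instead of after the image has started.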
### Whisper Models

| Model | Parameters | Speed | Accuracy | Recommended For |
|-------|------------|-------|----------|-----------------|
| `tiny` | ~39 M | Fastest | ⭐⭐ | Quick tests |
| `base` | ~74 M | Fast | ⭐⭐⭐ | **Default - Best balance** |
| `small` | ~244 M | Moderate | ⭐⭐⭐⭐ | Better accuracy |
| `medium` | ~769 M | Slow | ⭐⭐⭐⭐⭐ | High accuracy needed |
| `large` | ~1550 M | Slowest | ⭐⭐⭐⭐⭐ | Maximum accuracy |
| `large-v3` | ~1550 M | Slowest | ⭐⭐⭐⭐⭐ | Latest version |

> **Tip:** The `base` model provides excellent results for most use cases. Use a larger model only if detection fails.

## Usage Examples

### Basic File Analysis
```bash
docker run --rm \
  -v /media/movies:/data \
  chryses/audiomedia-checker:latest \
  --file "/data/MyMovie.mkv" \
  --verbose
```

### Recursive Folder with Custom Confidence
```bash
docker run --rm \
  -v /media/library:/library \
  chryses/audiomedia-checker:latest \
  --folder "/library" \
  --recursive 0 \
  --confidence 70 \
  --model small
```

### Force Italian Language (Fallback)
```bash
docker run --rm \
  -v /media/movies:/data \
  chryses/audiomedia-checker:latest \
  --folder "/data/Italian_Films" \
  --force-language ita \
  --recursive
```

> ⚠️ **Warning:** `--force-language` currently applies to ALL tracks that have an undefined tag or fall below the confidence threshold. Use with caution!

### GPU-Accelerated Processing
```bash
docker run --rm --gpus all \
  -v /media/library:/data \
  chryses/audiomedia-checker:latest \
  --gpu \
  --folder "/data" \
  --recursive \
  --model medium
```

### Dry-Run on Mixed Formats
```bash
docker run --rm \
  -v /media/downloads:/downloads \
  chryses/audiomedia-checker:latest \
  --dry-run \
  --folder "/downloads" \
  --check-all-tracks \
  --verbose
```

## How It Works

### Detection Logic

1. **Scans** MKV files (or all supported video formats in dry-run mode)
2. **Identifies** audio tracks without language tags
3. **Extracts** a 30-second audio sample
4. **Analyzes** the sample with the Whisper model
5. **Updates** MKV metadata if confidence ≥ threshold
6. **Skips** modification for non-MKV formats (analysis only)

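
Step 3 can be pictured with a plain FFmpeg call. This is a sketch under assumptions, not the tool's actual code: `extract_sample` is a hypothetical helper, and taking the first 30 seconds as 16 kHz mono WAV is one plausible choice (16 kHz mono is what Whisper resamples audio to internally).

```bash
# Hypothetical sketch of step 3; the tool's real extraction may differ.

# extract_sample INPUT TRACK_INDEX OUTPUT_WAV
# Writes the first 30 seconds of audio track TRACK_INDEX (0-based, counted
# among audio streams only) to OUTPUT_WAV as 16 kHz mono audio.
extract_sample() {
  ffmpeg -nostdin -loglevel error -y \
    -i "$1" -map "0:a:$2" -t 30 -ac 1 -ar 16000 "$3"
}
```

For example, `extract_sample /data/Movie.mkv 0 /tmp/sample.wav` would write a sample of the first audio track that could then be handed to Whisper for language detection.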
### File Format Support

| Format | Detection | Tag Update | Notes |
|--------|-----------|------------|-------|
| `.mkv` | ✅ | ✅ | Fully supported |
| `.mp4` | ✅ | ❌ | Dry-run only |
| `.avi` | ✅ | ❌ | Dry-run only |
| `.mov` | ✅ | ❌ | Dry-run only |
| `.m4v` | ✅ | ❌ | Dry-run only |
| `.flv` | ✅ | ❌ | Dry-run only |
| `.wmv` | ✅ | ❌ | Dry-run only |
| `.webm` | ✅ | ❌ | Dry-run only |

> **Safety:** Non-MKV files are automatically analyzed in read-only mode to prevent accidental modifications.

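
The table above boils down to a lookup on the file extension. The sketch below shows that rule in a wrapper script; `tag_mode` is a hypothetical name, and matching extensions case-insensitively is an assumption rather than documented behavior.

```bash
# Hypothetical helper mirroring the format table (not part of the tool).

# tag_mode FILE -> prints "update", "analyze-only", or "unsupported"
tag_mode() {
  # Take the text after the last dot and lowercase it.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mkv) echo "update" ;;
    mp4|avi|mov|m4v|flv|wmv|webm) echo "analyze-only" ;;
    *) echo "unsupported" ;;
  esac
}
```

A script can use this to predict which files a run may modify, for instance to back up only the files for which `tag_mode` prints `update`.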
### Language Support

All languages supported by OpenAI Whisper:

- **100+ languages** detected automatically
- Tags use **ISO 639-2** format (3-letter codes)
- Use `--help-languages` to see the full list

Common examples: `eng` (English), `ita` (Italian), `fra` (French), `spa` (Spanish), `deu` (German), `jpn` (Japanese), `kor` (Korean), `rus` (Russian), `chi` (Chinese)

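
Whisper itself reports short codes such as `en` or `it`, so they have to be translated into 3-letter tags at some point. The sketch below shows that translation for the common examples listed above only. `to_iso639_2` is a hypothetical name, the tool's own table is not shown in this README, and `und` is the ISO 639-2 code for an undetermined language.

```bash
# Hypothetical mapping for the languages listed above (not the tool's table).

# to_iso639_2 CODE -> prints the 3-letter tag, or "und" if unmapped
to_iso639_2() {
  case "$1" in
    en) echo eng ;;
    it) echo ita ;;
    fr) echo fra ;;
    es) echo spa ;;
    de) echo deu ;;
    ja) echo jpn ;;
    ko) echo kor ;;
    ru) echo rus ;;
    zh) echo chi ;;
    *)  echo und ;;
  esac
}
```

For the authoritative list of codes the tool accepts, run the container with `--help-languages`.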
## GPU Acceleration

### Requirements

- NVIDIA GPU with CUDA support
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed
- Docker `--gpus` flag support

### Installation (Ubuntu/Debian)

```bash
# Install NVIDIA Container Toolkit
# (uses a dedicated keyring; apt-key is deprecated on current Debian/Ubuntu)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

### Performance Comparison

| Model | CPU (i7) | GPU (RTX 3060) | Speedup |
|-------|----------|----------------|---------|
| `tiny` | ~5s | ~1s | 5x |
| `base` | ~15s | ~3s | 5x |
| `small` | ~45s | ~8s | 5.6x |
| `medium` | ~2m | ~15s | 8x |
| `large` | ~5m | ~30s | 10x |

> Times are approximate and per file. GPU processing is roughly 5-10x faster.

## Important Notes

### Modifications & Backups

- ✅ **MKV files are modified in-place** (no backup created)
- ✅ **Original video/audio streams untouched** (only metadata changes)
- ⚠️ **No undo feature** - test with `--dry-run` first
- 💡 **Recommendation:** Back up important files before the first run

### Force Language Behavior

> ⚠️ **Current Limitation:** `--force-language` applies to ALL tracks that either:
> - Have no language tag
> - Have a confidence score below the threshold
>
> This may cause unexpected results. Use it only when you are certain that all tracks share the same language.

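
The rule in the note above can be written as a small predicate. This is a literal restatement of the two conditions, not the tool's code: `force_applies` is a hypothetical name, confidence values are assumed to be integers on the 0-100 scale, and treating `und` as "no tag" is an assumption based on the Overview.

```bash
# Hypothetical restatement of the --force-language rule (not the tool's code).

# force_applies CURRENT_TAG CONFIDENCE THRESHOLD
# Succeeds when the forced language would be applied to the track.
force_applies() {
  case "$1" in
    ''|und) return 0 ;;   # no tag, or explicitly undefined
  esac
  [ "$2" -lt "$3" ]       # detection fell below the confidence threshold
}
```

With the default threshold of 65, `force_applies eng 40 65` succeeds while `force_applies eng 90 65` does not, which is why a forced language can overwrite more tracks than expected.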
### Recursive Depth

```bash
--recursive      # Unlimited depth (all subdirectories)
--recursive 0    # Same as above
--recursive 1    # Only immediate subdirectories
--recursive 2    # Up to 2 levels deep
```

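
To preview which files a given depth would reach, the same semantics can be approximated with `find`. This is a sketch under assumptions: `list_mkv` is a hypothetical helper, it lists only `.mkv` files, and it assumes that omitting `--recursive` means "top-level files only".

```bash
# Hypothetical preview of the depth semantics using find (not the tool's code).

# list_mkv DIR [DEPTH]
#   DEPTH omitted -> files directly in DIR only (no --recursive)
#   DEPTH = 0     -> unlimited depth
#   DEPTH = N > 0 -> descend at most N directory levels below DIR
list_mkv() {
  dir=$1
  depth=${2-none}
  case "$depth" in
    none) find "$dir" -maxdepth 1 -type f -name '*.mkv' ;;
    0)    find "$dir" -type f -name '*.mkv' ;;
    # Files directly in DIR sit at find depth 1, so N levels of
    # subdirectories correspond to -maxdepth N+1.
    *)    find "$dir" -maxdepth "$((depth + 1))" -type f -name '*.mkv' ;;
  esac
}
```

For example, `list_mkv /media/library 1` lists the files that `--recursive 1` is expected to cover under these assumptions.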
## Integration Examples

### Automated Post-Processing Script

```bash
#!/bin/bash
# Process new downloads automatically
set -euo pipefail  # stop before the move if the container exits with an error

DOWNLOAD_DIR="/media/downloads"
LIBRARY_DIR="/media/library"

# Analyze and tag
docker run --rm \
  -v "$DOWNLOAD_DIR:/data" \
  chryses/audiomedia-checker:latest \
  --folder "/data" \
  --confidence 70 \
  --model base

# Move to library after tagging
mv "$DOWNLOAD_DIR"/*.mkv "$LIBRARY_DIR/"
```

### Cron Job (Daily Library Scan)

```bash
#!/bin/bash
# /etc/cron.daily/audiomedia-checker
docker run --rm \
  -v /media/library:/library \
  chryses/audiomedia-checker:latest \
  --folder "/library" \
  --recursive \
  --confidence 75 \
  >> /var/log/audiomedia-checker.log 2>&1
```

### Sonarr/Radarr Custom Script

```bash
#!/bin/bash
# Save as: /scripts/tag-audio.sh

# Sonarr and Radarr pass event details as environment variables rather than
# positional arguments; $1 is kept as a fallback for manual runs.
FILE_PATH="${sonarr_episodefile_path:-${radarr_moviefile_path:-${1:-}}}"
[ -n "$FILE_PATH" ] || exit 0  # e.g. a test event carries no file path

docker run --rm \
  -v "$(dirname "$FILE_PATH"):/data" \
  chryses/audiomedia-checker:latest \
  --file "/data/$(basename "$FILE_PATH")" \
  --model base
```

## Docker Hub

**Repository:** [chryses/audiomedia-checker](https://hub.docker.com/r/chryses/audiomedia-checker)

### Available Tags
- `latest` - Latest stable release (recommended)
- `[commit-sha]` - Specific commit builds for testing/rollback

### Supported Architectures
- ✅ `linux/amd64` (x86_64)
- ✅ `linux/arm64` (ARM 64-bit)

### Auto-Build
Images are automatically built on every push to the `master` branch via GitHub Actions.

## Troubleshooting

### "No module named 'whisper'"

The container includes all dependencies. If you see this error, you may be running an old image:

```bash
docker pull chryses/audiomedia-checker:latest
```

### GPU Not Detected

```bash
# Test GPU availability
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# If this fails, reinstall NVIDIA Container Toolkit
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

### "Permission denied" on Files

Ensure the user running the container has read/write access to the mounted volumes:

```bash
# Option 1: Run as your user
docker run --rm --user "$(id -u):$(id -g)" \
  -v /media:/data \
  chryses/audiomedia-checker:latest ...

# Option 2: Fix permissions
sudo chown -R "$USER:$USER" /media/library
```

### Low Confidence Scores

If detection frequently fails:

1. Try a larger model: `--model medium`
2. Lower the threshold: `--confidence 50`
3. Ensure the audio is clear (not corrupted)
4. Use `--force-language` as a last resort

### High Memory Usage

Large models require significant RAM:

| Model | RAM Required |
|-------|--------------|
| tiny/base | ~2 GB |
| small | ~4 GB |
| medium | ~8 GB |
| large | ~16 GB |

Use smaller models on limited hardware.

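
A wrapper script can use the table above to pick a model automatically. The sketch below is illustrative only: `largest_model_for_ram` is a hypothetical helper, and its thresholds are the approximate figures from the table, not measured limits.

```bash
# Hypothetical model picker based on the RAM table (not part of the tool).

# largest_model_for_ram GB -> prints the largest model the table allows,
# or "none" when even tiny/base would not fit.
largest_model_for_ram() {
  if   [ "$1" -ge 16 ]; then echo large
  elif [ "$1" -ge 8  ]; then echo medium
  elif [ "$1" -ge 4  ]; then echo small
  elif [ "$1" -ge 2  ]; then echo base
  else echo none
  fi
}
```

On Linux, total RAM in whole gigabytes can be read with `awk '/^MemTotal:/ {print int($2 / 1048576)}' /proc/meminfo`. The value is rounded down, so a machine sold as 16 GB typically reports 15 and would be assigned `medium`.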
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

### How to Contribute
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## Support

- **Bug Reports:** [GitHub Issues](https://github.com/Jorman/Scripts/issues)
- **Discussions:** [GitHub Discussions](https://github.com/Jorman/Scripts/discussions)
- **Docker Hub:** [chryses/audiomedia-checker](https://hub.docker.com/r/chryses/audiomedia-checker)

## Acknowledgments

- **[OpenAI Whisper](https://github.com/openai/whisper)** - AI-powered speech recognition
- **[MKVToolNix](https://mkvtoolnix.download/)** - MKV file manipulation
- **[FFmpeg](https://ffmpeg.org/)** - Multimedia processing

## License

This project is licensed under the **GNU General Public License v3.0** - see the [LICENSE](https://www.gnu.org/licenses/gpl-3.0.en.html) file for details.

## Show Your Support

If you find this project useful, please consider:
- ⭐ Starring the repository on GitHub
- 🐳 Pulling the Docker image
- 📢 Sharing with the media automation community

---

**Made with ❤️ for audio perfectionists**

**Powered by OpenAI Whisper** | **Source:** [GitHub](https://github.com/Jorman/Scripts)