Commit 26d3914

Create README for AudioMedia Checker
Added comprehensive README for AudioMedia Checker, detailing features, usage, command-line arguments, and installation instructions.
1 parent 4096ba2 commit 26d3914

1 file changed: +399 -0 lines changed

AudioMediaChecker/README.md

Lines changed: 399 additions & 0 deletions
@@ -0,0 +1,399 @@
# AudioMedia Checker

![Docker Pulls](https://img.shields.io/docker/pulls/chryses/audiomedia-checker)
![Docker Image Size](https://img.shields.io/docker/image-size/chryses/audiomedia-checker)
![GitHub](https://img.shields.io/github/license/Jorman/Scripts)

> Automatic audio track language detection and tagging for video files using OpenAI Whisper

## 📖 Overview

AudioMedia Checker is a Docker-based CLI tool that automatically detects the language of audio tracks in video files and corrects language tags using OpenAI's Whisper AI model. It's designed as a disposable container (`docker run --rm`) that can be integrated into automation scripts without requiring any local installation.

The tool analyzes audio tracks without language tags (or with undefined tags) and updates MKV file metadata accordingly. For non-MKV formats, it performs read-only analysis in dry-run mode, ensuring safe operation.

## ✨ Features

- 🎯 **AI-Powered Detection** - Uses OpenAI Whisper for accurate language identification
- 🏷️ **Automatic Tagging** - Updates language metadata in MKV files
- 📁 **Flexible Analysis** - Single file or recursive folder processing
- 🎚️ **Confidence Control** - Adjustable threshold (default: 65%)
- 🔧 **Force Override** - Manual language assignment when detection fails
- 🚀 **GPU Acceleration** - Optional CUDA support for faster processing
- 💻 **Docker-Native** - No local dependencies, run-and-forget design
- 🧪 **Dry-Run Mode** - Safe testing without file modifications
- 📊 **Selective Analysis** - Process only untagged tracks or all tracks

## 🚀 Quick Start

### Analyze a Single File (CPU)
```bash
docker run --rm \
  -v /path/to/movies:/data \
  chryses/audiomedia-checker:latest \
  --file "/data/Movie.mkv"
```

### Analyze Folder Recursively (GPU)
```bash
docker run --rm --gpus all \
  -v /path/to/movies:/data \
  chryses/audiomedia-checker:latest \
  --gpu \
  --folder "/data" \
  --recursive
```

### Dry-Run Test (Safe Mode)
```bash
docker run --rm \
  -v /path/to/movies:/data \
  chryses/audiomedia-checker:latest \
  --dry-run \
  --folder "/data/Movies" \
  --verbose
```

## 📋 Command-Line Arguments

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--file` | string | - | Path to a single file to analyze |
| `--folder` | string | - | Directory path to process |
| `--recursive` | int (optional) | - | Recurse into subdirectories; optional depth (omitted or 0 = unlimited, N > 0 = up to N levels) |
| `--check-all-tracks` | flag | false | Analyze all tracks, not just untagged ones |
| `--verbose` | flag | false | Enable detailed logging |
| `--dry-run` | flag | false | Simulate operations without modifying files |
| `--force-language` | string | - | Force specific language (ISO 639-2, 3 letters) |
| `--confidence` | int | 65 | Detection confidence threshold (0-100) |
| `--model` | string | base | Whisper model size (see below) |
| `--gpu` | flag | false | Use GPU acceleration (requires NVIDIA GPU) |
| `--help-languages` | flag | false | Show available language codes |

### Whisper Models

| Model | Parameters | Speed | Accuracy | Recommended For |
|-------|------------|-------|----------|-----------------|
| `tiny` | ~39 M | ⚡⚡⚡ | ⭐⭐ | Quick tests |
| `base` | ~74 M | ⚡⚡ | ⭐⭐⭐ | **Default - Best balance** |
| `small` | ~244 M | ⚡ | ⭐⭐⭐⭐ | Better accuracy |
| `medium` | ~769 M | 🐌 | ⭐⭐⭐⭐⭐ | High accuracy needed |
| `large` | ~1550 M | 🐌🐌 | ⭐⭐⭐⭐⭐ | Maximum accuracy |
| `large-v3` | ~1550 M | 🐌🐌 | ⭐⭐⭐⭐⭐ | Latest version |

> **Tip:** The `base` model provides excellent results for most use cases. Use larger models only if detection fails.

## 💡 Usage Examples

### Basic File Analysis
```bash
docker run --rm \
  -v /media/movies:/data \
  chryses/audiomedia-checker:latest \
  --file "/data/MyMovie.mkv" \
  --verbose
```

### Recursive Folder with Custom Confidence
```bash
docker run --rm \
  -v /media/library:/library \
  chryses/audiomedia-checker:latest \
  --folder "/library" \
  --recursive 0 \
  --confidence 70 \
  --model small
```

### Force Italian Language (Fallback)
```bash
docker run --rm \
  -v /media/movies:/data \
  chryses/audiomedia-checker:latest \
  --folder "/data/Italian_Films" \
  --force-language ita \
  --recursive
```

> ⚠️ **Warning:** `--force-language` currently applies to ALL tracks with undefined tags or below confidence threshold. Use with caution!

### GPU-Accelerated Processing
```bash
docker run --rm --gpus all \
  -v /media/library:/data \
  chryses/audiomedia-checker:latest \
  --gpu \
  --folder "/data" \
  --recursive \
  --model medium
```

### Dry-Run on Mixed Formats
```bash
docker run --rm \
  -v /media/downloads:/downloads \
  chryses/audiomedia-checker:latest \
  --dry-run \
  --folder "/downloads" \
  --check-all-tracks \
  --verbose
```

## 🎯 How It Works

### Detection Logic

1. **Scans** MKV files (or all video formats in dry-run mode)
2. **Identifies** audio tracks without language tags
3. **Extracts** a 30-second audio sample
4. **Analyzes** with Whisper AI model
5. **Updates** MKV metadata if confidence ≥ threshold
6. **Skips** modification for non-MKV formats (analysis only)

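The steps above can be sketched with the FFmpeg and MKVToolNix command-line tools credited in the Acknowledgments. This is only an illustration under assumptions: the helper names (`list_audio_tracks`, `extract_sample`, `should_tag`, `tag_track`) are hypothetical, the Whisper call itself is left out, and the container's actual implementation may differ.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; helper names are hypothetical, not the tool's code.

# Steps 1-2: print "<stream index>,<language tag>" per audio stream
# (the tag is simply absent from the output when the stream has none).
list_audio_tracks() {
  ffprobe -v error -select_streams a \
    -show_entries stream=index:stream_tags=language \
    -of csv=p=0 "$1"
}

# Step 3: write a 30-second mono 16 kHz sample of audio track $2 (0-based) to $3.
extract_sample() {
  ffmpeg -nostdin -loglevel error -y -i "$1" \
    -map "0:a:$2" -t 30 -ac 1 -ar 16000 "$3"
}

# Step 5 (decision): succeed only when confidence >= threshold (integers 0-100).
should_tag() {
  local confidence="$1" threshold="${2:-65}"
  [ "$confidence" -ge "$threshold" ]
}

# Step 5 (write): set the language of audio track $2 (1-based) to code $3.
tag_track() {
  mkvpropedit "$1" --edit "track:a$2" --set "language=$3"
}

# Only the threshold check is exercised here; no media file is touched.
if should_tag 72 65; then echo "72: tag"; else echo "72: skip"; fi
if should_tag 40 65; then echo "40: tag"; else echo "40: skip"; fi
```

Keeping the threshold check separate from the write step mirrors the dry-run idea: the decision can be inspected without running `mkvpropedit`.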
### File Format Support

| Format | Detection | Tag Update | Notes |
|--------|-----------|------------|-------|
| `.mkv` | ✅ | ✅ | Fully supported |
| `.mp4` | ✅ | ❌ | Dry-run only |
| `.avi` | ✅ | ❌ | Dry-run only |
| `.mov` | ✅ | ❌ | Dry-run only |
| `.m4v` | ✅ | ❌ | Dry-run only |
| `.flv` | ✅ | ❌ | Dry-run only |
| `.wmv` | ✅ | ❌ | Dry-run only |
| `.webm` | ✅ | ❌ | Dry-run only |

> **Safety:** Non-MKV files are automatically analyzed in read-only mode to prevent accidental modifications.

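For scripts that pre-filter files before calling the container, the table can be mirrored in a small shell helper. This is a sketch under assumptions: `classify_format` is a hypothetical name, and the tool performs its own format check internally regardless.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the format table; not the tool's actual code.
classify_format() {
  local ext
  # Take the text after the last dot and lowercase it.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mkv) echo "tag" ;;                                     # detection + tag update
    mp4|avi|mov|m4v|flv|wmv|webm) echo "analyze-only" ;;   # detection only
    *) echo "unsupported" ;;
  esac
}

for f in "Movie.mkv" "Clip.MP4" "notes.txt"; do
  printf '%s -> %s\n' "$f" "$(classify_format "$f")"
done
```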
### Language Support

All languages supported by OpenAI Whisper:

- 🌍 **100+ languages** detected automatically
- 🏷️ Tags use **ISO 639-2** format (3-letter codes)
- 📚 Use `--help-languages` to see full list

Common examples: `eng` (English), `ita` (Italian), `fra` (French), `spa` (Spanish), `deu` (German), `jpn` (Japanese), `kor` (Korean), `rus` (Russian), `chi` (Chinese)

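Whisper itself reports two-letter codes such as `en` or `it`, so some mapping to the three-letter tags is needed. The sketch below covers only the common examples listed above; `to_iso639_2` is a hypothetical helper, and the tool's own table (shown by `--help-languages`) may differ.

```bash
#!/usr/bin/env bash
# Hypothetical mapping for the languages listed above; the tool's table may differ.
to_iso639_2() {
  case "$1" in
    en) echo eng ;; it) echo ita ;; fr) echo fra ;; es) echo spa ;;
    de) echo deu ;; ja) echo jpn ;; ko) echo kor ;; ru) echo rus ;;
    zh) echo chi ;;
    *)  echo und ;;   # "und" is the ISO 639-2 code for "undetermined"
  esac
}

for code in en it zh xx; do
  printf '%s -> %s\n' "$code" "$(to_iso639_2 "$code")"
done
```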
## 🖥️ GPU Acceleration

### Requirements

- NVIDIA GPU with CUDA support
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed
- Docker `--gpus` flag support

### Installation (Ubuntu/Debian)

```bash
# Install NVIDIA Container Toolkit
# (apt-key is deprecated, so the repository key goes into a dedicated keyring)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

### Performance Comparison

| Model | CPU (i7) | GPU (RTX 3060) | Speedup |
|-------|----------|----------------|---------|
| `tiny` | ~5s | ~1s | 5x |
| `base` | ~15s | ~3s | 5x |
| `small` | ~45s | ~8s | 5.6x |
| `medium` | ~2m | ~15s | 8x |
| `large` | ~5m | ~30s | 10x |

> Times per file (approximate). GPU provides 5-10x faster processing.

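To get a rough feel for a whole library, the per-file figures can be multiplied by the number of files. The helper below is a hypothetical sketch that hard-codes the approximate values from the table; real times depend on hardware and on how many tracks each file needs analyzed.

```bash
#!/usr/bin/env bash
# Hypothetical estimator using the approximate per-file seconds from the table.
estimate_seconds() {
  local model="$1" device="$2" files="$3" per_file
  case "$model:$device" in
    tiny:cpu)   per_file=5   ;; tiny:gpu)   per_file=1  ;;
    base:cpu)   per_file=15  ;; base:gpu)   per_file=3  ;;
    small:cpu)  per_file=45  ;; small:gpu)  per_file=8  ;;
    medium:cpu) per_file=120 ;; medium:gpu) per_file=15 ;;
    large:cpu)  per_file=300 ;; large:gpu)  per_file=30 ;;
    *) echo "unknown model/device: $model/$device" >&2; return 1 ;;
  esac
  echo $((per_file * files))
}

echo "base on CPU, 200 files: $(estimate_seconds base cpu 200)s"
echo "base on GPU, 200 files: $(estimate_seconds base gpu 200)s"
```

With these figures, 200 files on `base` come to about 3000 seconds (50 minutes) on CPU versus about 600 seconds (10 minutes) on GPU.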
## ⚠️ Important Notes

### Modifications & Backups

- ✅ **MKV files are modified in-place** (no backup created)
- ✅ **Original video/audio streams untouched** (only metadata changes)
- ⚠️ **No undo feature** - test with `--dry-run` first
- 💡 **Recommendation:** Backup important files before first run

### Force Language Behavior

> ⚠️ **Current Limitation:** `--force-language` applies to ALL tracks that either:
> - Have no language tag
> - Have confidence score below threshold
>
> This may cause unexpected results. Use only when you're certain all tracks share the same language.

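One cautious way to work with this limitation is to preview a forced run with `--dry-run --verbose` before repeating it for real. The wrapper below is a sketch: `run_checker` and the `DOCKER` override are hypothetical conveniences, and it assumes that dry-run output reflects what `--force-language` would change.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: $1 is the host folder, remaining arguments go to the tool.
# Setting DOCKER=echo prints the command instead of running it.
run_checker() {
  local host_dir="$1"
  shift
  "${DOCKER:-docker}" run --rm \
    -v "$host_dir:/data" \
    chryses/audiomedia-checker:latest \
    --folder /data "$@"
}

# Suggested workflow (commented out so nothing runs by accident):
#   run_checker /media/movies/Italian_Films --force-language ita --dry-run --verbose
#   # review the log, then apply:
#   run_checker /media/movies/Italian_Films --force-language ita
```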
### Recursive Depth

```bash
--recursive      # Unlimited depth (all subdirectories)
--recursive 0    # Same as above
--recursive 1    # Only immediate subdirectories
--recursive 2    # Up to 2 levels deep
```

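The depth values can be pictured in terms of `find -maxdepth`. This is a sketch under an assumption: that files in the starting folder are always included, so a depth of N subdirectory levels corresponds to `-maxdepth N+1`. `list_mkv` is a hypothetical helper, not part of the tool.

```bash
#!/usr/bin/env bash
# Hypothetical: list .mkv files under $1, descending at most $2 subdirectory
# levels (0 or omitted = unlimited). Relies on find's -maxdepth (GNU/BSD).
list_mkv() {
  local dir="$1" depth="${2:-0}"
  if [ "$depth" -gt 0 ]; then
    find "$dir" -maxdepth "$((depth + 1))" -type f -name '*.mkv'
  else
    find "$dir" -type f -name '*.mkv'
  fi
}

# Small demo tree: one file per level.
demo=$(mktemp -d)
mkdir -p "$demo/s1/s2"
touch "$demo/a.mkv" "$demo/s1/b.mkv" "$demo/s1/s2/c.mkv"
echo "depth 1: $(list_mkv "$demo" 1 | wc -l | tr -d ' ') file(s)"
echo "depth 0: $(list_mkv "$demo" 0 | wc -l | tr -d ' ') file(s)"
rm -rf "$demo"
```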
## 🔧 Integration Examples

### Automated Post-Processing Script

```bash
#!/bin/bash
# Process new downloads automatically
set -euo pipefail   # stop before the move if tagging fails

DOWNLOAD_DIR="/media/downloads"
LIBRARY_DIR="/media/library"

# Analyze and tag
docker run --rm \
  -v "$DOWNLOAD_DIR:/data" \
  chryses/audiomedia-checker:latest \
  --folder "/data" \
  --confidence 70 \
  --model base

# Move to library after tagging
mv "$DOWNLOAD_DIR"/*.mkv "$LIBRARY_DIR/"
```

### Cron Job (Daily Library Scan)

```bash
#!/bin/bash
# Save as: /etc/cron.daily/audiomedia-checker (the shebang must be the first line)
docker run --rm \
  -v /media/library:/library \
  chryses/audiomedia-checker:latest \
  --folder "/library" \
  --recursive \
  --confidence 75 \
  >> /var/log/audiomedia-checker.log 2>&1
```

### Sonarr/Radarr Custom Script

```bash
#!/bin/bash
# Save as: /scripts/tag-audio.sh

# Sonarr/Radarr pass event details as environment variables;
# fall back to the first argument for manual runs
FILE_PATH="${sonarr_episodefile_path:-${radarr_moviefile_path:-$1}}"
[ -n "$FILE_PATH" ] || exit 0   # nothing to do, e.g. on a "Test" event

docker run --rm \
  -v "$(dirname "$FILE_PATH"):/data" \
  chryses/audiomedia-checker:latest \
  --file "/data/$(basename "$FILE_PATH")" \
  --model base
```

## 🐳 Docker Hub

**Repository:** [chryses/audiomedia-checker](https://hub.docker.com/r/chryses/audiomedia-checker)

### Available Tags
- `latest` - Latest stable release (recommended)
- `[commit-sha]` - Specific commit builds for testing/rollback

### Supported Architectures
- ✅ `linux/amd64` (x86_64)
- ✅ `linux/arm64` (ARM 64-bit)

### Auto-Build
Images are automatically built on every push to the `master` branch via GitHub Actions.

## 🐛 Troubleshooting

### "No module named 'whisper'"

The container includes all dependencies. If you see this error, you may be running an old image:

```bash
docker pull chryses/audiomedia-checker:latest
```

### GPU Not Detected

```bash
# Test GPU availability
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# If this fails, reinstall NVIDIA Container Toolkit
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

### "Permission denied" on Files

Ensure your user has read/write access to mounted volumes:

```bash
# Option 1: Run as your user
docker run --rm --user "$(id -u):$(id -g)" \
  -v /media:/data \
  chryses/audiomedia-checker:latest ...

# Option 2: Fix permissions
sudo chown -R "$USER:$USER" /media/library
```

### Low Confidence Scores

If detection frequently fails:

1. Try a larger model: `--model medium`
2. Lower threshold: `--confidence 50`
3. Ensure audio is clear (not corrupted)
4. Use `--force-language` as last resort

### High Memory Usage

Large models require significant RAM:

| Model | RAM Required |
|-------|--------------|
| tiny/base | ~2 GB |
| small | ~4 GB |
| medium | ~8 GB |
| large | ~16 GB |

Use smaller models on limited hardware.

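A wrapper script could use the table to cap the model size on small machines. The sketch below is hypothetical (`pick_model` is not part of the tool): it returns the largest model whose approximate RAM figure fits, which is an upper bound and not a recommendation, since `base` is usually sufficient.

```bash
#!/usr/bin/env bash
# Hypothetical helper: largest model that fits in the given RAM (whole GB),
# using the approximate figures from the table above.
pick_model() {
  local ram_gb="${1:-0}"
  if   [ "$ram_gb" -ge 16 ]; then echo large
  elif [ "$ram_gb" -ge 8 ];  then echo medium
  elif [ "$ram_gb" -ge 4 ];  then echo small
  elif [ "$ram_gb" -ge 2 ];  then echo base
  else                            echo tiny    # below the ~2 GB tiny/base figure
  fi
}

# On Linux, MemAvailable in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
  avail_gb=$(awk '/^MemAvailable:/ {print int($2 / 1048576)}' /proc/meminfo)
  echo "Available RAM: ${avail_gb:-0} GB -> at most --model $(pick_model "${avail_gb:-0}")"
fi
```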
## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

### How to Contribute
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## 📞 Support

- 🐛 **Bug Reports:** [GitHub Issues](https://github.com/Jorman/Scripts/issues)
- 💬 **Discussions:** [GitHub Discussions](https://github.com/Jorman/Scripts/discussions)
- 🐳 **Docker Hub:** [chryses/audiomedia-checker](https://hub.docker.com/r/chryses/audiomedia-checker)

## 🙏 Acknowledgments

- **[OpenAI Whisper](https://github.com/openai/whisper)** - AI-powered speech recognition
- **[MKVToolNix](https://mkvtoolnix.download/)** - MKV file manipulation
- **[FFmpeg](https://ffmpeg.org/)** - Multimedia processing

## 📄 License

This project is licensed under the **GNU General Public License v3.0** - see the [LICENSE](https://www.gnu.org/licenses/gpl-3.0.en.html) file for details.

## ⭐ Show Your Support

If you find this project useful, please consider:
- ⭐ Starring the repository on GitHub
- 🐳 Pulling the Docker image
- 📢 Sharing with the media automation community

---

**Made with ❤️ for audio perfectionists**

**Powered by OpenAI Whisper** | **Source:** [GitHub](https://github.com/Jorman/Scripts)
