YUTRAS is an AI-powered speech-to-speech translation tool that downloads the audio track of a YouTube video and translates the spoken content into another language. It uses Facebook's Seamless M4T model for the speech-to-speech translation.
- YouTube Audio Extraction: Download audio tracks from any YouTube video
- Speech-to-Speech Translation: Translate spoken content between languages using state-of-the-art AI
- Smart Chunking: Process long audio files in chunks to prevent memory issues
- GPU Acceleration: CUDA support for faster processing
- Robust Error Handling: Automatic retries and fallbacks to ensure successful translation
- User-friendly CLI: Simple command-line interface with helpful options
- Python 3.8 or higher
- FFmpeg (required for audio processing)
- NVIDIA GPU with CUDA support (optional, for faster processing)
# Simply run the installation script
python install.py
This script will:
- Install the package in development mode
- Check all required dependencies
- Verify FFmpeg installation
- Provide clear instructions for any missing requirements
# Clone the repository
git clone https://github.com/yourusername/yutras.git
cd yutras
# Create and activate a virtual environment (recommended)
python -m venv venv
# On Windows:
venv\Scripts\activate
# On Linux/Mac:
source venv/bin/activate
# Install the package
pip install -e .
# Install dependencies
pip install -r requirements.txt
YUTRAS requires FFmpeg for audio processing. If you don't have it installed:
Windows:
- Download a Windows build of FFmpeg via ffmpeg.org (e.g., from gyan.dev)
- Extract the downloaded archive (e.g., to C:\ffmpeg)
- Add the bin directory (e.g., C:\ffmpeg\bin) to your system's PATH environment variable
- Restart your terminal/command prompt
Linux:
sudo apt update
sudo apt install ffmpeg
macOS:
brew install ffmpeg
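After installing, it can help to confirm that FFmpeg is actually reachable from your PATH. The following is a minimal sketch using only the Python standard library; the function names `find_ffmpeg` and `ffmpeg_version` are illustrative and are not part of YUTRAS.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the FFmpeg executable on PATH, or None if it is missing."""
    return shutil.which(executable)


def ffmpeg_version(path: str) -> str:
    """Return the first line printed by `ffmpeg -version` (empty string if none)."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    location = find_ffmpeg()
    if location is None:
        print("FFmpeg was not found on PATH; see the platform instructions above.")
    else:
        print(f"Found FFmpeg at {location}: {ffmpeg_version(location)}")
```

If the script reports that FFmpeg is missing right after you edited PATH on Windows, open a new terminal first, since existing terminals keep the old PATH.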
# Using the installed package
yutras "https://youtu.be/example"
# Or using the script directly
python translate.py "https://youtu.be/example"
To test just the translation functionality with a local audio file:
# Basic usage
python test_translation.py input_audio.wav output_audio.wav
# With options
python test_translation.py input_audio.wav output_audio.wav --lang fra --cpu --chunk-size 5
This is useful for debugging translation issues or working with local audio files.
# Translate to a different language
yutras "https://youtu.be/example" --target_lang fra # French
# Force CPU usage (more reliable but slower)
yutras "https://youtu.be/example" --cpu
# Specify output directory
yutras "https://youtu.be/example" -o my_translations
# Only download audio, skip translation
yutras "https://youtu.be/example" --skip_translation
# Adjust chunk size for processing long videos
yutras "https://youtu.be/example" --chunk_size 10
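To illustrate what chunked processing means, here is a rough sketch that splits a sequence of audio samples into fixed-length pieces. It assumes the chunk size is a duration in seconds, which this README does not state, and `iter_chunks` is a hypothetical helper rather than the function YUTRAS uses internally.

```python
from typing import Iterator, Sequence


def iter_chunks(
    samples: Sequence[float], sample_rate: int, chunk_seconds: float
) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of `samples`, each at most `chunk_seconds` long."""
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    # Number of samples per chunk (assumes chunk size is given in seconds).
    chunk_len = max(1, int(sample_rate * chunk_seconds))
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


if __name__ == "__main__":
    # 25 seconds of silence at 16 kHz, split into 10-second chunks.
    audio = [0.0] * (25 * 16000)
    lengths = [len(chunk) / 16000 for chunk in iter_chunks(audio, 16000, 10)]
    print(lengths)  # [10.0, 10.0, 5.0]
```

Smaller chunks lower the peak memory needed per model call, at the cost of more calls overall.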
To check if your system supports CUDA acceleration:
python check_cuda.py
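The contents of check_cuda.py are not shown here. As an assumption, a check of this kind typically asks PyTorch whether a CUDA device is usable; the sketch below does that and falls back to the CPU when PyTorch is not installed. `pick_device` is an illustrative name.

```python
import importlib.util


def pick_device(force_cpu: bool = False) -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if force_cpu or importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch  # imported lazily so the check also works without PyTorch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

Passing `force_cpu=True` mirrors the effect of the `--cpu` flag described earlier.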
YUTRAS supports all languages available in the Seamless M4T model. Some common language codes:
- eng: English
- deu: German
- fra: French
- spa: Spanish
- rus: Russian
- cmn: Mandarin Chinese
- jpn: Japanese
For a complete list, refer to the Seamless M4T documentation.
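If you script around YUTRAS, a small lookup table can catch typos in a language code before a long download starts. This sketch covers only the codes listed above; `COMMON_LANGUAGES` and `describe_language` are hypothetical names, and an unknown code is not necessarily invalid for the model.

```python
# Short list of codes taken from this README, not the model's full language set.
COMMON_LANGUAGES = {
    "eng": "English",
    "deu": "German",
    "fra": "French",
    "spa": "Spanish",
    "rus": "Russian",
    "cmn": "Mandarin Chinese",
    "jpn": "Japanese",
}


def describe_language(code: str) -> str:
    """Return the language name for a known code, or a hint for an unknown one."""
    normalized = code.strip().lower()
    if normalized in COMMON_LANGUAGES:
        return COMMON_LANGUAGES[normalized]
    return f"{normalized} (not in the short list; check the Seamless M4T documentation)"


if __name__ == "__main__":
    print(describe_language("fra"))  # French
```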
yutras/
├── yutras/
│   ├── __init__.py          # Package initialization
│   ├── cli.py               # Command-line interface
│   ├── config.py            # Configuration management
│   ├── core/                # Core functionality
│   │   ├── __init__.py
│   │   └── pipeline.py      # Translation pipeline
│   ├── models/              # Model management
│   │   ├── __init__.py
│   │   └── seamless_m4t.py  # SeamlessM4T model wrapper
│   └── utils/               # Utility functions
│       ├── __init__.py
│       ├── audio.py         # Audio processing utilities
│       ├── download.py      # YouTube download utilities
│       └── system.py        # System utilities
├── translate.py             # Main script for YouTube to translated audio
├── test_translation.py      # Test script for direct audio translation
├── install.py               # Installation and environment validation
├── check_cuda.py            # CUDA availability checker
├── setup.py                 # Package setup script
├── requirements.txt         # Package dependencies
└── README.md                # This documentation
If translation is failing:
- Run the installation script first:
python install.py
- Test with a local audio file:
python test_translation.py input.wav output.wav --cpu
- Check logs for specific error messages
- Make sure your input audio file actually contains speech
If you see errors during model loading:
- Ensure you have a stable internet connection (model is downloaded from Hugging Face)
- Try the --cpu flag to rule out GPU-related issues
- Consider clearing the Hugging Face cache: rm -rf ~/.cache/huggingface/hub (Linux/Mac), or delete the same folder on Windows
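For a cross-platform way to locate that cache folder, the sketch below resolves the default location and honors the `HF_HUB_CACHE` and `HF_HOME` environment variables. It is a simplified approximation (it ignores `XDG_CACHE_HOME`, for instance), and both function names are illustrative.

```python
import os
import shutil
from pathlib import Path


def hub_cache_dir() -> Path:
    """Return the Hugging Face hub cache directory, honoring the usual overrides."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    hf_home = os.environ.get("HF_HOME")
    base = Path(hf_home).expanduser() if hf_home else Path.home() / ".cache" / "huggingface"
    return base / "hub"


def clear_hub_cache() -> bool:
    """Delete the cache directory; return True if something was removed."""
    cache = hub_cache_dir()
    if not cache.is_dir():
        return False
    shutil.rmtree(cache)
    return True


if __name__ == "__main__":
    print(f"Hub cache location: {hub_cache_dir()}")
    # Uncomment to delete every cached model (they are re-downloaded on next use):
    # clear_hub_cache()
```

Clearing the cache removes all downloaded models, not only Seamless M4T, so expect a fresh download the next time any of them is loaded.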
If processing runs out of memory:
- Try smaller chunks with --chunk_size 5
- Use the --cpu flag to process on CPU (slower but more reliable)
- Close other GPU-intensive applications
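The suggestions above (smaller chunks first, then CPU) can also be applied automatically. The sketch below tries a list of settings in order until one works; the `translate` callable is a hypothetical stand-in for whatever performs the translation, and the default attempt list is only an example, not the retry policy YUTRAS itself uses.

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def run_with_fallback(
    translate: Callable[[str, int], T],
    attempts: Iterable[Tuple[str, int]] = (("cuda", 10), ("cuda", 5), ("cpu", 5)),
) -> T:
    """Call `translate(device, chunk_size)` with each setting until one succeeds."""
    last_error = None
    for device, chunk_size in attempts:
        try:
            return translate(device, chunk_size)
        except RuntimeError as error:
            # CUDA out-of-memory errors in PyTorch are RuntimeErrors.
            last_error = error
    raise RuntimeError("all fallback attempts failed") from last_error


if __name__ == "__main__":
    def fake_translate(device: str, chunk_size: int) -> str:
        # Simulated backend: every GPU attempt fails, the CPU attempt succeeds.
        if device == "cuda":
            raise RuntimeError("CUDA out of memory (simulated)")
        return f"translated on {device} with chunk size {chunk_size}"

    print(run_with_fallback(fake_translate))
```

Ordering the attempts from fastest to most conservative keeps the common case quick while still finishing on constrained machines.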
If the YouTube download fails:
- Ensure FFmpeg is properly installed and in your PATH
- Check your internet connection
- Verify the YouTube URL is valid and accessible
- Try downloading the audio manually with yt-dlp first
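One way to try the download on its own is through yt-dlp's Python API. The sketch below uses the `YoutubeDL` class with the `FFmpegExtractAudio` postprocessor to keep only the audio as WAV; the output directory, the file naming template, and the helper names are assumptions for illustration and may differ from what YUTRAS does internally.

```python
from typing import Any, Dict


def audio_download_options(output_dir: str = "downloads") -> Dict[str, Any]:
    """Build yt-dlp options that keep only the audio track and convert it to WAV."""
    return {
        "format": "bestaudio/best",
        # Files are named after the video ID inside output_dir.
        "outtmpl": f"{output_dir}/%(id)s.%(ext)s",
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "wav"}
        ],
    }


def download_audio(url: str, output_dir: str = "downloads") -> None:
    """Download the audio of `url`; requires yt-dlp and FFmpeg to be installed."""
    # Imported lazily so the options can be inspected without yt-dlp installed.
    from yt_dlp import YoutubeDL

    with YoutubeDL(audio_download_options(output_dir)) as ydl:
        ydl.download([url])
```

Calling `download_audio("https://youtu.be/example")` with a real URL should leave a WAV file in the output directory. If this step fails too, the problem lies with the download or with FFmpeg rather than with the translation model.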
If translation is slow:
- Enable GPU acceleration by installing CUDA (see "Installation")
- If using CPU, be patient as translation can take significant time
- Reduce chunk size for more reliable (but potentially slower) processing
If the translated audio quality is poor:
- Ensure your audio input is in a standard format (WAV is safest)
- For best results, use mono audio at 16 kHz
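Converting an input file to mono 16 kHz WAV can be done with FFmpeg's `-ac` (channel count) and `-ar` (sample rate) options. The following sketch wraps that command in Python; the helper names are illustrative and not part of YUTRAS.

```python
import subprocess
from typing import List


def to_mono_16k_command(src: str, dst: str) -> List[str]:
    """Build an FFmpeg command that writes `src` as mono, 16 kHz audio to `dst`."""
    # -y overwrites dst if it exists, -ac 1 selects mono, -ar 16000 sets 16 kHz.
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst]


def convert_to_mono_16k(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if FFmpeg reports a failure."""
    subprocess.run(to_mono_16k_command(src, dst), check=True)
```

For example, `convert_to_mono_16k("input_audio.mp3", "input_audio.wav")` produces a file that can then be passed to test_translation.py.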
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Facebook AI Research for the Seamless M4T model
- Hugging Face for the Transformers library
- yt-dlp for YouTube downloading capabilities