Transform YouTube and TikTok videos into custom AI-generated speech using Dia TTS voice synthesis technology.
This application takes a YouTube or TikTok video URL and creates a new audio file with the same content but spoken in a different AI-generated voice. Here's how it works:
- Download: Extracts audio from YouTube/TikTok videos
- Analyze: Processes the original audio for voice characteristics
- Synthesize: Generates new speech using Dia TTS voice cloning technology
- ✅ Multi-platform support: YouTube and TikTok videos
- ✅ Voice cloning: Creates custom voice models from source audio
- ✅ Web interface: Simple, responsive UI built with HTMX
- ✅ Real-time processing: See progress as your audio is generated
- ✅ Audio player: Listen to results directly in the browser
- ✅ Docker deployment: Ready for cloud deployment (Google Cloud Run)
- yt-dlp: Video/audio downloader
# macOS
brew install yt-dlp
# Ubuntu/Debian
sudo apt install yt-dlp
# Or via pip
pip install yt-dlp
- TailwindCSS: For building styles
npm install -g tailwindcss
- templ: For templating in Go
go install github.com/a-h/templ/cmd/templ@latest
Create a .env file with:
PORT=8080
FAL_KEY=your_fal_ai_api_key_here
DATABASE_URL=your_postgres_connection_string
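As a rough illustration of how the three settings above might be consumed at startup, here is a minimal sketch. The `Config` type and `loadConfig` function are hypothetical names, not taken from the repository. It assumes the variables are already exported into the process environment (for example by a .env loader or the shell), that `FAL_KEY` and `DATABASE_URL` are required, and that `PORT` falls back to 8080.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config holds the three settings from the .env example.
// The type and field names are illustrative, not the project's own.
type Config struct {
	Port        string
	FalKey      string
	DatabaseURL string
}

// loadConfig reads the settings through the supplied lookup function
// (os.Getenv in normal use), which keeps the logic easy to test.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		Port:        getenv("PORT"),
		FalKey:      getenv("FAL_KEY"),
		DatabaseURL: getenv("DATABASE_URL"),
	}
	if cfg.Port == "" {
		cfg.Port = "8080" // assumed default, matching the example value
	}
	// Treating these two as required is an assumption of this sketch.
	if cfg.FalKey == "" {
		return Config{}, errors.New("FAL_KEY is not set")
	}
	if cfg.DatabaseURL == "" {
		return Config{}, errors.New("DATABASE_URL is not set")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Println("configured port:", cfg.Port)
}
```

Passing the lookup function as a parameter means the same code can be exercised with a plain map in tests, without touching the real environment.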
- Clone the repository
git clone https://github.com/henrik392/youtube-voice-go.git
cd youtube-voice-go
- Install dependencies
go mod download
- Build and run
make build
make run
- Open your browser and navigate to http://localhost:8080
# Build the application (generates templates + CSS + binary)
make build
# Run the application
make run
# Start with live reload (installs air if needed)
make watch
# Run tests
make test
# Start PostgreSQL database container
make docker-run
# Stop database container
make docker-down
# Clean build artifacts
make clean
- Length: 30 seconds to 10 minutes
- Optimal: 1-5 minutes with clear audio
- Format: Supports any format that yt-dlp can process
cmd/
├── api/ # Main application entry point
└── web/ # Web handlers and templates
internal/
├── database/ # PostgreSQL integration
├── elevenlabs/ # Voice synthesis API client
├── server/ # HTTP server setup
└── youtube/ # Video processing logic
- Backend: Go with Chi router
- Frontend: HTML templates (templ) + HTMX + TailwindCSS
- Database: PostgreSQL
- Audio Processing: yt-dlp + ffmpeg
- AI Voice: Dia TTS (fal.ai) API
make docker-build
docker run -p 8080:8080 yt-voice
gcloud run deploy --image=europe-north1-docker.pkg.dev/youtube-to-voice/youtube-to-voice-repo/youtube-to-voice-image:tag1
- URL Validation: Checks if the provided URL is from YouTube or TikTok
- Audio Extraction: Downloads and converts video to MP3 (max 3 minutes)
- Reference Processing: Prepares the original audio as reference for voice cloning
- Text Processing: Formats the target text for Dia TTS processing
- Voice Synthesis: Uses Dia TTS to generate speech with the cloned voice in one step
- Delivery: Serves the final audio file through the web interface
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: make test
- Submit a pull request
This project is for educational and personal use. Please respect content creators' rights and fal.ai's terms of service.