AI-powered video subtitle generation and intelligent chat system
SubtitleAI is a tool that processes YouTube and TikTok videos: it generates scene descriptions, translates them, creates subtitled videos with text-to-speech narration, and provides a chatbot for video content analysis. The project uses models for video processing, translation, text-to-speech synthesis, and RAG-based question answering.

Note: The project is currently under active development and will be further enhanced with new features over time.
Demo: Process Video (AI Descriptions + TTS): test.mp4
Demo: Generate SRT Subtitles (Whisper Transcription): test2.mp4
- Download YouTube & TikTok Videos: Automatically download videos from YouTube or TikTok using URLs
- Scene Detection: Intelligent detection of scene transitions in videos
- Frame Description: Generate English scene descriptions using the Gemma3:4b model
- Multi-language Translation: Translate descriptions to Turkish using the Gemma3:4b model
- Custom Subtitles: Create videos with customizable subtitles (font, color, position)
- Text-to-Speech: Generate narrated videos using TTS models for English and Turkish
- Summary Generation: Provide comprehensive video summaries in selected language
- Whisper Transcription: Accurate speech-to-text conversion using OpenAI Whisper
- Custom Styling: Dynamic font size, colors, and position control
- Multi-language Support: Generate subtitles in Turkish, English, and other languages
- SRT File Export: Standard SRT format compatible with all video players
- RAG-based Q&A: Ask questions about video content using a RAG (retrieval-augmented generation) system
- Audio-to-Text: Convert video audio to text using Whisper or Google Speech Recognition
- Multi-language Support: Process videos in English and Turkish
- Intelligent Responses: Get contextual answers based on video content
- Quick Setup: Setup chatbot without full video processing for faster interaction
- Similarity Search: Find relevant content sections in video transcripts
- Modern Gradio Interface: Clean, responsive web interface
- Dual Action Mode: Choose between full video processing or quick chatbot setup
- Real-time Status: Live updates on processing status
- Dynamic Components: Interface adapts based on selected actions
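The README does not say how the Scene Detection feature decides where one scene ends and the next begins. As a minimal sketch of one common approach, the example below flags a cut whenever the color histograms of two consecutive frames differ by more than a threshold. The function names, the toy three-bin histograms, and the threshold value are all assumptions made for illustration; they are not taken from describe.py.

```python
from typing import List, Sequence


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 distance between two normalized histograms (0.0 to 1.0)."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_scene_cuts(
    histograms: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[int]:
    """Return the indices of frames that start a new scene.

    The threshold is a hypothetical default; a real detector would tune it.
    """
    cuts = []
    for i in range(1, len(histograms)):
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold:
            cuts.append(i)
    return cuts


# Toy data: frames 0-1 are mostly "bin 0", frames 2-3 are mostly "bin 2".
frames = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
]
print(detect_scene_cuts(frames))  # [2]
```

In practice the histograms would come from decoded video frames, and only the first frame of each detected scene would need to be sent on for description.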
- Clone the repository:

      git clone https://github.com/oztrkoguz/SubtitleAI.git
      cd SubtitleAI

- Create a virtual environment:

      python -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`

- Install the required packages:

      pip install -r requirements.txt

- Install Ollama and required models:

      # Install Ollama (visit https://ollama.ai for installation)
      ollama pull phi4:latest
      ollama pull gemma3:4b

- Install FFmpeg:
  - Windows: Download from https://ffmpeg.org/download.html
  - macOS:

        brew install ffmpeg

  - Ubuntu:

        sudo apt install ffmpeg

- Run the application:

      python app.py
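Before launching the app, it can help to confirm that the external tools installed in the steps above are actually reachable. The following is a small sketch, not part of the repository: `check_tools` is a hypothetical helper that uses the standard library's `shutil.which` to look for the `ffmpeg` and `ollama` executables on the PATH.

```python
import shutil
from typing import Dict, Iterable


def check_tools(tools: Iterable[str] = ("ffmpeg", "ollama")) -> Dict[str, bool]:
    """Map each tool name to True if an executable with that name is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_tools().items():
        print(f"{tool}: {'found' if found else 'NOT found on PATH'}")
```

This only checks that the executables exist. It does not verify that the phi4:latest and gemma3:4b models have been pulled.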
Full video processing:
- Input Video:
  - Upload a video file, OR
  - Enter a YouTube URL, OR
  - Enter a TikTok URL
- Configure Settings:
  - Select language (English/Turkish)
  - Customize subtitle settings (font, color, position)
- Process Video:
  - Click the "Process Video" button
  - Wait for AI processing
  - Get the subtitled video, summary, and chatbot
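To give a rough idea of what the AI processing step may involve for a single frame, here is a sketch that sends one image to a locally running Ollama server and asks gemma3:4b for a description. The endpoint and request fields follow Ollama's REST API for `/api/generate`; the prompt text and the function names `build_description_request` and `describe_frame` are assumptions for illustration, and the project's own code in describe.py may call the model differently.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_description_request(image_bytes: bytes, model: str = "gemma3:4b") -> dict:
    """Build the JSON payload for a single-frame description request."""
    return {
        "model": model,
        # Hypothetical prompt, not taken from the project.
        "prompt": "Describe this video frame in one short English sentence.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one complete JSON object instead of a stream
    }


def describe_frame(image_bytes: bytes) -> str:
    """Send the frame to Ollama and return the generated description.

    Requires a running Ollama server with the model already pulled.
    """
    payload = json.dumps(build_description_request(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"].strip()
```

A call such as `describe_frame(open("frame.jpg", "rb").read())` would return the model's text, assuming a frame has been saved under that (hypothetical) file name.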
Quick chatbot setup:
- Input Video URL:
  - Enter a YouTube or TikTok URL
- Setup Chatbot:
  - Select the language for audio processing
  - Click the "Setup Chatbot Only" button
  - Wait for audio-to-text conversion
- Ask Questions:
  - The chat interface becomes active
  - Ask questions about the video content
  - Get AI responses based on the transcript
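The retrieval half of a RAG chatbot can be illustrated without any model downloads. The sketch below ranks transcript chunks against a question using cosine similarity over simple word counts. This is a deliberately simplified stand-in: the project lists Sentence Transformers embeddings and FAISS for this job, and the function names and sample transcript here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_chunks(question: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the question, best first."""
    query = Counter(tokenize(question))
    scored = [(cosine(query, Counter(tokenize(chunk))), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up transcript chunks for demonstration only.
transcript_chunks = [
    "The presenter introduces the recipe and lists the ingredients.",
    "Next the dough is kneaded for ten minutes and left to rise.",
    "Finally the bread is baked in the oven until golden.",
]
best = top_chunks("How long is the dough kneaded?", transcript_chunks, k=1)
print(best[0][1])
```

In a full pipeline the retrieved chunks would be passed to the language model as context for its answer. Dense embeddings also match paraphrases that share no words with the question, which word counts cannot do.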
SRT subtitle generation:
- Input Video URL:
  - Enter a YouTube or TikTok URL
- Configure Subtitle Settings:
  - Select the transcription language (Turkish/English)
  - Choose font size, color, and position
- Generate SRT Subtitles:
  - Click the "Generate SRT Subtitles" button
  - Wait for Whisper transcription
  - Get the subtitled video and SRT file
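For readers unfamiliar with the SRT format produced by this workflow, the sketch below turns timed text segments into SRT text. Each entry is a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the subtitle text, followed by a blank line. The function names are hypothetical and the input is assumed to be `(start_seconds, end_seconds, text)` tuples, which is roughly the information a Whisper transcription segment carries; srt_subtitle.py may be organized differently.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as SRT entries separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    return "\n".join(blocks)


print(segments_to_srt([(0.0, 1.5, "Hello"), (1.5, 4.0, "Welcome to the video")]))
```

Writing the returned string to a `.srt` file with UTF-8 encoding keeps Turkish characters intact.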
- "What is the main topic of this video?"
- "Can you summarize the key points?"
- app.py: Main application with Gradio interface and integrated chatbot
- describe.py: Video downloading, scene detection, frame description, and translation
- subtitle.py: Subtitle creation and video rendering
- tts.py: Text-to-speech audio generation
- video_chat.py: RAG-based chatbot system with audio-to-text conversion
- srt_subtitle.py: SRT subtitle generation with Whisper transcription
- Gemma3:4b: Scene description generation
- Gemma3:4b: English to Turkish translation
- Phi4:latest: Video content summarization
- OpenAI Whisper: Audio-to-text conversion
- Google Speech Recognition: Fallback audio processing
- FAISS: Vector similarity search
- LangChain: RAG pipeline and document processing
- Sentence Transformers: Multilingual text embeddings
- TTS Models: Text-to-speech generation
- yt-dlp: YouTube/TikTok video downloading
- FFmpeg: Audio/video processing
- Python 3.8+
- FFmpeg installed and accessible
- Internet connection for URL-based videos
- At least 4GB RAM for optimal performance
- Ollama with phi4:latest and gemma3:4b models
- Whisper model (automatically downloaded)
- Sentence transformers model (automatically downloaded)
- Chatbot requires URLs: Video chatbot only works with YouTube/TikTok URLs, not uploaded files
- TikTok optimization: TikTok videos are optimized for short content analysis
- Language consistency: Select the same language for both video processing and chatbot for best results
- Processing time: Full video processing takes longer than chatbot-only setup
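Because the chatbot only accepts YouTube and TikTok URLs, a small input check can catch unsupported input early. The following is a sketch under assumptions: `detect_platform` and the host table are hypothetical, and the actual download is handled by yt-dlp, which recognizes more URL forms than this simple host comparison.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical host table; extend it if other URL forms need to be accepted.
SUPPORTED_HOSTS = {
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "tiktok.com": "tiktok",
}


def detect_platform(url: str) -> Optional[str]:
    """Return "youtube" or "tiktok" for a supported http(s) URL, else None."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return None
    host = parsed.hostname.lower()
    for domain, platform in SUPPORTED_HOSTS.items():
        # Accept the bare domain and any subdomain such as www. or vm.
        if host == domain or host.endswith("." + domain):
            return platform
    return None


print(detect_platform("https://www.youtube.com/watch?v=abc"))  # youtube
print(detect_platform("/home/user/test.mp4"))  # None
```

A `None` result could be used to show a message that uploaded files are supported for full video processing but not for the chatbot.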
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.