🎥 SubtitleAI

AI-powered video subtitle generation and intelligent chat system

SubtitleAI processes YouTube and TikTok videos to generate scene descriptions, translate them, and create subtitled videos with text-to-speech narration. It also provides a chatbot for asking questions about video content. The project relies on Ollama-hosted language models (Gemma3 and Phi4), Whisper, TTS models, and a RAG pipeline for video processing, translation, speech synthesis, and question answering.

Note: The project is currently under active development and will be further enhanced with new features over time.

Process Video (AI Descriptions + TTS)

test.mp4

Generate SRT Subtitles (Whisper Transcription)

test2.mp4

✨ Features

🎬 Video Processing

Download YouTube & TikTok Videos: Automatically download videos from YouTube or TikTok using URLs
Scene Detection: Intelligent detection of scene transitions in videos
Frame Description: Generate English scene descriptions using the Gemma3:4b model
Multi-language Translation: Translate descriptions into Turkish using the Gemma3:4b model
Custom Subtitles: Create videos with customizable subtitles (font, color, position)
Text-to-Speech: Generate narrated videos using TTS models for English and Turkish
Summary Generation: Provide comprehensive video summaries in the selected language
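
To give a feel for the scene detection step, here is a minimal sketch that flags a cut whenever two consecutive frames differ strongly. It is an illustration only and not necessarily how `describe.py` works: frames are assumed to be flat lists of grayscale pixel values, and the names `mean_abs_diff`, `detect_scene_cuts`, and the threshold of 30 are hypothetical.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_scene_cuts(frames: Sequence[Sequence[int]], threshold: float = 30.0) -> List[int]:
    """Return the indices of frames that differ strongly from their predecessor."""
    cuts = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            cuts.append(i)
    return cuts


if __name__ == "__main__":
    dark = [10] * 16      # a uniformly dark 4x4 frame (illustrative data)
    bright = [200] * 16   # a uniformly bright 4x4 frame (illustrative data)
    print(detect_scene_cuts([dark, dark, bright, bright, dark]))  # [2, 4]
```

Each returned index marks the first frame of a new scene, so one frame per scene could then be passed on for description.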

📝 SRT Subtitle Generation (NEW!)

Whisper Transcription: Accurate speech-to-text conversion using OpenAI Whisper
Custom Styling: Dynamic font size, colors, and position control
Multi-language Support: Generate subtitles in Turkish, English, and other languages
SRT File Export: Standard SRT format compatible with all video players
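
For reference, an SRT file is a sequence of numbered blocks, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the subtitle text. The sketch below shows one way to produce that layout from `(start, end, text)` tuples given in seconds. The function names `srt_timestamp` and `to_srt` are hypothetical and are not taken from `srt_subtitle.py`.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text.strip()}\n")
    # A blank line separates consecutive blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.5, "Hello"), (1.5, 4.0, "Welcome to the video")]))
```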

🤖 Video Chatbot (NEW!)

RAG-based Q&A: Ask questions about video content using a retrieval-augmented generation (RAG) system
Audio-to-Text: Convert video audio to text using Whisper or Google Speech Recognition
Multi-language Support: Process videos in English and Turkish
Intelligent Responses: Get contextual answers based on video content
Quick Setup: Set up the chatbot without full video processing for faster interaction
Similarity Search: Find relevant content sections in video transcripts
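
The retrieval half of RAG can be illustrated with a small, dependency-free sketch: split the transcript into chunks, score each chunk against the question, and hand the best matches to the language model as context. Note the substitution: this sketch uses word-count vectors and cosine similarity as a stand-in, whereas the project lists FAISS and Sentence Transformers for this job. All function names here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def chunk_words(text: str, size: int = 50) -> List[str]:
    """Split a transcript into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the question, best first."""
    query = Counter(tokenize(question))
    scored = [(cosine(query, Counter(tokenize(chunk))), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    transcript_chunks = [
        "the cat sat on the mat",
        "stock prices rose sharply today",
        "a dog chased the cat",
    ]
    for score, chunk in top_chunks("what happened to stock prices", transcript_chunks, k=1):
        print(f"{score:.2f}  {chunk}")
```

Word counts only match exact tokens, so a real setup would swap in embeddings to catch paraphrases and cross-language matches.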

🎨 User Interface

Modern Gradio Interface: Clean, responsive web interface
Dual Action Mode: Choose between full video processing or quick chatbot setup
Real-time Status: Live updates on processing status
Dynamic Components: Interface adapts based on selected actions

🚀 Installation

Clone the repository:

```
git clone https://github.com/oztrkoguz/SubtitleAI.git
cd SubtitleAI
```

Create a virtual environment:

```
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install the required packages:
```
pip install -r requirements.txt
```

Install Ollama and required models:

```
# Install Ollama (visit https://ollama.ai for installation)
ollama pull phi4:latest
ollama pull gemma3:4b
```

Install FFmpeg:
- Windows: Download from https://ffmpeg.org/download.html
- macOS: brew install ffmpeg
- Ubuntu: sudo apt install ffmpeg
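
Before launching the app, it can help to confirm that the external tools are actually on your `PATH`. The snippet below is an optional helper and not part of the repository. The function name `check_tools` is hypothetical, and the executable names `ffmpeg` and `ollama` are assumed to be the defaults installed by the steps above.

```python
import shutil
from typing import Dict, Iterable


def check_tools(tools: Iterable[str] = ("ffmpeg", "ollama")) -> Dict[str, bool]:
    """Map each tool name to True if an executable with that name is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_tools().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```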

📖 Usage

🎬 Video Processing Mode

Run the application:
```
python app.py
```
Input Video:
- Upload a video file, OR
- Enter YouTube URL, OR
- Enter TikTok URL
Configure Settings:
- Select language (English/Turkish)
- Customize subtitle settings (font, color, position)
Process Video:
- Click "🎬 Process Video" button
- Wait for AI processing
- Get subtitled video + summary + chatbot
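
Since the app accepts an uploaded file, a YouTube URL, or a TikTok URL, the input has to be routed by type at some point. The sketch below shows one plausible way to tell them apart by hostname. It is an assumption for illustration, not the logic in `app.py`, and the name `classify_source` and the host lists are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical host lists; subdomains such as www. or vm. are also accepted.
YOUTUBE_HOSTS = ("youtube.com", "youtu.be")
TIKTOK_HOSTS = ("tiktok.com",)


def classify_source(value: str) -> str:
    """Return 'youtube', 'tiktok', or 'other' for a URL or file path."""
    host = (urlparse(value.strip()).hostname or "").lower()

    def matches(domains) -> bool:
        return any(host == d or host.endswith("." + d) for d in domains)

    if matches(YOUTUBE_HOSTS):
        return "youtube"
    if matches(TIKTOK_HOSTS):
        return "tiktok"
    return "other"


if __name__ == "__main__":
    for sample in ("https://youtu.be/abc", "https://vm.tiktok.com/xyz/", "test.mp4"):
        print(sample, "->", classify_source(sample))
```

Matching on the parsed hostname rather than on a substring avoids misclassifying look-alike domains such as `notyoutube.com`.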

🤖 Quick Chatbot Mode

Input Video URL:
- Enter YouTube or TikTok URL
Setup Chatbot:
- Select language for audio processing
- Click "🤖 Setup Chatbot Only" button
- Wait for audio-to-text conversion
Ask Questions:
- Chat interface becomes active
- Ask questions about video content
- Get intelligent AI responses

📝 SRT Subtitle Generation Mode

Input Video URL:
- Enter YouTube or TikTok URL
Configure Subtitle Settings:
- Select transcription language (Turkish/English)
- Choose font size, color, and position
Generate SRT Subtitles:
- Click "📝 Generate SRT Subtitles" button
- Wait for Whisper transcription
- Get subtitled video + SRT file

💬 Example Questions for Chatbot

"What is the main topic of this video?"
"Can you summarize the key points?"

🏗️ Components

app.py: Main application with Gradio interface and integrated chatbot
describe.py: Video downloading, scene detection, frame description, and translation
subtitle.py: Subtitle creation and video rendering
tts.py: Text-to-speech audio generation
video_chat.py: RAG-based chatbot system with audio-to-text conversion
srt_subtitle.py: SRT subtitle generation with Whisper transcription

🤖 Models & Technologies Used

Video Processing

Gemma3:4b: Scene description generation
Gemma3:4b: English to Turkish translation
Phi4:latest: Video content summarization

Chatbot System

OpenAI Whisper: Audio-to-text conversion
Google Speech Recognition: Fallback audio processing
FAISS: Vector similarity search
LangChain: RAG pipeline and document processing
Sentence Transformers: Multilingual text embeddings

Audio & Video

TTS Models: Text-to-speech generation
yt-dlp: YouTube/TikTok video downloading
FFmpeg: Audio/video processing

🔧 Requirements

System Requirements

Python 3.8+
FFmpeg installed and accessible
Internet connection for URL-based videos
At least 4GB RAM for optimal performance

AI Models

Ollama with phi4:latest and gemma3:4b models
Whisper model (automatically downloaded)
Sentence transformers model (automatically downloaded)

⚠️ Important Notes

Chatbot requires URLs: Video chatbot only works with YouTube/TikTok URLs, not uploaded files
TikTok optimization: TikTok videos are optimized for short content analysis
Language consistency: Select the same language for both video processing and chatbot for best results
Processing time: Full video processing takes longer than chatbot-only setup

🤝 Contributing

Contributions are welcome! Please follow these steps:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.
