Skip to content

An AI-powered tool for summarizing YouTube videos by generating scene descriptions, translating them, and creating subtitled videos with text-to-speech narration

License

Notifications You must be signed in to change notification settings

oztrkoguz/SubtitleAI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

66 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸŽ₯ SubtitleAI

AI-powered video subtitle generation and intelligent chat system

SubtitleAI is an advanced tool that processes YouTube and TikTok videos to generate scene descriptions, translate them, create subtitled videos with text-to-speech narration, and provides an intelligent chatbot for video content analysis. This project leverages state-of-the-art models for video processing, translation, text-to-speech synthesis, and RAG-based question answering.

gradio

Note: The project is currently under active development and will be further enhanced with new features over time.

Process Video (AI Descriptions + TTS)

test.mp4

Generate SRT Subtitles(Whisper Transcription)

test2.mp4

Note: The project is currently under active development and will be further enhanced with new features over time.

✨ Features

🎬 Video Processing

  • Download YouTube & TikTok Videos: Automatically download videos from YouTube or TikTok using URLs
  • Scene Detection: Intelligent detection of scene transitions in videos
  • Frame Description: Generate English scene descriptions using Gemma3:4b model
  • Multi-language Translation: Translate descriptions to Turkish using Gemma3:4b model
  • Custom Subtitles: Create videos with customizable subtitles (font, color, position)
  • Text-to-Speech: Generate narrated videos using TTS models for English and Turkish
  • Summary Generation: Provide comprehensive video summaries in selected language

πŸ“ SRT Subtitle Generation (NEW!)

  • Whisper Transcription: Accurate speech-to-text conversion using OpenAI Whisper
  • Custom Styling: Dynamic font size, colors, and position control
  • Multi-language Support: Generate subtitles in Turkish, English, and other languages
  • SRT File Export: Standard SRT format compatible with all video players

πŸ€– Video Chatbot (NEW!)

  • RAG-based Q&A: Ask questions about video content using advanced RAG system
  • Audio-to-Text: Convert video audio to text using Whisper or Google Speech Recognition
  • Multi-language Support: Process videos in English and Turkish
  • Intelligent Responses: Get contextual answers based on video content
  • Quick Setup: Setup chatbot without full video processing for faster interaction
  • Similarity Search: Find relevant content sections in video transcripts

🎨 User Interface

  • Modern Gradio Interface: Clean, responsive web interface
  • Dual Action Mode: Choose between full video processing or quick chatbot setup
  • Real-time Status: Live updates on processing status
  • Dynamic Components: Interface adapts based on selected actions

πŸš€ Installation

  1. Clone the repository:

    git clone https://github.com/oztrkoguz/SubtitleAI.git
    cd SubtitleAI
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install the required packages:

    pip install -r requirements.txt
  4. Install Ollama and required models:

    # Install Ollama (visit https://ollama.ai for installation)
    ollama pull phi4:latest
    ollama pull gemma3:4b
  5. Install FFmpeg:

πŸ“– Usage

🎬 Video Processing Mode

  1. Run the application:

    python app.py
  2. Input Video:

    • Upload a video file, OR
    • Enter YouTube URL, OR
    • Enter TikTok URL
  3. Configure Settings:

    • Select language (English/Turkish)
    • Customize subtitle settings (font, color, position)
  4. Process Video:

    • Click "🎬 Process Video" button
    • Wait for AI processing
    • Get subtitled video + summary + chatbot

πŸ€– Quick Chatbot Mode

  1. Input Video URL:

    • Enter YouTube or TikTok URL
  2. Setup Chatbot:

    • Select language for audio processing
    • Click "πŸ€– Setup Chatbot Only" button
    • Wait for audio-to-text conversion
  3. Ask Questions:

    • Chat interface becomes active
    • Ask questions about video content
    • Get intelligent AI responses

πŸ“ SRT Subtitle Generation Mode

  1. Input Video URL:

    • Enter YouTube or TikTok URL
  2. Configure Subtitle Settings:

    • Select transcription language (Turkish/English)
    • Choose font size, color, and position
  3. Generate SRT Subtitles:

    • Click "πŸ“ Generate SRT Subtitles" button
    • Wait for Whisper transcription
    • Get subtitled video + SRT file

πŸ’¬ Example Questions for Chatbot

  • "What is the main topic of this video?"
  • "Can you summarize the key points?"

πŸ—οΈ Components

  • app.py: Main application with Gradio interface and integrated chatbot
  • describe.py: Video downloading, scene detection, frame description, and translation
  • subtitle.py: Subtitle creation and video rendering
  • tts.py: Text-to-speech audio generation
  • video_chat.py: RAG-based chatbot system with audio-to-text conversion
  • srt_subtitle.py: SRT subtitle generation with Whisper transcription

πŸ€– Models & Technologies Used

Video Processing

Chatbot System

Audio & Video

  • TTS Models: Text-to-speech generation
  • yt-dlp: YouTube/TikTok video downloading
  • FFmpeg: Audio/video processing

πŸ”§ Requirements

System Requirements

  • Python 3.8+
  • FFmpeg installed and accessible
  • Internet connection for URL-based videos
  • At least 4GB RAM for optimal performance

AI Models

  • Ollama with phi4:latest and gemma3:4b models
  • Whisper model (automatically downloaded)
  • Sentence transformers model (automatically downloaded)

⚠️ Important Notes

  • Chatbot requires URLs: Video chatbot only works with YouTube/TikTok URLs, not uploaded files
  • TikTok optimization: TikTok videos are optimized for short content analysis
  • Language consistency: Select the same language for both video processing and chatbot for best results
  • Processing time: Full video processing takes longer than chatbot-only setup

🀝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

About

An AI-powered tool for summarizing YouTube videos by generating scene descriptions, translating them, and creating subtitled videos with text-to-speech narration

Topics

Resources

License

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages