A powerful, interactive CLI tool that fetches YouTube video transcripts and generates concise summaries using local LLM models via Ollama. Perfect for researchers, content creators, and anyone who needs to quickly digest video content.
Feature | Description |
---|---|
Local Processing | No API keys required - uses Ollama for 100% local LLM processing |
Research Plans | Focused content extraction with corpus aggregation and analysis |
Smart Transcript Fetching | Prefers manual captions, falls back to auto-generated transcripts |
Interactive TUI | Beautiful terminal interface with guided workflows |
Multiple Input Formats | Supports `.txt`, `.list`, `.urls`, and `.csv` files |
Intelligent Chunking | Automatically splits long videos for high-quality summaries |
Caching System | Caches transcripts to avoid re-downloading |
Progress Tracking | Real-time progress indicators and status updates |
Flexible Output | Markdown summaries with YAML frontmatter |
Comprehensive Logging | JSON logs for processing history and debugging |
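The intelligent-chunking behavior can be sketched in a few lines. This is a minimal illustration assuming simple word-count splitting; the tool's actual boundaries may be token- or character-based:

```python
def chunk_transcript(text: str, max_words: int = 2048) -> list[str]:
    """Split a long transcript into word-bounded chunks for summarization."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

# A 5000-word transcript yields three chunks at the default size
chunks = chunk_transcript("word " * 5000)
```

Each chunk is summarized independently, then the per-chunk summaries are combined into an executive summary.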
- Researchers analyzing video content
- Content creators studying competitor videos
- Students summarizing lecture recordings
- Professionals processing meeting recordings
- Anyone who needs to quickly understand video content
- Python ≥ 3.11, < 3.14
- Poetry for dependency management
- Ollama runtime (CPU or GPU)
No external LLM APIs - everything runs locally once transcripts are cached!
```bash
git clone <your-repo-url>
cd youtube-summarizer

# Install dependencies
poetry install

# Allow direnv (if using)
direnv allow

# Install Ollama (macOS)
brew install ollama
# Or Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama3.2:latest

# Start Ollama server
ollama serve
```
```bash
# Interactive mode (recommended)
./run

# Or use Poetry directly
poetry run yt-summarizer
```
Option 1: Run Script (Recommended)

```bash
./run            # Interactive mode
./run videos.txt # Legacy mode with file
```

Option 2: Shell Alias

```bash
# Add to ~/.zshrc or ~/.bashrc:
alias yts="cd /path/to/youtube-summarizer && poetry run yt-summarizer"
```

Option 3: Direct Poetry

```bash
poetry run yt-summarizer
```
The interactive TUI provides a guided experience:
- Default file (`videos.txt` if present)
- Custom file (smart file browser with format filtering)
- Single URL (paste any YouTube URL)
When choosing custom files, you'll see:
- Only compatible formats (`.txt`, `.list`, `.urls`, `.csv`)
- Visual file browser with icons
- Directory browsing option
- Manual path entry fallback
- Model selection with defaults
- Cache preferences
- File conflict handling (overwrite/skip/version)
- Summarize more videos
- Clean transcript cache
- Quit
The research plan system enables focused content extraction from YouTube videos based on specific research topics, with corpus aggregation and analysis capabilities.
- Targeted extraction - Extract only relevant content (e.g., specific prompts, techniques, insights)
- Multi-video analysis - Process entire video collections with unified methodology
- Corpus aggregation - Combine individual summaries into comprehensive research documents
- Pattern analysis - Identify themes and insights across multiple videos
- Create Research Plan - Define your research focus and custom prompts
- Process Videos - Extract targeted content using plan-specific prompts
- Aggregate Corpus - Combine all video summaries into a unified document
- Analyze Patterns - Generate insights and identify common themes
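The corpus-aggregation step can be illustrated with a minimal sketch. The function name, signature, and directory layout here are illustrative assumptions, not the tool's actual API:

```python
import tempfile
from pathlib import Path


def aggregate_corpus(plan_name: str, videos_dir: Path, corpus_dir: Path) -> Path:
    """Concatenate per-video summaries into one corpus document."""
    corpus_dir.mkdir(parents=True, exist_ok=True)
    out = corpus_dir / f"{plan_name}.md"
    parts = [f"# Corpus: {plan_name}"]
    for summary in sorted(videos_dir.glob("*.md")):
        parts.append(f"\n## {summary.stem}\n")
        parts.append(summary.read_text(encoding="utf-8"))
    out.write_text("\n".join(parts), encoding="utf-8")
    return out


# Demo against a throwaway directory
tmp = Path(tempfile.mkdtemp())
videos = tmp / "videos"
videos.mkdir()
(videos / "video_one.md").write_text("Summary of video one", encoding="utf-8")
(videos / "video_two.md").write_text("Summary of video two", encoding="utf-8")
corpus_file = aggregate_corpus("demo_plan", videos, tmp / "corpus")
```

The combined document can then be fed to the analysis step to surface cross-video themes.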
```bash
./run
# Select "Research Plan" from the main menu
# Choose "Create New Plan"
# Follow the guided setup:
#   - Enter plan name and description
#   - Configure video sources (URLs and/or files)
#   - Ready-to-use plan created automatically
```
Plans are stored as YAML files in `research_plans/`:

```yaml
research_plan:
  name: "LLM Prompting Techniques"
  description: "Extract specific prompts from LLM-related videos"
  videos:
    urls:
      - "https://www.youtube.com/watch?v=VIDEO_ID_1"
    list_file: "videolist.txt"  # Optional
  prompts:
    chunk_prompt: |
      Extract only the specific prompts mentioned in this transcript:
      {chunk}
    executive_prompt: |
      Organize the extracted prompts from this video:
      {bullet_summaries}
```
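The `{chunk}` and `{bullet_summaries}` placeholders are filled in at processing time. A minimal sketch of that substitution (`render` is an illustrative helper, not the tool's API):

```python
plan_prompts = {
    "chunk_prompt": (
        "Extract only the specific prompts mentioned in this transcript:\n{chunk}"
    ),
    "executive_prompt": (
        "Organize the extracted prompts from this video:\n{bullet_summaries}"
    ),
}


def render(template: str, **fields: str) -> str:
    """Substitute plan placeholders such as {chunk} or {bullet_summaries}."""
    return template.format(**fields)


prompt = render(plan_prompts["chunk_prompt"], chunk="First chunk of the transcript...")
```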
```
data/
├── videos/                    # Individual video summaries
├── corpus/                    # Research plan aggregations
│   ├── plan_name.md           # Combined summaries
│   └── plan_name_summary.md   # Final analysis
└── raw/                       # Cached transcripts

research_plans/                # Plan configurations
├── my_research.yaml
└── example_llm_prompting.yaml
```
Format | Description | Example |
---|---|---|
`.txt` | One URL/ID per line | `dQw4w9WgXcQ` or `https://youtu.be/...` |
`.list` | Video list files | Same as `.txt` |
`.urls` | URL files | Same as `.txt` |
`.csv` | CSV with URLs in first column | `url,title` then `dQw4w9WgXcQ,Rick Roll` |
Features:
- Comments supported (`# comment`)
- Auto-detects CSV headers
- Validates video IDs/URLs
- UTF-8 encoding
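The parsing rules above (comment skipping, blank-line handling, CSV header detection) can be sketched as follows; `parse_video_list` is a hypothetical helper, and the tool's actual parser may differ:

```python
import csv
import io


def parse_video_list(text: str, is_csv: bool = False) -> list[str]:
    """Extract video URLs/IDs: skip comments and blanks, drop a CSV header row."""
    if is_csv:
        rows = list(csv.reader(io.StringIO(text)))
        if rows and rows[0] and rows[0][0].strip().lower() in {"url", "video", "id"}:
            rows = rows[1:]  # header detected, drop it
        return [row[0].strip() for row in rows if row and row[0].strip()]
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]


ids = parse_video_list("# my list\ndQw4w9WgXcQ\n\nhttps://youtu.be/jNQXAC9IVRw\n")
```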
Step | Command | Notes |
---|---|---|
Install | macOS: `brew install ollama`<br>Linux: `curl -fsSL https://ollama.ai/install.sh \| sh` | See ollama.ai for other platforms |
Pull Model | `ollama pull llama3.2:latest` | Downloads once, stored locally |
Start Server | `ollama serve` | Runs API on port 11434 |
List Models | `ollama list` | See available models |
Switch Models | Use `--model` flag or interactive selection | Any model from `ollama list` |
Memory Considerations:
- Use smaller models (`llama3.2:1b`, `phi3:mini`) for limited memory
- Set `OLLAMA_NO_GPU=1` for CPU-only processing
- Larger models (`llama3.2:latest`) provide better summaries
```bash
./run
poetry run yt-summarizer
./run videos.txt --model llama3.2:latest
poetry run yt-summarizer videos.txt --model llama3.2:latest
```

Options:

```
--model MODEL   Ollama model tag (default: llama3.2:latest)
--interactive   Force interactive mode
--help          Show help message
```
```
youtube-summarizer/
├── src/yt_summarizer/       # Main package
│   ├── cli.py               # Interactive TUI
│   ├── config.py            # Configuration management
│   ├── corpus.py            # Research corpus aggregation
│   ├── llm.py               # Ollama integration
│   ├── pipeline.py          # Processing orchestration
│   ├── research_plan.py     # Research plan management
│   ├── transcript.py        # YouTube API handling
│   └── utils.py             # Utilities & markdown
├── data/                    # Generated content
│   ├── raw/                 # Cached transcripts (.txt)
│   ├── docs/                # Individual video summaries
│   ├── videos/              # Research plan video summaries
│   └── corpus/              # Research plan aggregations
├── research_plans/          # Research configurations
│   └── *.yaml               # Plan definitions
├── logs/                    # Processing logs
│   └── ingest.jsonl         # Structured activity log
├── .env.example             # Configuration template
└── run                      # Launcher script
```
```markdown
---
video_id: dQw4w9WgXcQ
url: https://youtu.be/dQw4w9WgXcQ
title: "Never Gonna Give You Up"
saved: 2025-05-22T12:34:56Z
model: llama3.2:latest
chunk_count: 3
tags: [youtube, transcript]
---

## Executive Summary

[Comprehensive overview of the video content]

## Part Summaries

### Part 1

[Summary of first chunk]

### Part 2

[Summary of second chunk]
```
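The frontmatter makes summaries easy to process downstream. A naive way to read it back out (a real pipeline would use a YAML parser; `parse_frontmatter` is an illustrative sketch):

```python
def parse_frontmatter(doc: str) -> dict[str, str]:
    """Read key/value pairs from the frontmatter block of a summary file."""
    meta: dict[str, str] = {}
    lines = doc.splitlines()
    if lines and lines[0] == "---":
        for line in lines[1:]:
            if line == "---":
                break
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    return meta


sample = """---
video_id: dQw4w9WgXcQ
model: llama3.2:latest
chunk_count: 3
---

## Executive Summary
"""
meta = parse_frontmatter(sample)
```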
```json
{
  "timestamp": 1642867200,
  "video_id": "dQw4w9WgXcQ",
  "title": "Never Gonna Give You Up",
  "status": "success",
  "chunk_count": 3,
  "model": "llama3.2:latest"
}
```
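Because each log line is a standalone JSON object, the JSONL log is easy to analyze with a few lines of code. A small sketch (`summarize_log` and the sample entries are illustrative):

```python
import json

log_lines = [
    '{"timestamp": 1642867200, "video_id": "dQw4w9WgXcQ", "status": "success", "chunk_count": 3}',
    '{"timestamp": 1642867300, "video_id": "jNQXAC9IVRw", "status": "error", "chunk_count": 0}',
]


def summarize_log(lines: list[str]) -> dict[str, int]:
    """Tally processing outcomes from JSONL log entries."""
    counts: dict[str, int] = {}
    for line in lines:
        entry = json.loads(line)
        counts[entry["status"]] = counts.get(entry["status"], 0) + 1
    return counts


stats = summarize_log(log_lines)
```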
```bash
# Ollama Configuration
OLLAMA_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2:latest

# Processing
CHUNK_SIZE=2048
RATE_LIMIT_DELAY=2.0

# Directories
DATA_DIR=data
DOCS_DIR=data/docs
LOGS_DIR=logs

# Timeouts
OLLAMA_TIMEOUT=300
YOUTUBE_TIMEOUT=30
```
- Environment variables (`.env` file)
- Built-in defaults
- CLI arguments (legacy mode)
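The environment-over-defaults lookup can be sketched as follows. The variable names mirror the template above, but `setting` itself is a hypothetical helper, not the tool's `config.py` API:

```python
import os

# Built-in defaults, mirroring the .env template above
DEFAULTS = {
    "OLLAMA_URL": "http://localhost:11434",
    "OLLAMA_MODEL": "llama3.2:latest",
    "CHUNK_SIZE": "2048",
    "RATE_LIMIT_DELAY": "2.0",
}


def setting(name: str) -> str:
    """An environment variable overrides the built-in default."""
    return os.environ.get(name, DEFAULTS[name])


chunk_size = int(setting("CHUNK_SIZE"))
```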
Issue | Solution |
---|---|
`NoTranscriptFound` | Video has no public captions; try a different video |
`LLMConnectionError` | Start the Ollama server: `ollama serve` |
HTTP 404 from Ollama | Check the model exists: `ollama list` or `ollama pull <model>` |
Out of memory | Use a smaller model (`llama3.2:1b`) or set `OLLAMA_NO_GPU=1` |
Poetry install fails | Ensure Python 3.11-3.13, then update Poetry |
Rate limiting | Built-in 2-second delays prevent YouTube API issues |
```bash
# Enable verbose logging
export PYTHONPATH=src
python -c "
import logging
logging.basicConfig(level=logging.DEBUG)
from yt_summarizer.pipeline import process_single_video
process_single_video('dQw4w9WgXcQ')
"
```
```bash
# Run complete test suite (29 tests)
poetry run pytest

# Quick setup verification
python verify_setup.py

# Test the application
echo "dQw4w9WgXcQ" > test_videos.txt
./run test_videos.txt

# Test interactive mode
./run
```
Safe Test Videos:
- `dQw4w9WgXcQ` - Rick Roll (guaranteed captions)
- `jNQXAC9IVRw` - First YouTube video (short)
- 100% local processing - no data sent to external APIs
- Cached transcripts stored locally in `data/raw/`
- Rate limiting prevents overwhelming YouTube's servers
- No API keys or authentication required
- Open source - inspect all code
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Follow existing code style
- Submit a pull request
```bash
poetry install --with dev
poetry run black src/
poetry run mypy src/
poetry run pytest
```
MIT License - see LICENSE file for details.
- youtube-transcript-api for transcript fetching
- Ollama for local LLM inference
- questionary for beautiful TUI interactions
- Poetry for dependency management
Happy Summarizing!