Small and Simple Summarizer

Note that to use this, you'll need both a YouTube API key and a Vertex AI API key. The former is required to fetch the transcripts, and the latter to summarize them into articles.
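
How the scripts read these keys is not stated here. As a minimal sketch, assuming the keys are supplied through environment variables (the names YOUTUBE_API_KEY and VERTEX_AI_API_KEY are hypothetical and not taken from the scripts), a check before running anything could look like this:

```python
import os

# Hypothetical variable names; the scripts may expect the keys elsewhere.
REQUIRED_KEYS = ("YOUTUBE_API_KEY", "VERTEX_AI_API_KEY")


def load_api_keys(env=None):
    """Return the required keys as a dict, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing API key(s): " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

Calling load_api_keys() with no argument checks the real environment, so a missing key is reported before any transcript is fetched.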

Scripts

build_index.py: Reads all .md files in the "chapters" folder and creates an index based on individual tags (i.e. tag + all articles with the tag)
build_index_consolidated.py: Includes semantic grouping generated with an LLM for game dev purposes. Replacing the TAG_MAP allows you to group tags together in the index.
channel.py: (Requires YouTube API key) Lists the videos on a channel. Note that in some cases you'll need to dig up the channel ID by looking into the page source. You can generally find it by searching for a capitalized "UC".
chapterize.py: (Requires Vertex AI API key) Loops through the transcripts in the transcripts folder (generated by scrape.py), processes them with an LLM (currently gemini-2.5-pro-preview-05-06, due to its capability to handle large contexts and follow instructions), and stores the results in the chapters folder.
fetch-single.py: Fetches and prints the captions for a single video. The YouTube API can be flaky and sometimes doesn't return the captions. This script uses the youtube_transcript_api package, which doesn't require a YouTube API key.
fetch-via-list.py: (Requires YouTube API key) Takes in a list of URL + Title pairs (e.g. https://www.youtube.com/watch?v=12345678 Random Video) and attempts to fetch the captions. Essentially a retry mechanism.
get_tags.py: Loops through the chapters to collect tags and outputs them in alphabetical order. Convenient if you want to copy-paste them into an LLM for grouping.
scrape.py: (Requires YouTube API key) Fetches transcripts for a given channel and stores them under the transcripts folder with a caption and source URL at the top.
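
To illustrate the indexing idea behind build_index.py and build_index_consolidated.py, here is a minimal sketch. It assumes each chapter declares its tags on a line such as "Tags: physics, shaders". That line format, the function names and the tag_map argument are assumptions made for illustration and may not match the real scripts.

```python
from collections import defaultdict


def parse_tags(markdown_text):
    """Return the tags from the first 'Tags:' line (assumed format), lowercased."""
    for line in markdown_text.splitlines():
        if line.lower().startswith("tags:"):
            raw = line.split(":", 1)[1]
            return [tag.strip().lower() for tag in raw.split(",") if tag.strip()]
    return []


def build_tag_index(chapters, tag_map=None):
    """Map each tag (or its group from tag_map) to a sorted list of chapter names.

    chapters: dict of chapter file name -> markdown text.
    tag_map: optional dict of tag -> group name, in the spirit of TAG_MAP.
    """
    tag_map = tag_map or {}
    index = defaultdict(set)
    for name, text in chapters.items():
        for tag in parse_tags(text):
            # Tags without a group entry are indexed under their own name.
            index[tag_map.get(tag, tag)].add(name)
    return {key: sorted(names) for key, names in sorted(index.items())}
```

The chapters dict could be filled from disk with pathlib, for example {p.name: p.read_text(encoding="utf-8") for p in Path("chapters").glob("*.md")}. Passing a tag_map merges several tags under one heading, which is the kind of grouping the consolidated index is described as doing.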

Usage

Run scrape.py on your channel: python scrape.py <YOUTUBE_CHANNEL_ID>
Run chapterize.py
Run build_index.py or build_index_consolidated.py (the latter might require changing the TAG_MAP)

The rest of the scripts are there for convenience.
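
The three usage steps can also be chained from a small driver script. The sketch below is not part of the repository. It assumes the scripts sit in the current directory and that chapterize.py and the index builders take no arguments, which matches the steps above but is not otherwise confirmed. The function names are hypothetical.

```python
import subprocess
import sys


def pipeline_commands(channel_id, consolidated=False):
    """Return the commands for the three usage steps, in order."""
    index_script = "build_index_consolidated.py" if consolidated else "build_index.py"
    return [
        [sys.executable, "scrape.py", channel_id],
        [sys.executable, "chapterize.py"],
        [sys.executable, index_script],
    ]


def run_pipeline(channel_id, consolidated=False):
    """Run each step in turn and stop at the first failure."""
    for command in pipeline_commands(channel_id, consolidated):
        # check=True raises CalledProcessError if a step exits with an error.
        subprocess.run(command, check=True)
```

Stopping at the first failure keeps chapterize.py from running on an empty or partial transcripts folder.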

Muhwu/youtube-summarizer
