melipefello/youtube-summarizer

Small and Simple Summarizer

Note that to use this, you'll need both a YouTube API key and a Vertex AI API key. The former is required to fetch the transcripts, and the latter to summarize them into articles.

Scripts

  • build_index.py: Reads all .md files in the "chapters" folder and creates an index keyed by individual tag (i.e. each tag followed by all articles with that tag).
  • build_index_consolidated.py: Builds the index with semantic tag groupings that were generated with an LLM for game dev purposes. Replacing TAG_MAP lets you group tags together in the index.
  • channel.py: (Requires YouTube API key) Lists the videos on a channel. Note that in some cases you'll need to dig up the channel ID by looking into the page's source code. You can generally find it by searching for capitalized "UC".
  • chapterize.py: (Requires Vertex AI API key) Loops through the transcripts in the transcripts folder (generated by scrape.py), processes them with an LLM, and stores the results in the chapters folder. The model is currently gemini-2.5-pro-preview-05-06, chosen for its ability to handle large contexts and follow instructions.
  • fetch_single.py: Fetches and prints the captions for a single video. The YouTube API can be flaky and sometimes doesn't return the captions, so this uses the youtube_transcript_api package, which doesn't require a YouTube API key.
  • fetch_via_list.py: (Requires YouTube API key) Takes a list of URL + Title pairs (e.g. https://www.youtube.com/watch?v=12345678 Random Video) and attempts to fetch the captions. Essentially a retry mechanism.
  • get_tags.py: Loops through the chapters to collect tags and outputs them in alphabetical order. Convenient if you want to copy-paste them into an LLM for grouping.
  • scrape.py: (Requires YouTube API key) Fetches transcripts for a given channel and stores them under the transcripts folder with a caption and source URL at the top.
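
As a rough illustration of the indexing that build_index.py and build_index_consolidated.py describe, the sketch below groups articles by tag and optionally merges tags through a mapping. It is not the repository's code: the "Tags:" line format, the function names, and the sample TAG_MAP contents are all assumptions made for the example.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical mapping from a raw tag to a broader group; the real TAG_MAP
# in build_index_consolidated.py may look different.
TAG_MAP = {"shaders": "Rendering", "lighting": "Rendering", "pathfinding": "AI"}


def extract_tags(text):
    """Return tags from the first line starting with 'Tags:' (assumed format)."""
    for line in text.splitlines():
        if line.lower().startswith("tags:"):
            raw = line.split(":", 1)[1]
            return [t.strip().lower() for t in raw.split(",") if t.strip()]
    return []


def build_index(articles, tag_map=None):
    """Map each tag (or its group, if tag_map is given) to sorted article names."""
    index = defaultdict(set)
    for name, text in articles.items():
        for tag in extract_tags(text):
            key = tag_map.get(tag, tag) if tag_map else tag
            index[key].add(name)
    return {key: sorted(names) for key, names in sorted(index.items())}


def load_articles(folder="chapters"):
    """Read every .md file in the folder into a {filename: text} dict."""
    return {p.name: p.read_text(encoding="utf-8") for p in Path(folder).glob("*.md")}
```

Calling build_index(load_articles()) would give the per-tag index, while build_index(load_articles(), TAG_MAP) would give the grouped variant; tags missing from the mapping fall through unchanged.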

Usage

  • Run scrape.py on your channel: python scrape.py <YOUTUBE_CHANNEL_ID>
  • Run chapterize.py
  • Run build_index.py or build_index_consolidated.py (the latter might require changing the TAG_MAP)
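
The first step needs a channel ID. Following the hint in the channel.py description about searching the page source for "UC", one way to pull candidates out of saved HTML is a regular expression. This is only a sketch and not part of the repository: it assumes channel IDs have the form "UC" plus 22 URL-safe characters, and find_channel_ids is a hypothetical helper name.

```python
import re

# Assumed shape of a channel ID: "UC" followed by 22 characters from
# [A-Za-z0-9_-], not embedded in a longer run of such characters.
CHANNEL_ID_RE = re.compile(r"(?<![A-Za-z0-9_-])UC[A-Za-z0-9_-]{22}(?![A-Za-z0-9_-])")


def find_channel_ids(page_source):
    """Return unique channel-ID candidates in order of first appearance."""
    seen = []
    for match in CHANNEL_ID_RE.findall(page_source):
        if match not in seen:
            seen.append(match)
    return seen
```

A page may contain more than one candidate, so it is worth checking the result by hand before passing it to scrape.py.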

The rest of the scripts are there for convenience.
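
For the list that fetch_via_list.py takes, the URL + Title format from its description could be parsed roughly as follows. This is a sketch under assumptions, not the script's actual parsing: it assumes one pair per line, with the URL separated from the title by the first space, and a watch?v= style URL. parse_line is a hypothetical name.

```python
from urllib.parse import parse_qs, urlparse


def parse_line(line):
    """Split 'URL Title' into (video_id, title); return None if no video ID is found."""
    url, _, title = line.strip().partition(" ")
    # The video ID is taken from the "v" query parameter of the URL.
    video_id = parse_qs(urlparse(url).query).get("v", [None])[0]
    if not video_id:
        return None
    return video_id, title.strip()
```

Lines without a recognizable video ID return None, so a caller can skip or log them instead of failing halfway through a retry run.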
