This project automatically extracts transcripts from YouTube videos, summarizes them using Google's Gemini 1.5 Flash model, and stores the results in a persistent Chroma vector database for later retrieval and querying.
- Download and parse YouTube video transcripts
- Summarize transcripts using Gemini 1.5 Flash
- Store notes in a Chroma vector database with vector embeddings
- Search or extend the stored notes later using vector search
- Google Generative AI
- ChromaDB
- youtube-transcript-api
- Python 3, Google Colab
- SQLite3, `.bin` index files
- Google Drive integration
- The user provides a YouTube video ID.
- The transcript is fetched using `youtube-transcript-api`.
- Gemini generates a summary based on the transcript and a customizable prompt.
- The result is saved as a `.txt` file and stored in a Chroma vector collection with an embedding function.
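
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names (`join_transcript`, `build_prompt`, `summarize_with_gemini`, `store_notes`, `make_notes`) and the default prompt are hypothetical, and the transcript segments are assumed to be dicts with a `text` key. Third-party imports are kept inside the functions that need them, so the pure-Python helpers can be tried without installing anything.

```python
from pathlib import Path

# Hypothetical default prompt; the notebook's real prompt is customizable.
DEFAULT_PROMPT = "Summarize the following video transcript as concise study notes."


def join_transcript(segments):
    """Flatten transcript segments (assumed dicts with a 'text' key) into one string."""
    parts = (seg["text"].strip() for seg in segments)
    return " ".join(p for p in parts if p)


def build_prompt(transcript, instructions=DEFAULT_PROMPT):
    """Combine the customizable instructions with the transcript text."""
    return f"{instructions}\n\nTranscript:\n{transcript}"


def summarize_with_gemini(prompt, api_key, model_name="gemini-1.5-flash"):
    """Send the prompt to Gemini and return the generated text."""
    import google.generativeai as genai  # lazy import; needs google-generativeai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)
    return model.generate_content(prompt).text


def store_notes(collection, video_id, notes, notes_path="temp_notes.txt"):
    """Write the notes to a .txt file and add them to a Chroma collection."""
    Path(notes_path).write_text(notes, encoding="utf-8")
    collection.add(
        documents=[notes],
        ids=[video_id],
        metadatas=[{"video_id": video_id}],
    )


def make_notes(video_id, segments, summarize, collection, notes_path="temp_notes.txt"):
    """Run the pipeline: transcript -> prompt -> summary -> file + vector store."""
    prompt = build_prompt(join_transcript(segments))
    notes = summarize(prompt)
    store_notes(collection, video_id, notes, notes_path)
    return notes
```

In a notebook, `summarize` could be something like `lambda p: summarize_with_gemini(p, GEMINI_API_KEY)`, and `segments` would come from `youtube-transcript-api`. The exact fetch call and the shape of its return value differ between versions of that library, so check the installed version before relying on the `text`-key assumption.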
yt-notes/
├── get_video_notes.ipynb # Main Colab notebook
├── chroma.sqlite3 # Metadata for vector DB
├── index/ # Binary index files (auto-generated)
├── temp_notes.txt # Summarized notes (temporary)
└── temp_transcript.txt # Transcript from YouTube video (temporary)
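
The database files in this folder can be reopened later to search the stored notes. The sketch below assumes the `yt-notes/` folder is the path given to Chroma's persistent client and that the embedding function is Chroma's Google Generative AI one. The collection name `yt_notes` and the helper names are hypothetical, not taken from the notebook.

```python
def open_collection(db_path, api_key, name="yt_notes"):
    """Open (or create) a persistent Chroma collection stored under db_path."""
    import chromadb  # lazy import; needs the chromadb package
    from chromadb.utils import embedding_functions

    client = chromadb.PersistentClient(path=db_path)
    embed = embedding_functions.GoogleGenerativeAiEmbeddingFunction(api_key=api_key)
    return client.get_or_create_collection(name=name, embedding_function=embed)


def flatten_hits(result):
    """Turn the nested result of a single-query search into (id, document, distance) rows."""
    ids = result["ids"][0]
    docs = result["documents"][0]
    dists = result["distances"][0]
    return list(zip(ids, docs, dists))


def search_notes(collection, question, n_results=3):
    """Run a vector search over the stored notes and return flat result rows."""
    result = collection.query(query_texts=[question], n_results=n_results)
    return flatten_hits(result)
```

A collection should be opened with the same embedding function it was created with, otherwise query vectors and stored vectors may not be comparable.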
Secrets must be added securely. For Colab, use the Secrets tab or `userdata.get()`:
- `GEMINI_API_KEY` - your Google AI API key
- `CHROMA_GOOGLE_GENAI_API_KEY` - used for embedding functions
from google.colab import userdata
GEMINI_API_KEY = userdata.get("GEMINI_API_KEY")
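
If the same code also needs to run outside Colab, one option is a small helper that tries the Colab secret store first and falls back to environment variables. This is only a sketch; `get_secret` is a hypothetical name and not part of the notebook.

```python
import os


def get_secret(name):
    """Return a secret from Colab's userdata if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            value = userdata.get(name)
        except Exception:
            # Secret missing or notebook access not granted; fall back below.
            value = None
        if value:
            return value

    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set in Colab Secrets or the environment")
    return value
```

With this helper, `get_secret("GEMINI_API_KEY")` and `get_secret("CHROMA_GOOGLE_GENAI_API_KEY")` would cover both keys listed above.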
- Mount your Google Drive
- Install dependencies (via `!pip install`)
- Add your Secrets using the Colab sidebar
- Run the notebook `get_video_notes.ipynb`
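
The first two setup steps might look like the cell below. It is a sketch under assumptions: the Drive location `MyDrive/yt-notes` and the helper names are hypothetical, and the package list is inferred from the tech stack rather than copied from the notebook.

```python
from pathlib import Path

# In a Colab cell, dependencies are installed with a shell magic, for example:
#   !pip install google-generativeai chromadb youtube-transcript-api


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive inside Colab (prompts for authorization)."""
    from google.colab import drive  # only importable inside Colab

    drive.mount(mount_point)


def notes_dir(mount_point="/content/drive", folder="yt-notes"):
    """Build the assumed project folder path on the mounted Drive."""
    return Path(mount_point) / "MyDrive" / folder
```

After calling `mount_drive()`, `notes_dir()` gives a path that can be created with `mkdir(parents=True, exist_ok=True)` and used as the location for the vector database and the temporary text files.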
PRs and ideas are welcome! Just open an issue or fork the repo.
Built by Hadeel ❤️