This project is a Streamlit-based web application that extracts YouTube video transcripts and processes them with Natural Language Processing (NLP) techniques.
It provides:
- ✂️ Summarization using Transformer models (T5)
- 🔑 Keyword Extraction with TF-IDF & fallback lemmatization
- 🧠 Topic Modeling using LDA
- 😊 Sentiment Analysis using both VADER and TextBlob
- 📥 Export options to download the results as `.txt` or `.csv`
- Full Name: Monpara Romil Kamleshbhai
- 🎓 B.Tech in Information Technology, LJIET (Graduating in 2027)
- GitHub: https://github.com/romilmonpara
- LinkedIn: https://www.linkedin.com/in/romilmonpara
- 🔗 Input any YouTube video URL
- 📝 Extracts and cleans transcript using `youtube_transcript_api`
- 🤖 Summarizes content via Hugging Face T5 Transformer model
- 🧹 Keyword extraction using TF-IDF with fallback to word frequency
- 📚 Topic modeling via LDA (`scikit-learn`)
- ❤️ Sentiment analysis using:
  - `TextBlob` (Polarity, Subjectivity)
  - `NLTK`'s VADER (Positive, Negative, Neutral, Compound)
- 📊 Visual representation of sentiment results
- 📤 Downloadable results (TXT for summary, CSV for full data)
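
The keyword step listed above (TF-IDF with a fallback to word frequency) can be sketched roughly as follows. This is a dependency-free illustration, not the app's actual code: it treats each sentence as a document, and the names `tokenize`, `top_keywords`, and the tiny `STOPWORDS` set are hypothetical. A real implementation would more likely use a library vectorizer and NLTK's stopword list.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real app would likely use NLTK's.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "this", "that", "for", "on", "with", "are", "we"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(text, k=5):
    """Rank words by TF-IDF summed over sentences; fall back to raw frequency."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    docs = [d for d in (tokenize(s) for s in sentences) if d]
    if len(docs) < 2:
        # Fallback: with fewer than two sentences, IDF carries no information.
        return [w for w, _ in Counter(tokenize(text)).most_common(k)]

    n = len(docs)
    doc_freq = Counter(w for d in docs for w in set(d))
    scores = Counter()
    for d in docs:
        for word, count in Counter(d).items():
            idf = math.log((1 + n) / (1 + doc_freq[word])) + 1  # smoothed IDF
            scores[word] += (count / len(d)) * idf
    return [w for w, _ in scores.most_common(k)]
```

Calling `top_keywords(transcript_text, k=10)` would return up to ten candidate keywords; for very short transcripts the function quietly takes the frequency path.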
- Frontend: Streamlit
- Core Libraries:
  - `youtube_transcript_api`, `pytube` – YouTube integration
  - `transformers` – T5 summarization model
  - `nltk`, `textblob` – NLP, sentiment analysis
  - `scikit-learn` – Topic modeling (LDA)
  - `matplotlib`, `pandas`, `base64` – Visuals & export
  - `streamlit` – UI & interaction
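
Since `base64` is listed for export, one plausible way the download feature could work is by embedding the file contents in a data URI. The sketch below is an assumption about that approach, not the app's confirmed code: `to_csv_text` and `download_link` are hypothetical helper names, and the standard-library `csv` module stands in for `pandas` to keep the example self-contained.

```python
import base64
import csv
import io


def to_csv_text(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def download_link(data, filename, mime):
    """Build an HTML anchor that embeds the data as a base64 data URI."""
    b64 = base64.b64encode(data.encode("utf-8")).decode("ascii")
    return (f'<a href="data:{mime};base64,{b64}" '
            f'download="{filename}">Download {filename}</a>')


rows = [{"metric": "polarity", "value": 0.15},
        {"metric": "subjectivity", "value": 0.45}]
csv_text = to_csv_text(rows, ["metric", "value"])
link_html = download_link(csv_text, "analysis.csv", "text/csv")
```

In a Streamlit app the resulting HTML could be rendered with `st.markdown(link_html, unsafe_allow_html=True)`; newer Streamlit versions also offer `st.download_button`, which avoids hand-built links.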
git clone https://github.com/romilmonpara/youtube-transcript-streamlit-ui.git
Make sure you have Python 3.7+ installed.
pip install -r requirements.txt
import nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('omw-1.4')
nltk.download('vader_lexicon')
streamlit run app.py
- Open the app in your browser after launching Streamlit.
- Paste a YouTube video URL in the input box.
- Click "Analyze Video".
- View:
  - Video metadata (title, author, views, etc.)
  - Raw transcript (optional)
  - Summary
  - Keywords
  - Topics
  - Sentiment plots
- Download:
  - 📄 `summary.txt`
  - 📊 `analysis.csv`
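
Before a transcript can be fetched, the pasted URL has to be reduced to a video ID. The following is a minimal sketch of that step, assuming "standard" links means `youtube.com/watch?v=...` and `youtu.be/...` forms; the function name `extract_video_id` and the accepted host list are illustrative assumptions, not taken from the app.

```python
from urllib.parse import parse_qs, urlparse

# Hosts treated as "standard" YouTube links in this sketch (assumption).
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if it is not recognized."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host in WATCH_HOSTS and parsed.path == "/watch":
        # Watch links carry the ID in the "v" query parameter.
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    else:
        return None
    return video_id or None
```

Returning `None` for anything unrecognized lets the UI show an "Invalid URL" message instead of passing a bad ID on to the transcript fetcher.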
- Summary: "In this video, the speaker discusses..."
- Top Keywords: data science, machine learning, deep learning...
- Topics:
  - Topic 1: ai, data, learning
  - Topic 2: video, algorithm, streamlit
- Sentiment:
  - Polarity: 0.15 | Subjectivity: 0.45
  - VADER Compound Score: 0.74
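
To read the VADER compound score, a common convention is to treat values at or above 0.05 as positive, at or below -0.05 as negative, and anything in between as neutral. The helper below sketches that rule; `vader_label` is a hypothetical name, and whether the app applies these exact thresholds is an assumption.

```python
def vader_label(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# The sample compound score shown above maps to a positive label.
label = vader_label(0.74)
```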
- Transcript Not Available: The video must have closed captions enabled.
- Invalid URL: Only standard YouTube links are accepted.
- Model Error / CUDA Out of Memory: Reduce summary length or input shorter videos.
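
For long videos, one way to stay within model and memory limits is to split the transcript into chunks and summarize each chunk separately. The sketch below illustrates the idea under stated assumptions: `chunk_words` and `summarize_in_chunks` are hypothetical helpers, `max_words=400` is an arbitrary illustrative value (word count is only a rough proxy for model tokens), and `summarize` stands for whatever callable wraps the T5 model.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def summarize_in_chunks(text, summarize, max_words=400):
    """Summarize each chunk with the given callable and join the results."""
    return " ".join(summarize(chunk) for chunk in chunk_words(text, max_words))
```

Here `summarize` could be a thin wrapper around a Hugging Face summarization pipeline. Passing it in as a parameter keeps the chunking logic testable without loading a model.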
- 🌐 Multilingual transcript support
- ⚡ Faster summarization with GPU model serving
- 🧠 Use more advanced models like BART or Pegasus
- 🖼️ Improve UI with themes and mobile responsiveness
- Hugging Face Transformers
- NLTK & TextBlob teams
- Streamlit Community
- YouTube Transcript API developers