An NLP/LLM-powered TV series analysis system for understanding the story elements, character relationships, and themes of any TV show. It is built on a Python stack that includes pytorch, gradio, scrapy, beautifulsoup4, glob, sklearn, seaborn, pandas, numpy, matplotlib, spacy, networkx, transformer models, Hugging Face, nltk, and pyvis. Two pretrained models do most of the work: spaCy's en_core_web_trf for the character network and Hugging Face's facebook/bart-large-mnli for the theme classifier. Familiarity with ML/DL, AI engineering, neural networks, LLMs, transformer models, and web scraping helps when extending or studying this system.
A comprehensive system to analyze and visualize any TV series (demonstrated here on the "Naruto" TV series), with a user-friendly interface built using Gradio. The system is structured into three major components:
(1) Gathering Datasets: three types of dataset are required: subtitle, transcript, and classification. The subtitles are collected from "subtitlist", the transcript comes from Kaggle, and the classification data is scraped from the "narutopedia" website using Scrapy and BeautifulSoup.
(2) Theme Classification: extracts the main themes of the series. Given a comma-separated list of input themes (e.g. friendship, battle, sacrifice, love, dialogue), it reports how strongly each theme occurs in the series as a percentage. This is done with a zero-shot classifier, Hugging Face's "facebook/bart-large-mnli" model, applied to the subtitle dataset.
(3) Character Network: shows how prominent each character is and plots the characters' relationships with one another. It uses spaCy's NER (Named Entity Recognition) model (en_core_web_trf) to identify character entities and connect them in a network visualized with Pyvis.
- Fandom Analysis & Exploration
- Content Recommendation & Tagging
- Scriptwriter & Creator Insights
- Educational/NLP Research Tool
- Interactive Dashboards for Viewers
- Comparative Series Analysis
- Archiving & Metadata Generation
- Current Version: V1.0
- Actively maintained & expanded
llm-tv-series-analysis/
├── assets/
│   └── images/
├── crawler/
│   └── .ass
├── data/
│   ├── subtitles/
│   │   └── datasets_link.txt
│   ├── datasets_link.txt
│   ├── jutsu.jsonl
│   ├── jutsus.jsnol
│   └── naturo.csv
├── stubs/
│   ├── ner_output.csv
│   └── theme_classifier_output.csv
├── text-classification/
│   └── jutsu_classifier_development.ipynb
├── theme-classifier/
│   ├── __init__.py
│   ├── theme_classification_development.ipynb
│   └── theme_classifier.py
├── character-network/
│   ├── __init__.py
│   ├── character_network_generator.ipynb
│   ├── character_network_generator.py
│   ├── named_entity_recognizer.py
│   └── naruto.html
├── utils/
│   ├── __init__.py
│   └── data_loader.py
├── .gitignore
├── gradio_app.py
├── llm-tv-series-analysis.gdoc
├── llm_tv_series_analysis_development.ipynb
├── requirements.txt
└── README.md
- ✅ Dataset gathering (subtitles, transcripts, and custom classification sets)
- ✅ Zero-shot theme classifier using facebook/bart-large-mnli
- ✅ Character relationship network using Named Entity Recognition (en_core_web_trf)

🛠️ In progress:
- Attack type classifier (DistilBERT-based)
- Chatbot with character personality using fine-tuned LLaMA 3.1
- Python programming, Web Scraping
- ML/DL fundamentals, Transformers, Hugging Face Hub
- NLP tools like NLTK and spaCy
- LLM knowledge for future chatbot development
- An IDE (VS Code), Jupyter Notebook, or Google Colab
- Python 3, HTML, CSS
- Languages: Python, HTML, CSS
- Web Scraping: scrapy, beautifulsoup4
- NLP/ML/LLM: transformers, huggingface_hub, nltk, spacy, sklearn, pandas, numpy, networkx, pyvis
- Deep Learning: pytorch, transformer models (en_core_web_trf & facebook/bart-large-mnli)
- Visualization: matplotlib, seaborn, pyvis.network
- UI/ML-app: gradio
- Subtitle Dataset: main content used for theme classification and the character network.
- Transcript Dataset: maps dialogue to speakers; crucial for the chatbot.
- Classification Dataset: scraped from the Naruto Fandom Wiki to train an attack classifier with three classes: ninjutsu, genjutsu, and taijutsu. Tools used: scrapy, bs4.
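Since the subtitle dataset feeds both the theme classifier and the character network, it helps to see how raw subtitle files might be turned into plain dialogue text. The sketch below assumes the subtitles are `.ass` files, in which each spoken line starts with `Dialogue:` and the text is the tenth comma-separated field. The helper names `parse_ass_dialogue` and `load_subtitle_lines` are hypothetical; the repository's own loader in `utils/data_loader.py` may work differently.

```python
import re
from pathlib import Path
from typing import List, Optional

# Assumption: subtitles are stored as .ass files. This is an illustrative
# sketch, not the repository's actual data loader.
ASS_TAG = re.compile(r"\{[^}]*\}")  # inline override tags such as {\i1}


def parse_ass_dialogue(line: str) -> Optional[str]:
    """Return the spoken text of one ASS 'Dialogue:' line, or None."""
    if not line.startswith("Dialogue:"):
        return None
    # The text is the 10th field and may itself contain commas,
    # so split on the first 9 commas only.
    fields = line.split(",", 9)
    if len(fields) < 10:
        return None
    text = ASS_TAG.sub("", fields[9])
    # \N and \n are ASS line-break markers inside a single subtitle.
    text = text.replace("\\N", " ").replace("\\n", " ")
    return " ".join(text.split())


def load_subtitle_lines(folder: str) -> List[str]:
    """Collect cleaned dialogue lines from every .ass file in a folder."""
    lines: List[str] = []
    for path in sorted(Path(folder).glob("*.ass")):
        content = path.read_text(encoding="utf-8", errors="ignore")
        for raw in content.splitlines():
            text = parse_ass_dialogue(raw)
            if text:
                lines.append(text)
    return lines
```

The resulting list of lines can then be joined into larger chunks before classification or entity recognition, which keeps each model call working on more context than a single subtitle.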
- Model: facebook/bart-large-mnli (Zero-Shot)
- Input: Subtitle data + custom themes (e.g., friendship, battle, sacrifice)
- Output: CSV showing theme percentages across the series
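As a rough illustration of the theme classifier, the sketch below averages zero-shot scores over subtitle chunks. It assumes the Hugging Face `zero-shot-classification` pipeline is called with `multi_label=True`, so every theme is scored independently; the README does not state which mode the project uses. `build_classifier` and `score_themes` are hypothetical names, not functions from `theme_classifier.py`.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def build_classifier():
    """Load the zero-shot pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    return pipeline("zero-shot-classification", model="facebook/bart-large-mnli")


def score_themes(
    chunks: List[str], themes: List[str], classifier: Callable
) -> Dict[str, float]:
    """Average each theme's score over all subtitle chunks.

    `classifier` is expected to behave like the Hugging Face pipeline:
    called with (text, candidate_labels, multi_label=...) it returns a
    dict with parallel "labels" and "scores" lists.
    """
    totals: Dict[str, float] = defaultdict(float)
    for chunk in chunks:
        # Assumption: multi_label=True, so scores need not sum to 1.
        result = classifier(chunk, themes, multi_label=True)
        for label, score in zip(result["labels"], result["scores"]):
            totals[label] += score
    n = max(len(chunks), 1)  # avoid dividing by zero on empty input
    return {theme: totals[theme] / n for theme in themes}
```

A call such as `score_themes(chunks, ["friendship", "battle"], build_classifier())` returns mean scores between 0 and 1, which can be multiplied by 100 and written to a CSV to obtain per-theme percentages.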
- NER Model: en_core_web_trf via spaCy
- Generate interactive network graph using networkx + pyvis
- Future work
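The character network step can be sketched as three stages: extract person entities per sentence, count which characters appear near each other, and render the counts as a graph. The README does not describe how relationships are derived, so the sliding-window co-occurrence rule, the `window` default, and the function names below are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def extract_person_sets(text: str) -> List[Set[str]]:
    """Return the set of PERSON entity names found in each sentence."""
    import spacy  # imported lazily: requires the en_core_web_trf model

    nlp = spacy.load("en_core_web_trf")
    doc = nlp(text)
    return [
        {ent.text.strip() for ent in sent.ents if ent.label_ == "PERSON"}
        for sent in doc.sents
    ]


def count_cooccurrences(
    sentence_entities: List[Set[str]], window: int = 10
) -> Dict[Tuple[str, str], int]:
    """Count character pairs mentioned within `window` consecutive sentences.

    Assumption: two characters are related if one is mentioned in a sentence
    and the other in that sentence or the window - 1 sentences before it.
    """
    counts: Counter = Counter()
    for i, current in enumerate(sentence_entities):
        previous = set().union(*sentence_entities[max(0, i - window + 1):i])
        pairs = set()
        for a in current:
            for b in previous | current:
                if a != b:
                    pairs.add(tuple(sorted((a, b))))
        counts.update(pairs)  # each pair counts at most once per sentence
    return dict(counts)


def render_network(counts: Dict[Tuple[str, str], int], output_html: str) -> None:
    """Write an interactive HTML graph; edge weight = co-occurrence count."""
    import networkx as nx
    from pyvis.network import Network

    graph = nx.Graph()
    for (a, b), weight in counts.items():
        graph.add_edge(a, b, weight=weight)
    net = Network(notebook=False, width="1000px", height="700px")
    net.from_nx(graph)
    net.save_graph(output_html)
```

Keeping the counting step free of spaCy and pyvis makes it easy to check on small hand-written inputs before running the transformer model over a full series.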
git clone https://github.com/pointer2Alvee/llm-tv-series-analysis.git
cd llm-tv-series-analysis
# Recommended: Use a virtual environment
pip install -r requirements.txt
scrapy
beautifulsoup4
transformers==4.44.0
huggingface_hub==0.24.5
nltk==3.8.1
gradio
pyvis
spacy
torch
pandas
numpy
networkx
seaborn
matplotlib
- Open the repo in VS Code
- Run the command:
python gradio_app.py
- Wait for the app to start
- Open the localhost link in your browser
For Theme-Classifier :-
- Provide themes in the text field:
friendship, battle, sacrifice, love, dialogue
- Provide Subtitle Path :
data\Subtitles
- Provide output Save path :
stubs\theme_classifier_output.csv
- Click "Get Themes" Button
For Character-Network :-
- Provide Subtitle Path :
data\Subtitles
- Provide output Save path :
stubs\ner_output.csv
- Click "Get Character Network" Button
- On Colab? Use the public URL printed after running the app to open the UI in your browser.
- In VS Code? Open the repo and run the commands above.
- ✅ Web Scraping
- ✅ BERT model & NER model
- ✅ Theme Classifier, Character Relationship Network
- ⏳ Upcoming: Chatbot, Text Classifier
- Full attack classifier with fine-tuned DistilBERT
- Fully interactive character chatbot (LLaMA-based)
- Support for other anime/TV series via config
Contributions are welcome!
- Fork the repo.
- Create a branch:
git checkout -b feature/YourFeature
- Commit changes:
git commit -m 'Add some feature'
- Push to branch:
git push origin feature/YourFeature
- Open a Pull Request.
Distributed under the MIT License. See LICENSE.txt for more information.
- Special thanks to the open-source community and YouTube for tools and resources.