
An NLP/LLM-powered TV-Series-Analysis System for understanding story elements by building character relationship networks and classifying the themes of any TV show. Built on python, pytorch, gradio, scrapy, pandas, numpy, matplotlib, nltk, networkx, and two transformer models: spacy's en_core_web_trf and hugging face's facebook/bart-large-mnli.


pointer2Alvee/llm-tv-series-analysis


📜 llm-tv-series-analysis

📌 Summary

An NLP/LLM-powered TV-Series-Analysis System for understanding story elements, character relationships, and themes of any TV show. This AI system relies on a robust tech stack including python, pytorch, gradio, scrapy, beautifulsoup4, glob, sklearn, seaborn, pandas, numpy, matplotlib, spacy, networkx, transformer models, huggingface, nltk, and pyvis. It also uses two pretrained models: spaCy's en_core_web_trf for the character network and Hugging Face's facebook/bart-large-mnli for the theme classifier. Expertise in ML/DL, AI engineering, neural nets, LLMs, transformer models, and web scraping is beneficial for extending or understanding this system.

🧠 Overview

A comprehensive system to analyze and visualize any TV series, demonstrated here using the "Naruto" TV series, with a user-friendly interface built using Gradio. The system is structured into three major components.

(1) Gathering Dataset: three types of datasets are required, namely subtitle, transcript, and classification datasets. The subtitles are collected from "subtitlist", the transcripts from Kaggle, and the classification data is scraped from the "narutopedia" website using Scrapy and BeautifulSoup.
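The repository's `utils/data_loader.py` is not reproduced here. As a minimal sketch, assuming the subtitles are Advanced SubStation Alpha (`.ass`) files, the loading step could pull the dialogue text out of each file like this. The function names `parse_ass_dialogue` and `load_subtitles` are hypothetical, and dropping `{...}` styling blocks is an assumed cleaning choice.

```python
import glob
import os
import re
from typing import Dict, List

# Override blocks such as {\i1} or {\pos(10,20)} carry styling, not dialogue.
OVERRIDE_TAGS = re.compile(r"\{[^}]*\}")


def parse_ass_dialogue(lines: List[str]) -> str:
    """Join the Text field of every 'Dialogue:' event into one script string."""
    texts = []
    for line in lines:
        if not line.startswith("Dialogue:"):
            continue  # skip headers, styles and 'Comment:' events
        # A Dialogue event has 10 comma-separated fields and Text is the last;
        # the text itself may contain commas, so split at most 9 times.
        parts = line.split(",", 9)
        if len(parts) < 10:
            continue
        text = OVERRIDE_TAGS.sub("", parts[9])
        # \N and \n are ASS line breaks inside a single subtitle.
        text = text.replace("\\N", " ").replace("\\n", " ").strip()
        if text:
            texts.append(text)
    return " ".join(texts)


def load_subtitles(folder: str) -> Dict[str, str]:
    """Map each .ass file name in `folder` to its cleaned dialogue script."""
    scripts = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.ass"))):
        # utf-8-sig also strips a leading byte-order mark if one is present.
        with open(path, encoding="utf-8-sig") as fh:
            scripts[os.path.basename(path)] = parse_ass_dialogue(fh.read().splitlines())
    return scripts
```

For example, `load_subtitles("data/subtitles")` would return one cleaned script per episode file, ready for sentence splitting.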

(2) Theme Classification: extracts the main themes of the series. Given a comma-separated list of input themes (e.g., friendship, battle, sacrifice, love, dialogue), it reports what percentage each theme accounts for across the TV series. This is done with a zero-shot classifier, leveraging Hugging Face's "facebook/bart-large-mnli" model to score the themes against the subtitle dataset.

(3) Character Network: shows how prominent each character is and plots their relationships with each other. This uses spaCy's NER (Named Entity Recognition) model (en_core_web_trf) to identify and connect character entities in a network visualized via Pyvis.

🎯 Use Cases

  • Fandom Analysis & Exploration
  • Content Recommendation & Tagging
  • Scriptwriter & Creator Insights
  • Educational/NLP Research Tool
  • Interactive Dashboards for Viewers
  • Comparative Series Analysis
  • Archiving & Metadata Generation

🟢 Project Status

  • Current Version: V1.0
  • Actively maintained & expanded

📂 Repository Structure

llm-tv-series-analysis/
├── assets/
│   └── images/
├── crawler/
│   └── .ass
├── data/
│   ├── subtitles/
│   │   └── datasets_link.txt
│   ├── datasets_link.txt
│   ├── jutsu.jsonl
│   ├── jutsus.jsnol
│   └── naturo.csv
├── stubs/
│   ├── ner_output.csv
│   └── theme_classifier_output.csv
├── text-classification/
│   └── jutsu_classifier_development.ipynb
├── theme-classifier/
│   ├── __init__.py
│   ├── theme_classification_development.ipynb
│   └── theme_classifier.py
├── character-network/
│   ├── __init__.py
│   ├── character_network_generator.ipynb
│   ├── character_network_generator.py
│   ├── named_entity_recognizer.py
│   └── naruto.html
├── utils/
│   ├── __init__.py
│   └── data_loader.py
├── .gitignore
├── gradio_app.py
├── llm-tv-series-analysis.gdoc
├── llm_tv_series_analysis_development.ipynb
├── requirements.txt
└── README.md

✨ Features

✅ Dataset gathering (subtitles, transcripts, and custom classification sets)
✅ Zero-shot theme classifier using facebook/bart-large-mnli
✅ Character relationship network using Named Entity Recognition (en_core_web_trf)

🛠️ In progress:
▫️ Attack type classifier (distilBERT-based)
▫️ Chatbot with character personality using fine-tuned LLaMA 3.1

🎥 Demo

YouTube Video

🚀 Getting Started

📚 Knowledge & Skills Required

  • Python programming, Web Scraping
  • ML/DL fundamentals, Transformers, Hugging Face Hub
  • NLP tools like NLTK and spaCy
  • LLM knowledge for future chatbot development

💻 Software Requirements

  • An IDE (VS Code), Jupyter Notebook, or Google Colab
  • Python 3, HTML, CSS

πŸ›‘οΈ Tech Stack

  • Language: python, html, css
  • Web Scraping: scrapy, beautifulsoup4
  • NLP/ML/LLM: transformers, huggingface_hub, nltk, spacy, sklearn, pandas, numpy, networkx, pyvis
  • Deep Learning: pytorch, transformer models (en_core_web_trf & facebook/bart-large-mnli)
  • Visualization: matplotlib, seaborn, pyvis.network
  • UI/ML-app: gradio

πŸ” Modules Breakdown

πŸ“₯ Dataset Collection
  • Subtitle Dataset: Main content used for theme classification and character network.
  • Transcript Dataset: Maps dialogues to speakers; crucial for the planned chatbot.
  • Classification Dataset: Scraped from the Naruto Fandom Wiki (Narutopedia) for training an attack classifier over ninjutsu, genjutsu, and taijutsu. Tools used: scrapy, bs4
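Before training the attack classifier, the scraped records have to be reduced to the three target classes. A minimal sketch, assuming the scraper writes JSON Lines with hypothetical fields `jutsu_name`, `jutsu_type`, and `jutsu_description`, and assuming a Genjutsu > Ninjutsu > Taijutsu priority when a record lists several types:

```python
import json
from typing import Iterable, List, Optional

# The three target classes named in this README, checked in this order, so a
# record tagged "Ninjutsu, Genjutsu" becomes Genjutsu (an assumed priority).
TARGET_CLASSES = ("Genjutsu", "Ninjutsu", "Taijutsu")


def simplify_jutsu_type(raw_type: str) -> Optional[str]:
    """Collapse a wiki classification string to one target class, or None."""
    lowered = raw_type.lower()
    for label in TARGET_CLASSES:
        if label.lower() in lowered:
            return label
    return None


def load_labelled_jutsus(lines: Iterable[str]) -> List[dict]:
    """Parse JSON Lines records and keep only those mapping to a target class."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Field names jutsu_name / jutsu_type / jutsu_description are hypothetical.
        label = simplify_jutsu_type(record.get("jutsu_type", ""))
        if label is not None:
            text = f"{record.get('jutsu_name', '')}. {record.get('jutsu_description', '')}"
            rows.append({"text": text.strip(), "label": label})
    return rows
```

Usage would be along the lines of `with open("data/jutsu.jsonl", encoding="utf-8") as fh: rows = load_labelled_jutsus(fh)`. Records whose type matches none of the three classes are dropped rather than forced into a class.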
🎭 Theme Classifier
  • Model: facebook/bart-large-mnli (Zero-Shot)
  • Input: Subtitle data + custom themes (e.g., friendship, battle, sacrifice)
  • Output: CSV showing theme percentages across the series
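One way to turn per-chunk zero-shot scores into series-wide theme percentages is sketched below. It assumes the subtitles are already split into sentences (for example with nltk's `sent_tokenize`, which needs the punkt data downloaded), groups them into chunks of a hypothetical `chunk_size`, sums each theme's score, and normalizes the sums to 100%. The normalization scheme is an assumption, not necessarily what `theme_classifier.py` does; the classifier is injected so the aggregation can be checked without downloading the model.

```python
from typing import Callable, Dict, List, Sequence

# classify(text, themes) -> {"labels": [...], "scores": [...]}, the shape the
# Hugging Face zero-shot pipeline returns for a single input string.
Classifier = Callable[[str, Sequence[str]], Dict[str, list]]


def chunk(sentences: Sequence[str], size: int) -> List[str]:
    """Join consecutive sentences into chunks of `size` to keep inputs short."""
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def theme_percentages(sentences: Sequence[str], themes: Sequence[str],
                      classify: Classifier, chunk_size: int = 20) -> Dict[str, float]:
    """Sum each theme's zero-shot score over all chunks, then normalize to 100%."""
    totals = {theme: 0.0 for theme in themes}
    for text in chunk(sentences, chunk_size):
        result = classify(text, themes)
        for label, score in zip(result["labels"], result["scores"]):
            totals[label] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {theme: 0.0 for theme in themes}
    return {theme: 100.0 * score / grand_total for theme, score in totals.items()}


def make_bart_classifier(device: int = -1) -> Classifier:
    """Wrap facebook/bart-large-mnli; device=-1 runs on CPU, 0 on the first GPU."""
    from transformers import pipeline  # imported lazily: heavy dependency

    zero_shot = pipeline("zero-shot-classification",
                         model="facebook/bart-large-mnli", device=device)

    def classify(text: str, themes: Sequence[str]) -> Dict[str, list]:
        # multi_label=True scores every theme independently (0..1 each)
        return zero_shot(text, candidate_labels=list(themes), multi_label=True)

    return classify
```

With the real model this would be called as `theme_percentages(sentences, ["friendship", "battle", "sacrifice"], make_bart_classifier())`, and the resulting dict can be written to `stubs/theme_classifier_output.csv`. Chunking trades some granularity for far fewer forward passes through the large BART model.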
πŸ§‘β€πŸ€β€πŸ§‘ Character Network
  • NER Model: en_core_web_trf via spaCy
  • Generate interactive network graph using networkx + pyvis
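A sketch of how NER output could become a character network, under stated assumptions: only spaCy `PERSON` entities are kept, each name is shortened to its first token (so "Naruto Uzumaki" and "Naruto" merge), and two characters are linked each time they appear within a hypothetical sliding `window` of consecutive sentences. Node size follows weighted degree. These choices are illustrative and may differ from `character_network_generator.py`.

```python
from collections import Counter
from typing import List, Sequence


def extract_characters(sentences: Sequence[str], model: str = "en_core_web_trf") -> List[List[str]]:
    """Run spaCy NER and keep the first token of every PERSON entity."""
    import spacy  # lazy import; the model must be downloaded beforehand

    nlp = spacy.load(model)
    return [[ent.text.split(" ")[0].strip() for ent in doc.ents if ent.label_ == "PERSON"]
            for doc in nlp.pipe(sentences)]


def cooccurrence_edges(sentence_entities: Sequence[Sequence[str]], window: int = 10) -> Counter:
    """Count character pairs that appear within `window` consecutive sentences."""
    edges: Counter = Counter()
    for i, current in enumerate(sentence_entities):
        # Characters mentioned in this sentence or the previous window-1 ones.
        recent = {name for ents in sentence_entities[max(0, i - window + 1):i + 1] for name in ents}
        # Each unordered pair is counted at most once per sentence.
        pairs = {tuple(sorted((a, b))) for a in current for b in recent if a != b}
        edges.update(pairs)
    return edges


def render_network(edges: Counter, html_path: str = "naruto.html", top_n: int = 200) -> None:
    """Build a networkx graph of the strongest edges and save it as pyvis HTML."""
    import networkx as nx
    from pyvis.network import Network

    graph = nx.Graph()
    for (a, b), count in edges.most_common(top_n):
        graph.add_edge(a, b, weight=count)

    net = Network(height="700px", width="100%", bgcolor="#222222", font_color="white")
    # Node size grows with weighted degree, i.e. how often a character co-occurs.
    for name, degree in graph.degree(weight="weight"):
        net.add_node(name, label=name, size=10 + 2 * degree)
    for a, b, data in graph.edges(data=True):
        net.add_edge(a, b, value=data["weight"])
    net.save_graph(html_path)
```

End to end, this would look like `render_network(cooccurrence_edges(extract_characters(sentences)))`. Keeping only the `top_n` strongest edges is an assumed cap that keeps the browser rendering responsive for long series.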
📊 Evaluation
  • Future work

βš™οΈ Installation

git clone https://github.com/pointer2Alvee/llm-tv-series-analysis.git
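cd llm-tv-series-analysis

# Recommended: use a virtual environment
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
# spaCy NER model used by the character network
python -m spacy download en_core_web_trf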
πŸ–‡οΈ requirements.txt (core packages):
scrapy
beautifulsoup4
transformers==4.44.0
huggingface_hub==0.24.5
nltk==3.8.1
gradio
pyvis
spacy
torch
pandas
numpy
networkx
seaborn
matplotlib
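Since several of these packages are heavy (torch, transformers, spacy), a quick preflight check can save a failed app start. This is a hypothetical helper, not part of the repository; note that some import names differ from the pip names (for example `beautifulsoup4` is imported as `bs4`).

```python
import importlib.util
from typing import Iterable, List

# Import names for the core packages listed above (pip name beautifulsoup4 -> bs4).
CORE_MODULES = ["scrapy", "bs4", "transformers", "huggingface_hub", "nltk", "gradio",
                "pyvis", "spacy", "torch", "pandas", "numpy", "networkx", "seaborn",
                "matplotlib"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level modules that cannot be found without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def spacy_model_installed(model: str = "en_core_web_trf") -> bool:
    """spaCy pipelines are installed as ordinary Python packages."""
    return importlib.util.find_spec(model) is not None


if __name__ == "__main__":
    missing = missing_modules(CORE_MODULES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
    if not spacy_model_installed():
        print("Run: python -m spacy download en_core_web_trf")
```

`find_spec` only locates modules, so the check is fast even when torch and transformers are installed.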
💻 Running the App Locally
  1. Open the repo in VS Code
  2. Run the command: python gradio_app.py
  3. Wait for the app to start
  4. Open the localhost link printed in the terminal in your browser

For the Theme Classifier:

  • Provide themes in the text field: friendship, battle, sacrifice, love, dialogue
  • Provide the subtitle path: data\Subtitles
  • Provide the output save path: stubs\theme_classifier_output.csv
  • Click the "Get Themes" button
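The form above can be sketched in Gradio as three text fields and a button. This is an illustrative layout, not the actual `gradio_app.py`: `run_theme_classifier` is a hypothetical backend you would supply, and showing the result in a `gr.Dataframe` is an assumed output widget. The theme parser lowercases names and drops blanks and duplicates, which is also an assumption.

```python
from typing import Callable, List

# Hypothetical backend: (themes, subtitles_path, save_path) -> a pandas DataFrame
# (or list of rows) with one score per theme.
ThemeBackend = Callable[[List[str], str, str], object]


def parse_themes(raw: str) -> List[str]:
    """Split the comma-separated theme field, dropping blanks and duplicates."""
    seen, themes = set(), []
    for part in raw.split(","):
        theme = part.strip().lower()
        if theme and theme not in seen:
            seen.add(theme)
            themes.append(theme)
    return themes


def build_theme_app(run_theme_classifier: ThemeBackend):
    """Lay out the three text fields and the "Get Themes" button in Gradio."""
    import gradio as gr  # lazy import so parse_themes works without Gradio

    def on_click(theme_text: str, subtitles_path: str, save_path: str):
        return run_theme_classifier(parse_themes(theme_text), subtitles_path, save_path)

    with gr.Blocks() as app:
        with gr.Row():
            with gr.Column():
                scores = gr.Dataframe(label="Theme scores")
            with gr.Column():
                themes = gr.Textbox(label="Themes")
                subtitles_path = gr.Textbox(label="Subtitles path")
                save_path = gr.Textbox(label="Save path")
                button = gr.Button("Get Themes")
                button.click(on_click, inputs=[themes, subtitles_path, save_path],
                             outputs=[scores])
    return app
```

Calling `build_theme_app(my_backend).launch(share=True)` asks Gradio for a temporary public link in addition to the localhost one, which is what makes the UI reachable from Colab.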

For the Character Network:

  • Provide the subtitle path: data\Subtitles
  • Provide the output save path: stubs\ner_output.csv
  • Click the "Get Character Network" button

On Colab? Use the public URL printed after launching the app to open the UI in your browser.

📖 Usage

  • Open VS Code and run the above commands

🧪 Sample Topics Implemented

  • ✅ Web Scraping
  • ✅ BART model (facebook/bart-large-mnli) & NER model
  • ✅ Theme Classifier, Character Relationship Network
  • ⏳ Upcoming: Chatbot, Text Classifier

🧭 Roadmap

  • Full attack classifier with fine-tuned DistilBERT
  • Fully interactive character chatbot (LLaMA-based)
  • Support for other anime/TV series via config

🤝 Contributing

Contributions are welcome!

  1. Fork the repo.
  2. Create a branch: git checkout -b feature/YourFeature
  3. Commit changes: git commit -m 'Add some feature'
  4. Push to branch: git push origin feature/YourFeature
  5. Open a Pull Request.

📜 License

Distributed under the MIT License. See LICENSE.txt for more information.

πŸ™Acknowledgements

  • Special thanks to the open-source community / youtube for tools and resources.
