Skip to content

using serializd.com data to train my LLM model on my taste in television content. feeding reviews, watched shows, and using tmdb api to fetch more details to give a clear analysis. will replicate the same for letterboxd.

License

Notifications You must be signed in to change notification settings

rahullath/serializd-ai

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

3 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

๐ŸŽฌ Serializd TV Taste Analysis & Recommendation System

A comprehensive system to scrape your Serializd data, analyze your TV taste using AI/ML, enrich data with TMDB, and provide personalized recommendations with future tracking capabilities.

๐ŸŽฏ What This System Does

Current Limitations Addressed

Your original Serializd data was limited:

  • โŒ No ratings or reviews (despite having 394 reviews)
  • โŒ No episode-level details or watch progress
  • โŒ No season counts or metadata
  • โŒ Only basic show titles from 480 watched shows

What We Built

โœ… Enhanced Data Scraping: Captures reviews, ratings, and detailed watch data
โœ… TMDB Enrichment: Adds comprehensive metadata (genres, cast, ratings, etc.)
โœ… AI Taste Analysis: Uses ML to understand your preferences and viewing patterns
โœ… Personalized Recommendations: Smart recommendations based on your taste profile
โœ… Future Tracking System: SQLite database to log future watches and maintain watchlists

๐Ÿ“ System Components

1. Data Scraping

  • enhanced_reviews_scraper.py - Scrapes your 394 reviews with ratings and sentiment
  • click_pagination_scraper.py - Your existing scraper for watched shows (480 shows)

2. Data Enrichment

  • tmdb_enricher.py - Enriches shows with TMDB metadata (genres, cast, ratings, etc.)

3. AI Analysis

  • taste_analyzer.py - Analyzes your taste using ML clustering and sentiment analysis

4. Recommendation & Tracking

  • recommendation_system.py - Generates personalized recommendations and tracks future watches

๐Ÿš€ Setup Instructions

Step 1: Install Dependencies

pip install pandas numpy scikit-learn matplotlib seaborn requests python-dotenv selenium webdriver-manager textblob

Step 2: Get TMDB API Key

  1. Go to TMDB API
  2. Create a free account and get an API key
  3. Add to your .env file:
SERIALIZD_EMAIL=your_email@example.com
SERIALIZD_PASSWORD=your_password
SERIALIZD_USERNAME=morbius
TMDB_API_KEY=your_tmdb_api_key_here

Step 3: Run the Complete Pipeline

Option A: Run Everything Automatically

python run_complete_analysis.py

Option B: Run Step by Step

  1. Scrape Reviews (to get your 394 reviews with ratings):
python enhanced_reviews_scraper.py
  1. Enrich Data with TMDB:
python tmdb_enricher.py
  1. Analyze Your Taste:
python taste_analyzer.py
  1. Generate Recommendations:
python recommendation_system.py

๐Ÿ“Š What You'll Get

1. Enhanced Data Files

  • serializd_reviews.csv - Your 394 reviews with ratings and sentiment
  • enriched_watched_shows.csv - 480 shows with TMDB metadata
  • enriched_reviews.csv - Reviews enriched with show metadata

2. Taste Analysis

  • taste_analysis.json - Comprehensive taste profile
  • taste_analysis_visualization.png - Visual charts of your preferences

3. Recommendations Database

  • tv_tracking.db - SQLite database with:
    • Personalized recommendations based on your taste
    • Watchlist management
    • Future watch logging
    • Statistics tracking

๐ŸŽญ Taste Analysis Features

Genre Preferences

  • Identifies your top genres from 480 watched shows
  • Calculates genre percentages and preferences
  • Uses this for future recommendations

Rating Patterns

  • Analyzes your 394 reviews for rating patterns
  • Converts various rating formats (numeric, letter grades, fractions)
  • Identifies if you're a tough critic or generous rater

Show Characteristics

  • Analyzes preference for long vs short series
  • Identifies if you prefer highly-rated shows
  • Analyzes network/platform preferences
  • Language diversity analysis

Sentiment Analysis

  • Uses TextBlob to analyze sentiment in your review texts
  • Identifies positive vs negative keywords
  • Determines your review writing style

ML Clustering

  • Groups your shows into viewing patterns
  • Uses K-means clustering on genres and metadata
  • Identifies your distinct taste clusters

๐ŸŽฏ Recommendation System Features

Smart Scoring Algorithm

Recommendations are scored based on:

  • Genre Matching (40%): How well genres match your preferences
  • Rating Threshold (30%): Preference for highly-rated shows
  • Popularity (20%): Balance of popular vs niche content
  • Recency (10%): Preference for newer shows

Data Sources

  • TMDB Recommendations: Based on shows you've watched
  • Trending Shows: Current popular content
  • Genre-based: Shows matching your preferred genres

Tracking Features

  • Watch Logging: Log episodes/seasons with ratings and reviews
  • Watchlist Management: Prioritized list of shows to watch
  • Statistics: Track your viewing habits over time

๐Ÿ“ˆ Usage Examples

View Your Taste Analysis

from taste_analyzer import TVTasteAnalyzer

analyzer = TVTasteAnalyzer()
analyzer.load_data()
profile = analyzer.generate_taste_profile()
analyzer.print_summary()

Get Personalized Recommendations

from recommendation_system import TVRecommendationSystem

system = TVRecommendationSystem()
system.generate_recommendations()
system.print_recommendations(limit=10)

Log a Watch

system.log_watch("Breaking Bad", season=1, episode=1, rating=9, review_text="Amazing pilot episode!")

Manage Watchlist

system.add_to_watchlist("The Bear", priority=8, notes="Heard great things about this")
system.print_watchlist()

๐Ÿ” Sample Analysis Output

๐ŸŽฌ YOUR TV TASTE ANALYSIS SUMMARY
============================================================
๐Ÿ“Š Total Shows Watched: 480
๐Ÿ“ Total Reviews Written: 394

๐ŸŽญ TOP GENRES:
  1. Drama: 156 shows (32.5%)
  2. Comedy: 98 shows (20.4%)
  3. Crime: 67 shows (14.0%)
  4. Thriller: 45 shows (9.4%)
  5. Sci-Fi: 34 shows (7.1%)

โญ RATING PATTERNS:
  Average Rating: 7.8/10
  Total Rated Shows: 394

๐Ÿ’ก KEY INSIGHTS:
  1. Your top 3 favorite genres are: Drama, Comedy, Crime
  2. Drama makes up 32.5% of your watched shows
  3. You tend to rate shows highly, suggesting you're selective about what you watch
  4. You prefer longer series with multiple seasons
  5. You tend to watch critically acclaimed shows
  6. Your reviews are generally positive and enthusiastic
  7. Your largest viewing pattern centers around Drama shows

๐ŸŽฏ Addressing Original Limitations

โœ… Ratings & Reviews

  • Before: No ratings despite 394 reviews
  • After: Full sentiment analysis and rating extraction from all reviews

โœ… Episode-Level Details

  • Before: No episode counts or watch progress
  • After: TMDB provides episode counts, seasons, and runtime data

โœ… Rich Metadata

  • Before: Only show titles
  • After: Genres, cast, crew, networks, languages, popularity, ratings, keywords

โœ… Historical Context

  • Before: Limited to Jan 1, 2022+ data
  • After: TMDB provides full show history and context

โœ… Future Tracking

  • Before: No way to track future watches
  • After: Complete SQLite system for ongoing tracking

๐Ÿ”ฎ Future Enhancements

Potential Additions

  1. Integration with Other Platforms: Import from Trakt, IMDb, etc.
  2. Social Features: Compare taste with friends
  3. Advanced ML: Deep learning for better recommendations
  4. Web Interface: Flask/Django web app
  5. Mobile App: React Native or Flutter app
  6. Export Features: Generate reports, share taste profiles

API Integrations

  • Trakt.tv: For historical data pre-2022
  • IMDb: For additional ratings and reviews
  • JustWatch: For streaming availability
  • Rotten Tomatoes: For critic vs audience scores

๐Ÿ› ๏ธ Technical Architecture

Data Flow

Serializd Scraping โ†’ TMDB Enrichment โ†’ AI Analysis โ†’ Recommendations โ†’ Future Tracking

Technologies Used

  • Scraping: Selenium WebDriver
  • Data Processing: Pandas, NumPy
  • Machine Learning: Scikit-learn (K-means, TF-IDF, Cosine Similarity)
  • Sentiment Analysis: TextBlob
  • Visualization: Matplotlib, Seaborn
  • Database: SQLite
  • API: TMDB REST API

File Structure

serializd-py/
โ”œโ”€โ”€ enhanced_reviews_scraper.py    # Scrape reviews & ratings
โ”œโ”€โ”€ tmdb_enricher.py              # Enrich with TMDB data
โ”œโ”€โ”€ taste_analyzer.py             # AI taste analysis
โ”œโ”€โ”€ recommendation_system.py      # Recommendations & tracking
โ”œโ”€โ”€ click_pagination_scraper.py   # Original watched shows scraper
โ”œโ”€โ”€ final_watched_shows.csv       # Your 480 watched shows
โ”œโ”€โ”€ serializd_reviews.csv         # Your 394 reviews (generated)
โ”œโ”€โ”€ enriched_watched_shows.csv    # Shows + TMDB data (generated)
โ”œโ”€โ”€ taste_analysis.json           # Your taste profile (generated)
โ”œโ”€โ”€ tv_tracking.db                # Future tracking database (generated)
โ””โ”€โ”€ .env                          # Your credentials

๐ŸŽ‰ Getting Started

  1. Clone/Download all the Python files
  2. Set up your .env file with credentials and TMDB API key
  3. Install dependencies: pip install -r requirements.txt
  4. Run the enhanced reviews scraper: python enhanced_reviews_scraper.py
  5. Enrich with TMDB: python tmdb_enricher.py
  6. Analyze your taste: python taste_analyzer.py
  7. Generate recommendations: python recommendation_system.py

Your goal of analyzing your TV taste and building a personalized recommendation system is now fully achievable with this comprehensive solution! ๐ŸŽฌโœจ

About

using serializd.com data to train my LLM model on my taste in television content. feeding reviews, watched shows, and using tmdb api to fetch more details to give a clear analysis. will replicate the same for letterboxd.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published