A comprehensive system to scrape your Serializd data, analyze your TV taste using AI/ML, enrich data with TMDB, and provide personalized recommendations with future tracking capabilities.
Your original Serializd data was limited:
- No ratings or reviews (despite having 394 reviews)
- No episode-level details or watch progress
- No season counts or metadata
- Only basic show titles from 480 watched shows
- Enhanced Data Scraping: Captures reviews, ratings, and detailed watch data
- TMDB Enrichment: Adds comprehensive metadata (genres, cast, ratings, etc.)
- AI Taste Analysis: Uses ML to understand your preferences and viewing patterns
- Personalized Recommendations: Smart recommendations based on your taste profile
- Future Tracking System: SQLite database to log future watches and maintain watchlists
enhanced_reviews_scraper.py
- Scrapes your 394 reviews with ratings and sentiment
click_pagination_scraper.py
- Your existing scraper for watched shows (480 shows)
tmdb_enricher.py
- Enriches shows with TMDB metadata (genres, cast, ratings, etc.)
taste_analyzer.py
- Analyzes your taste using ML clustering and sentiment analysis
recommendation_system.py
- Generates personalized recommendations and tracks future watches
pip install pandas numpy scikit-learn matplotlib seaborn requests python-dotenv selenium webdriver-manager textblob
- Go to TMDB API
- Create a free account and get an API key
- Add to your .env file:
SERIALIZD_EMAIL=your_email@example.com
SERIALIZD_PASSWORD=your_password
SERIALIZD_USERNAME=morbius
TMDB_API_KEY=your_tmdb_api_key_here
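Before running any script, it can help to confirm that all four variables are set. The following is a minimal sketch using only the standard library. The helper name `missing_settings` is hypothetical, and it assumes the values from `.env` have already been loaded into the environment (for example with python-dotenv, which is in the dependency list). The project scripts may handle this differently.

```python
import os

# The four variable names come from the .env example above.
REQUIRED_KEYS = (
    "SERIALIZD_EMAIL",
    "SERIALIZD_PASSWORD",
    "SERIALIZD_USERNAME",
    "TMDB_API_KEY",
)


def missing_settings(env=None):
    """Return the required keys that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All settings present.")
```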
python run_complete_analysis.py
- Scrape Reviews (to get your 394 reviews with ratings):
python enhanced_reviews_scraper.py
- Enrich Data with TMDB:
python tmdb_enricher.py
- Analyze Your Taste:
python taste_analyzer.py
- Generate Recommendations:
python recommendation_system.py
serializd_reviews.csv
- Your 394 reviews with ratings and sentiment
enriched_watched_shows.csv
- 480 shows with TMDB metadata
enriched_reviews.csv
- Reviews enriched with show metadata
taste_analysis.json
- Comprehensive taste profile
taste_analysis_visualization.png
- Visual charts of your preferences
tv_tracking.db
- SQLite database with:
  - Personalized recommendations based on your taste
  - Watchlist management
  - Future watch logging
  - Statistics tracking
- Identifies your top genres from 480 watched shows
- Calculates genre percentages and preferences
- Uses this for future recommendations
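The genre breakdown can be sketched with a simple counter. This is an illustration only, not the code in taste_analyzer.py; the function name `genre_breakdown` and the input shape (one list of genre names per show) are assumptions. Percentages are taken relative to the number of shows, which matches the sample summary (156 of 480 shows is 32.5%).

```python
from collections import Counter


def genre_breakdown(shows, top_n=5):
    """Count genres across shows and return (genre, count, percent) tuples.

    `shows` is a list of genre lists, one per show. Percentages are relative
    to the number of shows, so they can sum to more than 100 when shows carry
    several genres.
    """
    # set() guards against a genre being listed twice for the same show.
    counts = Counter(genre for genres in shows for genre in set(genres))
    total = len(shows)
    return [
        (genre, count, round(100 * count / total, 1))
        for genre, count in counts.most_common(top_n)
    ]
```

Because one show can belong to several genres, the percentages describe how often a genre appears in your history rather than shares that add up to 100%.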
- Analyzes your 394 reviews for rating patterns
- Converts various rating formats (numeric, letter grades, fractions)
- Identifies if you're a tough critic or generous rater
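Converting mixed rating formats to one scale might look like the sketch below. The 10-point target scale, the letter-grade values, and the name `normalize_rating` are all assumptions made for illustration; the actual conversion rules in taste_analyzer.py are not shown here.

```python
import re

# Hypothetical letter-grade mapping onto a 10-point scale.
LETTER_GRADES = {"A": 9.5, "B": 8.0, "C": 6.5, "D": 5.0, "F": 3.0}


def normalize_rating(raw):
    """Convert a raw rating string to a 0-10 float, or None if unrecognized."""
    text = raw.strip().upper()
    # Fractions such as "4/5" or "7.5/10" are rescaled to a 10-point scale.
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*/\s*(\d+(?:\.\d+)?)", text)
    if match:
        value, scale = float(match.group(1)), float(match.group(2))
        if scale == 0 or value > scale:
            return None
        return round(10 * value / scale, 2)
    # Letter grades, ignoring any +/- modifier.
    if text[:1] in LETTER_GRADES and text[1:] in ("", "+", "-"):
        return LETTER_GRADES[text[0]]
    # Plain numbers are assumed to already be on a 10-point scale.
    try:
        value = float(text)
    except ValueError:
        return None
    return value if 0 <= value <= 10 else None
```

Returning `None` for anything unrecognized keeps free-text reviews without a rating out of the average instead of skewing it.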
- Analyzes preference for long vs short series
- Identifies if you prefer highly-rated shows
- Analyzes network/platform preferences
- Language diversity analysis
- Uses TextBlob to analyze sentiment in your review texts
- Identifies positive vs negative keywords
- Determines your review writing style
- Groups your shows into viewing patterns
- Uses K-means clustering on genres and metadata
- Identifies your distinct taste clusters
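K-means needs numeric input, so genres have to be encoded first. A minimal sketch of one common approach, multi-hot encoding, is shown below. The function name `multi_hot_genres` and the input shape are assumptions; the real feature set in taste_analyzer.py also includes other metadata that is not modeled here.

```python
def multi_hot_genres(shows):
    """Encode each show's genre list as a 0/1 vector over a sorted vocabulary."""
    vocabulary = sorted({genre for genres in shows for genre in genres})
    index = {genre: i for i, genre in enumerate(vocabulary)}
    matrix = []
    for genres in shows:
        row = [0] * len(vocabulary)
        for genre in genres:
            row[index[genre]] = 1
        matrix.append(row)
    return vocabulary, matrix
```

The resulting matrix has one row per show and one column per genre, and could then be passed to scikit-learn, for example `KMeans(n_clusters=k).fit_predict(matrix)`, to assign each show to a cluster.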
Recommendations are scored based on:
- Genre Matching (40%): How well genres match your preferences
- Rating Threshold (30%): Preference for highly-rated shows
- Popularity (20%): Balance of popular vs niche content
- Recency (10%): Preference for newer shows
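The weighting above amounts to a weighted sum. The sketch below assumes each factor has already been scaled to the range 0 to 1; how recommendation_system.py computes the individual factor scores is not shown here, and the names `WEIGHTS` and `recommendation_score` are hypothetical.

```python
# Weights taken from the list above; they sum to 1.0.
WEIGHTS = {"genre": 0.40, "rating": 0.30, "popularity": 0.20, "recency": 0.10}


def recommendation_score(components):
    """Combine per-factor scores (each in [0, 1]) into one weighted score."""
    for name in WEIGHTS:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Strong genre match, well rated, moderately popular, fairly old show.
example = {"genre": 0.9, "rating": 0.8, "popularity": 0.5, "recency": 0.2}
print(round(recommendation_score(example), 2))  # 0.72
```

With these weights a strong genre match outweighs recency: the example scores 0.72 even though the show is old.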
- TMDB Recommendations: Based on shows you've watched
- Trending Shows: Current popular content
- Genre-based: Shows matching your preferred genres
- Watch Logging: Log episodes/seasons with ratings and reviews
- Watchlist Management: Prioritized list of shows to watch
- Statistics: Track your viewing habits over time
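To make the tracking features concrete, here is a minimal sketch of how watch logging and a prioritized watchlist could be stored with Python's built-in `sqlite3` module. The table names, column names, and helper functions are assumptions for illustration; the actual schema of tv_tracking.db may differ.

```python
import sqlite3

# Hypothetical schema: one table for logged watches, one for the watchlist.
SCHEMA = """
CREATE TABLE IF NOT EXISTS watch_log (
    id INTEGER PRIMARY KEY,
    show_title TEXT NOT NULL,
    season INTEGER,
    episode INTEGER,
    rating REAL,
    review_text TEXT,
    watched_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS watchlist (
    show_title TEXT PRIMARY KEY,
    priority INTEGER NOT NULL DEFAULT 5,
    notes TEXT
);
"""


def open_tracking_db(path=":memory:"):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_watch(conn, show_title, season=None, episode=None, rating=None, review_text=None):
    """Record one watched episode or season."""
    conn.execute(
        "INSERT INTO watch_log (show_title, season, episode, rating, review_text) "
        "VALUES (?, ?, ?, ?, ?)",
        (show_title, season, episode, rating, review_text),
    )
    conn.commit()


def add_to_watchlist(conn, show_title, priority=5, notes=None):
    """Add a show, or update its priority and notes if already listed."""
    conn.execute(
        "INSERT OR REPLACE INTO watchlist (show_title, priority, notes) VALUES (?, ?, ?)",
        (show_title, priority, notes),
    )
    conn.commit()


def watchlist(conn):
    """Return watchlist rows, highest priority first."""
    return conn.execute(
        "SELECT show_title, priority, notes FROM watchlist "
        "ORDER BY priority DESC, show_title"
    ).fetchall()
```

The default path `:memory:` creates a throwaway database, which is convenient for experimenting; pass a file path such as `tv_tracking.db` to persist data between runs.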
# Build and print your taste profile
from taste_analyzer import TVTasteAnalyzer
analyzer = TVTasteAnalyzer()
analyzer.load_data()
profile = analyzer.generate_taste_profile()
analyzer.print_summary()

# Generate and print the top 10 recommendations
from recommendation_system import TVRecommendationSystem
system = TVRecommendationSystem()
system.generate_recommendations()
system.print_recommendations(limit=10)

# Log a watched episode with a rating and review
system.log_watch("Breaking Bad", season=1, episode=1, rating=9, review_text="Amazing pilot episode!")

# Add a show to the watchlist and print the list
system.add_to_watchlist("The Bear", priority=8, notes="Heard great things about this")
system.print_watchlist()
🎬 YOUR TV TASTE ANALYSIS SUMMARY
============================================================
Total Shows Watched: 480
Total Reviews Written: 394
🎭 TOP GENRES:
1. Drama: 156 shows (32.5%)
2. Comedy: 98 shows (20.4%)
3. Crime: 67 shows (14.0%)
4. Thriller: 45 shows (9.4%)
5. Sci-Fi: 34 shows (7.1%)
⭐ RATING PATTERNS:
Average Rating: 7.8/10
Total Rated Shows: 394
💡 KEY INSIGHTS:
1. Your top 3 favorite genres are: Drama, Comedy, Crime
2. Drama makes up 32.5% of your watched shows
3. You tend to rate shows highly, suggesting you're selective about what you watch
4. You prefer longer series with multiple seasons
5. You tend to watch critically acclaimed shows
6. Your reviews are generally positive and enthusiastic
7. Your largest viewing pattern centers around Drama shows
- Before: No ratings despite 394 reviews
- After: Full sentiment analysis and rating extraction from all reviews
- Before: No episode counts or watch progress
- After: TMDB provides episode counts, seasons, and runtime data
- Before: Only show titles
- After: Genres, cast, crew, networks, languages, popularity, ratings, keywords
- Before: Limited to Jan 1, 2022+ data
- After: TMDB provides full show history and context
- Before: No way to track future watches
- After: Complete SQLite system for ongoing tracking
- Integration with Other Platforms: Import from Trakt, IMDb, etc.
- Social Features: Compare taste with friends
- Advanced ML: Deep learning for better recommendations
- Web Interface: Flask/Django web app
- Mobile App: React Native or Flutter app
- Export Features: Generate reports, share taste profiles
- Trakt.tv: For historical data pre-2022
- IMDb: For additional ratings and reviews
- JustWatch: For streaming availability
- Rotten Tomatoes: For critic vs audience scores
Serializd Scraping → TMDB Enrichment → AI Analysis → Recommendations → Future Tracking
- Scraping: Selenium WebDriver
- Data Processing: Pandas, NumPy
- Machine Learning: Scikit-learn (K-means, TF-IDF, Cosine Similarity)
- Sentiment Analysis: TextBlob
- Visualization: Matplotlib, Seaborn
- Database: SQLite
- API: TMDB REST API
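As an illustration of the TMDB REST API step, the sketch below builds the request URL for TMDB's v3 TV search endpoint without sending it. Whether tmdb_enricher.py uses this exact endpoint is an assumption, and the helper name `tmdb_search_tv_url` is hypothetical.

```python
from urllib.parse import urlencode

TMDB_BASE_URL = "https://api.themoviedb.org/3"


def tmdb_search_tv_url(title, api_key):
    """Build the TMDB v3 URL that searches TV shows by title."""
    # urlencode escapes spaces and special characters in the title.
    query = urlencode({"api_key": api_key, "query": title})
    return f"{TMDB_BASE_URL}/search/tv?{query}"
```

Fetching this URL (for example with `requests.get`) returns JSON whose `results` list contains candidate shows; each entry's `id` can then be used with the `/tv/{id}` endpoint to retrieve details such as genres and season counts.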
serializd-py/
├── enhanced_reviews_scraper.py    # Scrape reviews & ratings
├── tmdb_enricher.py               # Enrich with TMDB data
├── taste_analyzer.py              # AI taste analysis
├── recommendation_system.py       # Recommendations & tracking
├── click_pagination_scraper.py    # Original watched shows scraper
├── final_watched_shows.csv        # Your 480 watched shows
├── serializd_reviews.csv          # Your 394 reviews (generated)
├── enriched_watched_shows.csv     # Shows + TMDB data (generated)
├── taste_analysis.json            # Your taste profile (generated)
├── tv_tracking.db                 # Future tracking database (generated)
└── .env                           # Your credentials
- Clone/Download all the Python files
- Set up your .env file with credentials and TMDB API key
- Install dependencies:
pip install -r requirements.txt
- Run the enhanced reviews scraper:
python enhanced_reviews_scraper.py
- Enrich with TMDB:
python tmdb_enricher.py
- Analyze your taste:
python taste_analyzer.py
- Generate recommendations:
python recommendation_system.py
Your goal of analyzing your TV taste and building a personalized recommendation system is now fully achievable with this comprehensive solution! 🎬✨