Full-stack machine learning project that predicts viewer satisfaction (high ratings) on Netflix using demographic data and TMDB movie metadata. Includes EDA, XGBoost modeling, and real-time enrichment using the TMDB API.

Athharv5/Netflix-rating-predictor-StreamPulse

🎬 StreamPulse – Predicting Netflix Viewer Ratings with Machine Learning

StreamPulse is a full-stack data science project that predicts whether a Netflix user will give a movie a high rating (≥ 4 stars), using demographic data and enriched movie metadata from the TMDB API. The project includes exploratory analysis, feature engineering, machine learning classification, and real-world API integration.


📌 Problem Statement

Can we predict if a viewer will rate a movie highly based on their age group, gender, and the movie’s genre and metadata?

This has real-world applications in:

  • Personalized content recommendations
  • Audience engagement strategies
  • Viewer satisfaction prediction for streaming platforms

🧠 What This Project Includes

  • ✅ Merged multi-source datasets (ratings, users, movies)
  • 📊 Exploratory data analysis (genre trends, ratings by age group, gender behavior)
  • 🧼 Feature engineering (age bucketing, encoding, timestamp processing)
  • 🌐 TMDB API integration (enrichment with popularity, vote count, and average rating)
  • 🧠 XGBoost classifier to predict high ratings (≥ 4 stars)
  • 📈 Model evaluation: accuracy, precision, recall, confusion matrix
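The merge and feature-engineering steps listed above could look roughly like the following pandas sketch. It assumes MovieLens 1M-style tables and column names (`UserID`, `MovieID`, `Rating`, Unix-seconds `Timestamp`, `Gender`, coded `Age`, pipe-separated `Genres`). The function name, age bins, and derived column names are hypothetical and not taken from the notebook.

```python
import pandas as pd

# Hypothetical, illustrative age buckets. MovieLens 1M stores age as codes
# (1, 18, 25, 35, 45, 50, 56), which these bins map onto readable groups.
AGE_BINS = [0, 17, 24, 34, 44, 54, 120]
AGE_LABELS = ["<18", "18-24", "25-34", "35-44", "45-54", "55+"]


def build_features(ratings: pd.DataFrame, users: pd.DataFrame, movies: pd.DataFrame) -> pd.DataFrame:
    """Merge ratings, users, and movies (assumed MovieLens-style columns) and derive features."""
    # Join each rating to its user's demographics and the movie's metadata.
    df = ratings.merge(users, on="UserID", how="inner").merge(movies, on="MovieID", how="inner")

    # Binary target: 1 when the rating is 4 or 5 stars, else 0.
    df["high_rating"] = (df["Rating"] >= 4).astype(int)

    # Age bucketing into labelled groups.
    df["age_group"] = pd.cut(df["Age"], bins=AGE_BINS, labels=AGE_LABELS)

    # Simple binary encoding of gender.
    df["is_female"] = (df["Gender"] == "F").astype(int)

    # Timestamp processing: Unix seconds -> datetime -> "YYYY-MM" string.
    ts = pd.to_datetime(df["Timestamp"], unit="s")
    df["year_month"] = ts.dt.to_period("M").astype(str)

    # One-hot encode the pipe-separated genre list, e.g. "Action|Crime".
    genres = df["Genres"].str.get_dummies(sep="|").add_prefix("genre_")
    return pd.concat([df, genres], axis=1)
```

One-hot genre columns let a tree model such as XGBoost use each genre independently, which matters because most movies carry several genres.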

πŸ› οΈ Tools Used

| Category | Tools & Libraries |
| --- | --- |
| Programming | Python (Pandas, NumPy, Matplotlib, Seaborn) |
| Machine Learning | XGBoost, RandomForest, scikit-learn |
| Data Visualization | Matplotlib, Seaborn |
| External APIs | TMDB API |
| Data Sources | MovieLens, Netflix Titles |
| Optional Add-ons | Power BI / Tableau (for future dashboarding) |
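For the TMDB API enrichment, a minimal standard-library sketch might look like this. It assumes the TMDB v3 `/search/movie` endpoint queried with an `api_key`, the title, and a release year parsed from MovieLens-style `Title (Year)` strings, and it simply takes the first search result. The helper names are hypothetical, and this is not necessarily how the notebook calls the API.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import Optional, Tuple

# TMDB v3 movie search endpoint (assumed to be the one used); needs a TMDB API key.
TMDB_SEARCH_URL = "https://api.themoviedb.org/3/search/movie"


def split_movielens_title(title: str) -> Tuple[str, Optional[int]]:
    """Split a MovieLens-style 'Toy Story (1995)' into ('Toy Story', 1995)."""
    match = re.match(r"^(.*)\s\((\d{4})\)\s*$", title)
    if match is None:
        return title.strip(), None
    return match.group(1).strip(), int(match.group(2))


def pick_tmdb_fields(search_response: dict) -> Optional[dict]:
    """Keep the enrichment fields from the first (top-ranked) search result."""
    results = search_response.get("results") or []
    if not results:
        return None
    top = results[0]
    return {
        "tmdb_id": top.get("id"),
        "tmdb_popularity": top.get("popularity"),
        "tmdb_vote_count": top.get("vote_count"),
        "tmdb_vote_average": top.get("vote_average"),
    }


def fetch_tmdb_metadata(title: str, api_key: str, timeout: float = 10.0) -> Optional[dict]:
    """Look up one title on TMDB (performs a network request)."""
    name, year = split_movielens_title(title)
    params = {"api_key": api_key, "query": name}
    if year is not None:
        # Narrow the search to the release year to reduce wrong matches.
        params["year"] = str(year)
    url = f"{TMDB_SEARCH_URL}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return pick_tmdb_fields(json.load(resp))
```

Keeping the parsing in `pick_tmdb_fields` separate from the HTTP call makes it testable offline. In practice, results would be cached per movie and merged back onto the movies table. MovieLens titles with a trailing article, such as "American President, The (1995)", may need normalizing before searching.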

πŸ“Š Key Insights

  • 🎯 Genres like Action and Comedy are most popular among young adults.
  • 👥 Females give slightly higher average ratings than males.
  • 🕰 Viewer activity follows seasonal spikes (year-month patterns).
  • 🌟 TMDB metadata (popularity, average vote) improved model accuracy and interpretability.
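Summaries like the insights above could be computed with a few pandas group-bys. This sketch assumes a merged ratings table with `Gender`, `Rating`, Unix-seconds `Timestamp`, pipe-separated `Genres`, and an already bucketed `age_group` column. All function names are hypothetical.

```python
import pandas as pd


def mean_rating_by_gender(df: pd.DataFrame) -> pd.Series:
    """Average star rating per gender (assumed 'Gender' and 'Rating' columns)."""
    return df.groupby("Gender")["Rating"].mean()


def monthly_activity(df: pd.DataFrame) -> pd.Series:
    """Number of ratings per calendar month, to spot seasonal spikes."""
    ts = pd.to_datetime(df["Timestamp"], unit="s")
    return df.groupby(ts.dt.to_period("M")).size()


def genre_counts_by_age(df: pd.DataFrame) -> pd.DataFrame:
    """Ratings per (age group, genre); a movie counts once for each of its genres."""
    exploded = df.assign(Genre=df["Genres"].str.split("|")).explode("Genre")
    # observed=True skips empty combinations if age_group is categorical.
    counts = exploded.groupby(["age_group", "Genre"], observed=True).size()
    return counts.unstack(fill_value=0)
```

Exploding the genre list before grouping avoids treating "Action|Comedy" as its own genre, at the cost of counting multi-genre movies more than once.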

🤖 Model Results

  • Final model: XGBoost Classifier
  • Target: Whether a rating is ≥ 4 (binary classification)
  • Accuracy: ~56%
  • Feature importance: Genre, TMDB popularity, age group
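To put the ~56% accuracy in context, it helps to compare it with a trivial baseline that always predicts the more common class. The sketch below uses scikit-learn metrics. The function names are hypothetical, and `y_pred` is assumed to come from the fitted classifier's `predict` method (for example `xgboost.XGBClassifier`).

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score


def evaluate_binary(y_true, y_pred) -> dict:
    """Accuracy, precision, recall, and confusion matrix for a 0/1 high-rating target."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Precision and recall are reported for the positive class (rating >= 4).
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        # Rows are true labels [0, 1]; columns are predicted labels [0, 1].
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
    }


def majority_baseline_accuracy(y_train, y_test) -> float:
    """Accuracy of always predicting the most frequent training label."""
    dummy = DummyClassifier(strategy="most_frequent")
    # DummyClassifier ignores feature values, so one placeholder column is enough.
    dummy.fit(np.zeros((len(y_train), 1)), y_train)
    y_pred = dummy.predict(np.zeros((len(y_test), 1)))
    return float(accuracy_score(y_test, y_pred))
```

If `majority_baseline_accuracy(y_train, y_test)` comes out close to or above the model's accuracy, the classifier is adding little beyond the class balance. In that case, precision and recall on the high-rating class are the more informative numbers.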

✅ Even with limited features, the model offers a starting point for audience segmentation and personalization, although ~56% accuracy leaves substantial room for improvement.


📂 Project Structure

  • netflix_streampulse_full_project.ipynb – Complete notebook with data cleaning, EDA, modeling, and enrichment
  • data/raw/ – Original MovieLens and Netflix CSV files
  • data/processed/ – Cleaned and merged datasets (optional)
  • visuals/ – Saved plots and feature importance charts
  • dashboard/ – Power BI or Tableau dashboard (optional)
  • README.md – Project overview, summary, tools used, and key insights

🚀 Future Enhancements

  • 🎥 Add NLP analysis on movie titles or descriptions
  • πŸ“Š Build Power BI or Tableau dashboard for business presentation
  • 🎯 Incorporate user viewing history for better personalization
  • 🧠 Tune models further or use deep learning (e.g., LSTM on user sequences)
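A simpler step toward using viewing history, before trying an LSTM, is to build per-user features only from each user's earlier ratings. That way, no information from the rating being predicted leaks into its own features. This is a sketch that assumes `UserID`, `Timestamp`, and `Rating` columns. The function and feature names are hypothetical.

```python
import pandas as pd


def add_history_features(df: pd.DataFrame) -> pd.DataFrame:
    """Per-user features computed only from ratings made before the current one."""
    # Order each user's ratings chronologically (assumed 'UserID'/'Timestamp' columns).
    out = df.sort_values(["UserID", "Timestamp"]).copy()

    # How many movies the user had already rated at this point.
    out["prior_count"] = out.groupby("UserID").cumcount()

    # Mean of the user's earlier ratings; shift(1) excludes the current rating,
    # so the first rating of every user gets NaN.
    out["prior_mean_rating"] = out.groupby("UserID")["Rating"].transform(
        lambda s: s.shift(1).expanding().mean()
    )

    # Share of the user's earlier ratings that were "high" (>= 4 stars).
    high = (out["Rating"] >= 4).astype(float)
    out["prior_high_share"] = high.groupby(out["UserID"]).transform(
        lambda s: s.shift(1).expanding().mean()
    )
    return out
```

The NaN values for a user's first rating can be left for XGBoost to handle natively or filled with a global average.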

📎 Credits


💼 About Me

I'm currently pursuing my Master's in Data Science and am passionate about building real-world machine learning projects that combine business insight, analytics, and storytelling.
I’m actively looking for opportunities in data analytics, product analytics, or ML-based roles. Let’s connect on LinkedIn: www.linkedin.com/in/atharv-kadam
