StreamPulse is a full-stack data science project that predicts whether a Netflix user will give a movie a high rating (≥4 stars), using demographic data and enriched movie metadata from the TMDB API. The project includes exploratory analysis, feature engineering, machine learning classification, and real-world API integration.
Can we predict if a viewer will rate a movie highly based on their age group, gender, and the movie's genre and metadata?
This has real-world applications in:
- Personalized content recommendations
- Audience engagement strategies
- Viewer satisfaction prediction for streaming platforms
- Merged multi-source datasets (ratings, users, movies)
- Exploratory data analysis (genre trends, age ratings, gender behavior)
- Feature engineering (age bucketing, encoding, timestamp processing)
- TMDB API integration (popularity, vote count, movie rating enrichment)
- XGBoost classifier to predict high ratings (≥4 stars)
- Model evaluation: accuracy, precision, recall, confusion matrix
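The merging and feature engineering steps listed above can be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`user_id`, `movie_id`, `age`, `gender`, `genre`, `timestamp`), the toy rows, and the age bin edges are all assumptions made for the example.

```python
import pandas as pd

# Tiny made-up frames standing in for the ratings, users, and movies tables.
# Column names and values are hypothetical, chosen only for illustration.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "movie_id": [10, 20, 10, 20],
    "rating": [5, 3, 4, 2],
    "timestamp": [978300760, 978302109, 978301968, 1009669148],
})
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "gender": ["F", "M", "F"],
    "age": [23, 41, 17],
})
movies = pd.DataFrame({"movie_id": [10, 20], "genre": ["Action", "Comedy"]})

# Merge the three sources into one row per rating.
df = (
    ratings
    .merge(users, on="user_id", how="left")
    .merge(movies, on="movie_id", how="left")
)

# Binary target: 1 when the rating is at least 4 stars, else 0.
df["high_rating"] = (df["rating"] >= 4).astype(int)

# Age bucketing; the bin edges and labels are assumed, not taken from the project.
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 17, 24, 34, 49, 120],
    labels=["<18", "18-24", "25-34", "35-49", "50+"],
)

# Timestamp processing: Unix seconds to a year-month string.
df["year_month"] = pd.to_datetime(df["timestamp"], unit="s").dt.strftime("%Y-%m")

# One-hot encode the categorical columns so a tree model can consume them.
features = pd.get_dummies(df[["gender", "age_group", "genre"]])

print(df[["rating", "high_rating", "age_group", "year_month"]])
```

The resulting `features` frame and `high_rating` column are the kind of inputs a classifier such as XGBoost would be trained on.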
| Category | Tools & Libraries |
|---|---|
| Programming | Python (Pandas, NumPy, Matplotlib, Seaborn) |
| Machine Learning | XGBoost, RandomForest, scikit-learn |
| Data Visualization | Matplotlib, Seaborn |
| External APIs | TMDB API |
| Data Sources | MovieLens, Netflix Titles |
| Optional Add-ons | Power BI / Tableau (for future dashboarding) |
- Genres like Action and Comedy are most popular among young adults.
- Female viewers give slightly higher average ratings than male viewers.
- Viewer activity follows seasonal spikes (year-month patterns).
- TMDB metadata (popularity, average vote) improved model accuracy and interpretability.
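As a rough idea of how the TMDB fields mentioned above (popularity, vote count, average vote) could be retrieved, the sketch below builds a request against TMDB's v3 movie search endpoint and extracts those fields from the first hit. The helper names (`build_search_url`, `extract_metadata`, `fetch_metadata`) are hypothetical, the sample payload values are made up, and the notebook may well take a different approach (for example the `requests` library or lookups by TMDB ID).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

TMDB_SEARCH_URL = "https://api.themoviedb.org/3/search/movie"
FIELDS = ("popularity", "vote_count", "vote_average")


def build_search_url(title, api_key):
    """Build a TMDB movie-search URL for the given title."""
    query = urlencode({"api_key": api_key, "query": title})
    return TMDB_SEARCH_URL + "?" + query


def extract_metadata(payload):
    """Return the enrichment fields from the first search hit, or None values."""
    results = payload.get("results") or []
    if not results:
        return {field: None for field in FIELDS}
    top_hit = results[0]
    return {field: top_hit.get(field) for field in FIELDS}


def fetch_metadata(title, api_key, timeout=10):
    """Query TMDB over the network (requires a valid API key)."""
    with urlopen(build_search_url(title, api_key), timeout=timeout) as resp:
        return extract_metadata(json.load(resp))


# Offline demonstration with a hand-written payload; the numbers are invented.
sample_payload = {
    "results": [
        {"title": "Toy Story", "popularity": 98.5, "vote_count": 17000, "vote_average": 8.0}
    ]
}
print(extract_metadata(sample_payload))
```

Taking the first search result is a simplification: titles can be ambiguous, so matching on release year as well would likely reduce mismatches when joining back to the ratings data.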
- Final model: XGBoost Classifier
- Target: Whether a rating is ≥ 4 (binary classification)
- Accuracy: ~56%
- Feature importance: Genre, TMDB popularity, age group
Even with limited features, the model shows potential business use for segmentation and personalization.
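For readers unfamiliar with how accuracy, precision, recall, and the confusion matrix relate for a binary target like this one, here is a small dependency-free sketch. The notebook presumably relies on scikit-learn for these metrics; the functions and the toy labels below are hypothetical and exist only to show the arithmetic.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = high rating)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels, invented for the example: 1 means the rating was at least 4 stars.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

print(confusion_counts(y_true, y_pred))
print(classification_metrics(y_true, y_pred))
```

Reporting precision and recall alongside accuracy matters here because an accuracy figure alone does not show whether the model's errors are mostly missed high ratings or mostly false alarms.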
- netflix_streampulse_full_project.ipynb: Complete notebook with data cleaning, EDA, modeling, and enrichment
- data/raw/: Original MovieLens and Netflix CSV files
- data/processed/: Cleaned and merged datasets (optional)
- visuals/: Saved plots and feature importance charts
- dashboard/: Power BI or Tableau dashboard (optional)
- README.md: Project overview, summary, tools used, and key insights
- Add NLP analysis on movie titles or descriptions
- Build a Power BI or Tableau dashboard for business presentation
- Incorporate user viewing history for better personalization
- Tune models further or use deep learning (e.g., LSTM on user sequences)
- Data: MovieLens Dataset, Netflix Titles, TMDB API
- Project by Atharv Kadam
I'm currently pursuing my Master's in Data Science and am passionate about building real-world machine learning projects that combine business insight, analytics, and storytelling.
Let's connect on LinkedIn: www.linkedin.com/in/atharv-kadam. I'm actively looking for opportunities in data analytics, product analytics, or ML-based roles.