This project implements a hybrid recommender system trained on the Amazon Reviews 2023 Digital Music dataset by McAuley Lab. It integrates content-based (TF-IDF + cosine), collaborative (item-based + SVD), and popularity-based (Bayesian weighted) recommendation techniques to produce personalized digital music recommendations.
The system provides an interactive Streamlit web interface that allows users to upload .jsonl review and metadata files, explore top-rated songs or albums, and adjust model weights dynamically to balance content, collaborative, and popularity signals.
Built for reproducible RecSys benchmarking, it features:
- Fine-grained data cleaning and merging of reviews and metadata
- TF-IDF text vectorization on titles, categories, and descriptions
- Sparse cosine similarity and latent SVD embeddings for collaborative filtering
- Weighted hybrid blending with user-adjustable sliders
- Tabbed visualization for Popular, Content-based, Collaborative, and Hybrid recommendations
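
As a rough illustration of the cleaning and merging step listed above, the sketch below reads JSON Lines records and joins each review to its item metadata. It is not taken from `Time_Series_Model.py`: the helper names `read_jsonl` and `merge_reviews_with_meta`, the join key `parent_asin`, and the rule of dropping reviews without a rating are all assumptions made for the example.

```python
import io
import json


def read_jsonl(handle):
    """Parse one JSON object per non-empty line of a file-like object."""
    return [json.loads(line) for line in handle if line.strip()]


def merge_reviews_with_meta(reviews, meta, key="parent_asin"):
    """Left-join reviews onto item metadata (assumed join key: parent_asin).

    Reviews with a missing rating or missing key are dropped; items without
    metadata keep an empty title.
    """
    meta_by_key = {m[key]: m for m in meta if key in m}
    merged = []
    for review in reviews:
        if review.get("rating") is None or key not in review:
            continue
        item = meta_by_key.get(review[key], {})
        merged.append({
            "user_id": review.get("user_id"),
            key: review[key],
            "rating": float(review["rating"]),
            "title": item.get("title", ""),
        })
    return merged


# Tiny in-memory stand-ins for the uploaded .jsonl files (made-up records).
reviews_file = io.StringIO(
    '{"user_id": "u1", "parent_asin": "A1", "rating": 5.0}\n'
    '{"user_id": "u2", "parent_asin": "A1", "rating": null}\n'
    '{"user_id": "u2", "parent_asin": "A2", "rating": 3.0}\n'
)
meta_file = io.StringIO('{"parent_asin": "A1", "title": "Blue Train"}\n')

rows = merge_reviews_with_meta(read_jsonl(reviews_file), read_jsonl(meta_file))
for row in rows:
    print(row)
```

For real dataset files, the same functions can be called on handles returned by `open(path, encoding="utf-8")`.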
A hybrid recommendation system for Amazon Digital Music Reviews (2023), built using Streamlit, TF-IDF, and Collaborative Filtering.
This project blends popularity, content-based, and collaborative approaches to generate personalized digital music recommendations.
This model uses the Amazon Reviews 2023 (Digital Music) dataset from McAuley Lab:
| Feature | Description |
|---|---|
| Reviews | 571M+ user-item interactions across the full Amazon Reviews 2023 collection, of which Digital Music is one category (May 1996 to Sep 2023) |
| Metadata | Rich item info (title, category, description, etc.) |
| Granularity | Timestamps at the second-level |
| Splits | Standard RecSys train/test splits |
- Load and clean .jsonl reviews and metadata
- TF-IDF based content similarity model
- Item-item collaborative filtering (cosine + SVD)
- Popularity (Bayesian weighted mean) ranking
- Hybrid weighted recommender (configurable weights)
- Fully interactive Streamlit interface
- User-controlled tuning sliders and data uploads
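
The TF-IDF content similarity item in the list above might look roughly like the following sketch. It uses scikit-learn's `TfidfVectorizer` and `linear_kernel`; the three sample items, the choice of English stop words, and the helper `similar_items` are illustrative assumptions, not the settings used in the app.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Made-up items; the real app reads these fields from the metadata file.
items = [
    {"title": "Kind of Blue", "category": "Jazz",
     "description": "modal jazz trumpet classic"},
    {"title": "Blue Train", "category": "Jazz",
     "description": "hard bop jazz saxophone classic"},
    {"title": "Thriller", "category": "Pop",
     "description": "dance pop hits"},
]

# One text document per item: title + category + description.
docs = [" ".join((it["title"], it["category"], it["description"])) for it in items]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)

# TfidfVectorizer L2-normalizes rows by default, so the dot product
# computed by linear_kernel equals cosine similarity.
sims = linear_kernel(tfidf, tfidf)


def similar_items(idx, top_n=2):
    """Return (title, similarity) pairs for the items closest to items[idx]."""
    order = sims[idx].argsort()[::-1]
    return [(items[j]["title"], float(sims[idx, j])) for j in order if j != idx][:top_n]


recs = similar_items(0)
print(recs)
```

The two jazz albums share several terms, so they rank closest to each other, while the pop album shares no vocabulary with them and scores zero.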
| Module | Technique | Description |
|---|---|---|
| Popularity | Bayesian average | Weighted rating using vote counts |
| Content-based | TF-IDF + Cosine similarity | Textual similarity on title + category + description |
| Collaborative | Sparse cosine NN + SVD | Item-item matrix factorization |
| Hybrid | Weighted fusion | Combines all scores via linear weights |
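
To make the Popularity row concrete, here is a minimal sketch of one common form of the Bayesian average: the item mean is shrunk toward the global mean, with a prior strength `prior_count` acting as a number of "virtual" votes. The exact formula and prior used in the app are not stated in this README, so the function `bayesian_average`, the value `prior_count = 5`, and the sample ratings are assumptions.

```python
def bayesian_average(item_mean, item_count, global_mean, prior_count):
    """Shrink an item's mean rating toward the global mean.

    Equivalent to adding `prior_count` virtual votes at the global mean.
    """
    return (item_count * item_mean + prior_count * global_mean) / (item_count + prior_count)


# Made-up ratings: A1 has a single 5-star vote, A2 has many high votes.
ratings = {
    "A1": [5],
    "A2": [5, 5, 5, 5, 5, 5, 5, 5, 4, 4],
    "A3": [3, 3, 3, 3, 3, 3],
}

all_votes = [r for votes in ratings.values() for r in votes]
global_mean = sum(all_votes) / len(all_votes)
prior_count = 5  # assumed prior strength

raw_mean = {item: sum(v) / len(v) for item, v in ratings.items()}
weighted = {
    item: bayesian_average(raw_mean[item], len(v), global_mean, prior_count)
    for item, v in ratings.items()
}
ranked = sorted(weighted, key=weighted.get, reverse=True)
print(ranked)
```

On raw means the single-vote item A1 would rank first; after shrinkage, the well-supported item A2 moves ahead of it.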
- Python 3.9+
- Streamlit for deployment/UI
- scikit-learn (TF-IDF, SVD, NearestNeighbors)
- pandas, NumPy, SciPy
- McAuley Lab Amazon Dataset (2023)
`streamlit run Time_Series_Model.py`

You can:
- Use the default file paths (if the files are placed in `data/`)
- Or upload `.jsonl` review and metadata files via the Streamlit sidebar
| Component | Default Weight |
|---|---|
| Content-based | 0.45 |
| Collaborative | 0.45 |
| Popularity | 0.10 |
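
The default weights in the table can be read as coefficients of a linear blend. The sketch below shows one way such a blend could work: each component's scores are min-max scaled to [0, 1] so they are comparable, then combined with weights renormalized to sum to 1. The scaling step, the helper names `min_max` and `blend`, and the sample scores are assumptions; the README only states that scores are combined via linear weights.

```python
DEFAULT_WEIGHTS = {"content": 0.45, "collab": 0.45, "popularity": 0.10}


def min_max(scores):
    """Scale a dict of scores to [0, 1]; constant inputs map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def blend(component_scores, weights=DEFAULT_WEIGHTS):
    """Weighted sum of normalized component scores, best item first."""
    total = sum(weights[name] for name in component_scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    blended = {}
    for name, scores in component_scores.items():
        w = weights[name] / total  # renormalize so slider values need not sum to 1
        for item, s in min_max(scores).items():
            blended[item] = blended.get(item, 0.0) + w * s
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)


# Made-up component scores for three items.
scores = {
    "content": {"A1": 0.9, "A2": 0.2, "A3": 0.1},
    "collab": {"A1": 0.1, "A2": 0.8, "A3": 0.3},
    "popularity": {"A1": 4.3, "A2": 4.6, "A3": 3.5},
}

default_ranking = blend(scores)
content_heavy = blend(scores, {"content": 0.8, "collab": 0.1, "popularity": 0.1})
print(default_ranking)
print(content_heavy)
```

With the default weights, A2 ranks first because it does well on both collaborative and popularity signals; shifting weight toward content moves A1 to the top, which is the kind of effect the sliders expose.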
| Tab | Description |
|---|---|
| Popular | Top-N Bayesian weighted items |
| Content-based | TF-IDF cosine recommendations |
| Collaborative | Item-item cosine + SVD latent factors |
| Hybrid | Final blended recommendations |
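
The Collaborative tab combines item-item cosine neighbors with SVD latent factors. How the app combines the two is not described here, so the sketch below simply shows each piece separately on a made-up 4x4 item-by-user rating matrix, using scikit-learn's `NearestNeighbors` and `TruncatedSVD`. The matrix values, `n_components=2`, and `n_neighbors=2` are assumptions chosen for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import NearestNeighbors

# Rows are items, columns are users; 0 means "no rating" (made-up data).
ratings = np.array([
    [5, 4, 0, 0],  # item 0
    [4, 5, 0, 0],  # item 1
    [0, 0, 5, 4],  # item 2
    [0, 1, 4, 5],  # item 3
], dtype=float)
item_user = csr_matrix(ratings)

# Item-item nearest neighbors under cosine distance on the sparse matrix.
knn = NearestNeighbors(metric="cosine", algorithm="brute", n_neighbors=2)
knn.fit(item_user)
dist, idx = knn.kneighbors(item_user[0])
# idx[0][0] is item 0 itself (distance 0); idx[0][1] is its closest other item.
nearest_to_item0 = int(idx[0][1])

# Low-rank latent item factors, then cosine similarity in the latent space.
svd = TruncatedSVD(n_components=2, random_state=0)
latent = svd.fit_transform(item_user)
latent_sims = cosine_similarity(latent)

print(nearest_to_item0)
print(latent_sims.round(3))
```

Items 0 and 1 are rated by the same users, so they are neighbors in both the raw cosine space and the latent space, while items 2 and 3 form a separate group.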
If you use this dataset or pipeline, please cite:
He, Ruining, and McAuley, Julian. Amazon Product Data 2023 (McAuley Lab, UCSD).
.
├── Time_Series_Model.py # Streamlit app
├── requirements.txt
├── data/ # Example .jsonl data
├── assets/ # Screenshots or demo
├── notebooks/ # Optional exploration
└── models/ # Saved models (TF-IDF, SVD, etc.)
- Integrate HuggingFace embeddings for better semantic similarity
- Add session-based or sequence models (e.g., SASRec, BERT4Rec)
- Deploy on Streamlit Cloud or HuggingFace Spaces
- Add user profiling and A/B testing
This project is licensed under the MIT License.
Venkatesh
Data Science | Machine Learning | Recommender Systems
Email: venkateshvarada56@gmail.com | LinkedIn Profile
