We are excited to share that the corresponding paper for this project has been accepted into the SIGIR 2025 Proceedings🎉🎉!
To get started, follow these steps:
First, install all necessary dependencies using the `requirements.txt` file:

```bash
pip install -r requirements.txt
```
Next, download the Netflix dataset node features and place them in the `netflix_data` directory. We recommend following the instructions provided by LLMRec to obtain the node features.
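The exact artifact names depend on the LLMRec release, so a quick sanity check before training can save debugging time. Below is a minimal sketch assuming the features arrive as `.npy` arrays; the file names are hypothetical placeholders, so match them to what you actually download:

```python
# sanity_check.py -- verify the netflix_data directory before training.
# File names below are illustrative placeholders, not guaranteed to
# match the LLMRec artifacts; adjust them to your download.
from pathlib import Path

import numpy as np

DATA_DIR = Path("netflix_data")
EXPECTED = ["user_features.npy", "item_features.npy"]  # hypothetical names

for name in EXPECTED:
    path = DATA_DIR / name
    if not path.exists():
        raise FileNotFoundError(f"Missing {path}; re-check the LLMRec download.")
    feats = np.load(path)
    print(f"{name}: shape={feats.shape}, dtype={feats.dtype}")
```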
Once the environment is set up and the dataset is prepared, you can run the project by executing:

```bash
python main.py
```
We perform experiments on three publicly available datasets: Netflix, MovieLens, and Amazon-book.
To ensure a fair comparison, we provide only textual information as side information for all methods.
For baselines that cannot directly utilize textual information, such as LightGCN [He et al., 2020], we use text encodings as node features.
We follow existing work to split the train, validation, and test sets. For the Netflix and MovieLens datasets, we use the same split as in LLMRec [Wei et al., 2024]; for the Amazon-book dataset, we follow the split from RLMRec [Ren et al., 2024].
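To make the LightGCN note above concrete, here is a minimal sketch of one way to use text encodings as node features for an ID-embedding baseline: seed the embedding table from the frozen text features through a learned projection. The class, projection layer, and dimensions are our assumptions for illustration, not the repository's actual code.

```python
import torch
import torch.nn as nn

class TextInitEmbedding(nn.Module):
    """Node embeddings derived from precomputed text encodings.

    Sketch only: real LightGCN adds graph propagation on top of these
    embeddings; here we show just the feature-initialization idea.
    """

    def __init__(self, text_feats: torch.Tensor, dim: int = 64):
        super().__init__()
        # Project frozen text encodings (e.g., 768-d BERT vectors)
        # down to the embedding size the recommender expects.
        self.proj = nn.Linear(text_feats.size(1), dim)
        self.register_buffer("text_feats", text_feats)

    def forward(self, node_ids: torch.Tensor) -> torch.Tensor:
        return self.proj(self.text_feats[node_ids])

# Usage with stand-in features:
#   feats = torch.randn(1000, 768)       # placeholder for text encodings
#   emb = TextInitEmbedding(feats)
#   vecs = emb(torch.arange(10))         # (10, 64) node features
```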
- **Netflix dataset** [Bennett et al., 2007]. The Netflix dataset is sourced from the Kaggle website. We derive the interaction graph $G$ and user interaction history $L_U$ from the users' viewing history. For items, we combine the movie title, year, genre, and categories as textual information, denoted as $T_V$, while for users, we use age, gender, favorite directors, country, and language as textual information, denoted as $P_U$. Additionally, we use Sentence-BERT [Reimers and Gurevych, 2019] to encode the textual information of users and items, obtaining user features $F_U$ and item features $M_V$, respectively.
- **MovieLens dataset** [Harper and Konstan, 2015]. The MovieLens dataset is sourced from ML-10M. We obtain the interaction graph $G$ and user interaction history $L_U$ from the users' viewing history. The side information, including movie title, year, and genre, is combined as the item textual feature $T_V$, while user age, gender, country, and language are combined as the user textual feature $P_U$. Additionally, we encode this textual information with Sentence-BERT [Reimers and Gurevych, 2019] to obtain the features $F_U$ and $M_V$.
- **Amazon-book dataset** [Ni et al., 2019]. The Amazon-book dataset contains book review records from 2000 to 2014. We treat each review as a user-item interaction and derive the interaction graph $G$ and user interaction history $L_U$ from the records. We use the book title, year, and categories as the item textual information $T_V$, and the user's review content as the user textual information $P_U$. These are likewise encoded with Sentence-BERT [Reimers and Gurevych, 2019] to obtain the features $F_U$ and $M_V$ (a minimal encoding sketch follows this list).
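Across all three datasets the encoding step has the same shape: concatenate the textual fields per user or item, then embed the result. Below is a minimal sketch using the `sentence-transformers` library; the model checkpoint and field layout are assumptions for illustration, and the repository may use a different encoder.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Checkpoint choice is an assumption; any Sentence-BERT model works here.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def encode_side_info(records: list[dict], fields: list[str]) -> np.ndarray:
    """Concatenate the chosen textual fields per record and embed them.

    For items this yields M_V (e.g., fields=["title", "year", "genre"]);
    for users it yields F_U (e.g., fields=["age", "gender", "country"]).
    """
    texts = [" ".join(str(r.get(f, "")) for f in fields) for r in records]
    return encoder.encode(texts, convert_to_numpy=True)

# Example: two toy items -> a (2, 384) feature matrix with MiniLM.
items = [{"title": "Heat", "year": 1995, "genre": "Crime"},
         {"title": "Up", "year": 2009, "genre": "Animation"}]
M_V = encode_side_info(items, ["title", "year", "genre"])
print(M_V.shape)
```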
- He, Xiangnan, et al. "LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation." SIGIR 2020.
- Wei, Wei, et al. "LLMRec: Large Language Models with Graph Augmentation for Recommendation." WSDM 2024.
- Ren, Xubin, et al. "Representation Learning with Large Language Models for Recommendation." WWW 2024.
- Bennett, James, and Stan Lanning. "The Netflix Prize." KDD Cup 2007.
- Harper, F. Maxwell, and Joseph A. Konstan. "The MovieLens Datasets: History and Context." ACM TiiS 2015.
- Ni, Jianmo, et al. "Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects." EMNLP-IJCNLP 2019.
- Reimers, Nils, and Iryna Gurevych. "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks." EMNLP-IJCNLP 2019.
We present the pseudocode for model training in Algorithm 1 of the paper.
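For orientation only, here is a generic sketch of a BPR-style training loop of the kind commonly used for such recommenders; the model interface, data loader, and hyperparameters are placeholders, not the paper's actual Algorithm 1.

```python
import torch

def train(model, loader, epochs: int = 50, lr: float = 1e-3):
    """Generic BPR training loop (a sketch, not the paper's Algorithm 1).

    Expects `loader` to yield (user, pos_item, neg_item) index batches
    and `model(users, items)` to return predicted preference scores.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        total = 0.0
        for users, pos, neg in loader:
            pos_scores = model(users, pos)
            neg_scores = model(users, neg)
            # BPR: push positive scores above sampled negatives.
            loss = -torch.log(torch.sigmoid(pos_scores - neg_scores)).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
            total += loss.item()
        print(f"epoch {epoch}: loss={total / max(len(loader), 1):.4f}")
```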