This is our implementation for the paper:
Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu and Tat-Seng Chua (2017). Neural Collaborative Filtering. In Proceedings of WWW '17, Perth, Australia, April 03-07, 2017.
Three collaborative filtering models are implemented: Generalized Matrix Factorization (GMF), Multi-Layer Perceptron (MLP), and Neural Matrix Factorization (NeuMF). To target the models at implicit feedback and the ranking task, we optimize them using log loss with negative sampling.
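The negative-sampling setup can be sketched as follows: each observed (user, item) interaction becomes a positive instance (label 1) and is paired with a few randomly drawn unobserved items as negatives (label 0), so the models can be trained with a binary log loss. This is an illustrative sketch; the function and variable names are assumptions, not the repo's exact API.

```python
import random

def get_train_instances(train, num_items, num_negatives=4):
    """Build implicit-feedback training instances with negative sampling.

    `train` is a set of observed (user, item) pairs; each positive pair
    is kept with label 1 and paired with `num_negatives` random
    unobserved items labeled 0. (Illustrative sketch, not the repo's
    exact API.)
    """
    user_input, item_input, labels = [], [], []
    for (u, i) in train:
        # positive instance
        user_input.append(u)
        item_input.append(i)
        labels.append(1)
        # negative instances: sample items the user has not interacted with
        for _ in range(num_negatives):
            j = random.randrange(num_items)
            while (u, j) in train:
                j = random.randrange(num_items)
            user_input.append(u)
            item_input.append(j)
            labels.append(0)
    return user_input, item_input, labels
```

The resulting parallel lists can be fed to the models as (user, item) inputs with binary labels and optimized with binary cross-entropy.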
Please cite our WWW'17 paper if you use our codes. Thanks!
Author: Dr. Xiangnan He (http://www.comp.nus.edu.sg/~xiangnan/)
The original implementation used Keras with the Theano backend; this updated version runs on:
- Python version: 3.10.16
- Keras version: 3.9.2
The run instructions are the same as in the original repo. All I have done is update the models to run on current Keras and Python and clean up a few unused imports.
If you are running these locally, I recommend lowering the epochs to 1 to speed things up. I ran all of these in my anaconda3 virtual environment, and I have also uploaded my ipynb notebook if you would like to work through that.
I got frustrated with the commit history because my data files were cached and not getting cleared out, so I squashed and merged everything. The main contribution is making the files runnable on an updated Python.
The commands and their arguments are documented in the code (see the parse_args function in each script).
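For reference, an argument parser matching the GMF command below might look like this sketch. The flag names follow this README, but the defaults and help strings are assumptions, not copied from the repo; note that list-valued flags such as --regs and --layers arrive as strings and must be decoded.

```python
import argparse
import ast

def parse_args(argv=None):
    """Sketch of an argument parser matching the GMF command in this README.

    Flag names follow the README; defaults are assumptions, not the
    repo's exact values. `argv` is accepted so the sketch is testable.
    """
    parser = argparse.ArgumentParser(description="Run GMF.")
    parser.add_argument('--dataset', default='ml-1m', help='Dataset name.')
    parser.add_argument('--epochs', type=int, default=20)
    parser.add_argument('--batch_size', type=int, default=256)
    parser.add_argument('--num_factors', type=int, default=8)
    parser.add_argument('--regs', default='[0,0]',
                        help='Regularization for user/item embeddings, as a list string.')
    parser.add_argument('--num_neg', type=int, default=4)
    parser.add_argument('--lr', type=float, default=0.001)
    parser.add_argument('--learner', default='adam')
    parser.add_argument('--verbose', type=int, default=1)
    parser.add_argument('--out', type=int, default=1)
    return parser.parse_args(argv)

# List-valued flags arrive as strings like "[64,32,16,8]"; they can be
# decoded safely with ast.literal_eval rather than eval():
args = parse_args(['--regs', '[0,0.01]'])
regs = ast.literal_eval(args.regs)
```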
Run GMF:
python GMF.py --dataset ml-1m --epochs 20 --batch_size 256 --num_factors 8 --regs [0,0] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1
Run MLP:
python MLP.py --dataset ml-1m --epochs 20 --batch_size 256 --layers [64,32,16,8] --reg_layers [0,0,0,0] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1
Run NeuMF (without pre-training):
python NeuMF.py --dataset ml-1m --epochs 20 --batch_size 256 --num_factors 8 --layers [64,32,16,8] --reg_mf 0 --reg_layers [0,0,0,0] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1
Run NeuMF (with pre-training):
python NeuMF.py --dataset ml-1m --epochs 20 --batch_size 256 --num_factors 8 --layers [64,32,16,8] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1 --mf_pretrain Pretrain/ml-1m_GMF_8_1501651698.h5 --mlp_pretrain Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5
Note on tuning NeuMF: our experience is that for small predictive factors, running NeuMF without pre-training can achieve better performance than GMF and MLP. For large predictive factors, pre-training NeuMF can yield better performance (the regularization of GMF and MLP may need tuning).
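As a sketch of what pre-training does at the output layer: the paper initializes NeuMF's prediction layer by concatenating the pretrained GMF and MLP prediction weights, scaled by a trade-off weight alpha (0.5 in the paper). The function below is a hypothetical NumPy illustration of that step, not the repo's actual loading code.

```python
import numpy as np

def combine_prediction_weights(gmf_h, mlp_h, alpha=0.5):
    """Initialize NeuMF's output-layer weights from pretrained models.

    Concatenates the pretrained GMF and MLP prediction-layer weight
    vectors with the paper's trade-off weight alpha (0.5), i.e.
    h <- [alpha * h_GMF; (1 - alpha) * h_MLP].
    (Hypothetical illustration, not the repo's loading code.)
    """
    return np.concatenate([alpha * gmf_h, (1.0 - alpha) * mlp_h], axis=0)
```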
The Docker quickstart guide can be used to evaluate the models quickly.
Install Docker Engine
Build a keras-theano docker image
docker build --no-cache=true -t ncf-keras-theano .
Run the docker image with a volume (Run GMF):
docker run --volume=$(pwd):/home ncf-keras-theano python GMF.py --dataset ml-1m --epochs 20 --batch_size 256 --num_factors 8 --regs [0,0] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1
Run the docker image with a volume (Run MLP):
docker run --volume=$(pwd):/home ncf-keras-theano python MLP.py --dataset ml-1m --epochs 20 --batch_size 256 --layers [64,32,16,8] --reg_layers [0,0,0,0] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1
Run the docker image with a volume (Run NeuMF without pre-training):
docker run --volume=$(pwd):/home ncf-keras-theano python NeuMF.py --dataset ml-1m --epochs 20 --batch_size 256 --num_factors 8 --layers [64,32,16,8] --reg_mf 0 --reg_layers [0,0,0,0] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1
Run the docker image with a volume (Run NeuMF with pre-training):
docker run --volume=$(pwd):/home ncf-keras-theano python NeuMF.py --dataset ml-1m --epochs 20 --batch_size 256 --num_factors 8 --layers [64,32,16,8] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1 --mf_pretrain Pretrain/ml-1m_GMF_8_1501651698.h5 --mlp_pretrain Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5
- Note: If you are using zsh and get an error like zsh: no matches found: [64,32,16,8], use single quotation marks for array parameters, e.g. --layers '[64,32,16,8]'.
We provide two processed datasets: MovieLens 1 Million (ml-1m) and Pinterest (pinterest-20).
train.rating:
- Train file.
- Each line is a training instance: userID\t itemID\t rating\t timestamp (if available)
test.rating:
- Test file (positive instances).
- Each line is a testing instance: userID\t itemID\t rating\t timestamp (if available)
test.negative:
- Test file (negative instances).
- Each line corresponds to the same line of test.rating and contains 99 negative samples.
- Each line is in the format: (userID,itemID)\t negativeItemID1\t negativeItemID2 ...
Last Update Date: December 23, 2018