Yelp dataset error #1264
kuzma-long asked this question in Q&A
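@kuzma-long Hello, please check whether the dataset format contains errors, that is, whether the files conform to the atomic file format. We recommend using the preprocessed Yelp dataset that we provide. With that data format, the example configuration file for a sequential model is: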
Configuration:
General Hyper Parameters:
gpu_id = 3
use_gpu = True
seed = 2020
state = INFO
reproducibility = True
data_path = dataset/yelp
checkpoint_dir = saved
show_progress = True
save_dataset = False
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False
Training Hyper Parameters:
epochs = 300
train_batch_size = 128
learner = adam
learning_rate = 0.001
neg_sampling = None
eval_step = 1
stopping_step = 10
clip_grad_norm = None
weight_decay = 0.0
loss_decimal_place = 4
Evaluation Hyper Parameters:
eval_args = {'split': {'LS': 'valid_and_test'}, 'order': 'TO', 'group_by': 'user', 'mode': 'full'}
repeatable = True
metrics = ['Recall', 'NDCG', 'MRR']
topk = [5, 10, 20, 50]
valid_metric = NDCG@10
valid_metric_bigger = True
eval_batch_size = 256
metric_decimal_place = 4
Dataset Hyper Parameters:
field_separator =
seq_separator =
USER_ID_FIELD = user_id
ITEM_ID_FIELD = business_id
RATING_FIELD = stars
TIME_FIELD = timestamp
seq_len = None
LABEL_FIELD = label
threshold = None
NEG_PREFIX = neg_
load_col = None
unload_col = None
unused_col = None
additional_feat_suffix = None
rm_dup_inter = None
val_interval = {'stars': '[3,inf)'}
filter_inter_by_user_or_item = True
user_inter_num_interval = [15,inf)
item_inter_num_interval = [15,inf)
alias_of_user_id = None
alias_of_item_id = None
alias_of_entity_id = None
alias_of_relation_id = None
preload_weight = None
normalize_field = None
normalize_all = None
ITEM_LIST_LENGTH_FIELD = item_length
LIST_SUFFIX = _list
MAX_ITEM_LIST_LENGTH = 50
POSITION_FIELD = position_id
HEAD_ENTITY_ID_FIELD = head_id
TAIL_ENTITY_ID_FIELD = tail_id
RELATION_ID_FIELD = relation_id
ENTITY_ID_FIELD = entity_id
benchmark_filename = None
Other Hyper Parameters:
wandb_project = recbole
require_pow = False
MODEL_TYPE = ModelType.SEQUENTIAL
n_layers = [2, 4, 6, 8]
hidden_size = [32, 64]
hidden_dropout_prob = [0.2, 0.4, 0.6, 0.8]
hidden_act = gelu
layer_norm_eps = 1e-12
initializer_range = 0.02
loss_type = CE
pooling_mode = mean
hyper_parameters = ['n_layers', 'hidden_size', 'hidden_dropout_prob']
reg_weight = 0.0001
ssl_temp = 0.05
ssl_reg = 1e-06
alpha = 1.5
proto_reg = 1e-07
num_clusters = 2000
MODEL_INPUT_TYPE = InputType.POINTWISE
eval_type = EvaluatorType.RANKING
device = cuda
train_neg_sample_args = {'strategy': 'none'}
eval_neg_sample_args = {'strategy': 'full', 'distribution': 'uniform'}
Error:
Traceback (most recent call last):
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/python_parser.py", line 760, in _next_iter_line
line = next(self.data)
_csv.Error: ' ' expected after '"'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 125, in <module>
run_single_model(args)
File "main.py", line 49, in run_single_model
dataset = create_dataset(config)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/recbole/data/utils.py", line 68, in create_dataset
dataset = dataset_class(config)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/recbole/data/dataset/sequential_dataset.py", line 36, in __init__
super().__init__(config)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/recbole/data/dataset/dataset.py", line 96, in __init__
self._from_scratch()
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/recbole/data/dataset/dataset.py", line 106, in _from_scratch
self._load_data(self.dataset_name, self.dataset_path)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/recbole/data/dataset/dataset.py", line 249, in _load_data
self.item_feat = self._load_user_or_item_feat(token, dataset_path, FeatureSource.ITEM, 'iid_field')
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/recbole/data/dataset/dataset.py", line 311, in _load_user_or_item_feat
feat = self._load_feat(feat_path, source)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/recbole/data/dataset/dataset.py", line 438, in _load_feat
df = pd.read_csv(
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 581, in _read
return parser.read(nrows)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1254, in read
index, columns, col_dict = self._engine.read(nrows)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/python_parser.py", line 238, in read
content = self._get_lines(rows)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/python_parser.py", line 1091, in _get_lines
new_row = self._next_iter_line(row_num=self.pos + rows + 1)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/python_parser.py", line 789, in _next_iter_line
self._alert_malformed(msg, row_num)
File "/home/chaolong/.conda/envs/recbole/lib/python3.8/site-packages/pandas/io/parsers/python_parser.py", line 739, in _alert_malformed
raise ParserError(msg)
pandas.errors.ParserError: ' ' expected after '"'
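For context on what this message means: the first traceback frame shows it originating in Python's `csv` module, which pandas' Python engine wraps. The sketch below reproduces the same kind of error with the standard library alone. It assumes a tab separator (the blank inside the first pair of quotes in the message is presumably a tab) and strict parsing; the row content is made up for illustration and is not taken from the Yelp files.

```python
import csv
import io

# Hypothetical item row: the second field opens with a double quote, the quote
# closes after "Bob", and more text follows instead of the tab delimiter.
bad_row = 'b1\t"Bob" Grill\n'

# strict=True makes the csv module raise on malformed quoting instead of
# silently accepting it.
reader = csv.reader(io.StringIO(bad_row), delimiter="\t", strict=True)
try:
    next(reader)
    message = None
except csv.Error as exc:
    message = str(exc)

print(repr(message))
```

In other words, a field that starts with `"` is treated as a quoted field, and the parser then expects the separator right after the closing quote. A business name or address containing literal double quotes is one plausible way to end up in this state.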
When using the Yelp dataset, this error is raised during the data preprocessing stage. What is going wrong?
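One way to narrow this down is to scan the item file line by line and report the rows that fail a strict parse or have the wrong number of fields. The following is a minimal sketch, not part of RecBole: `find_malformed_lines` is a hypothetical helper, the tab separator is an assumption, and the sample rows (including the `name:token_seq` column) are invented for illustration.

```python
import csv


def find_malformed_lines(lines, delimiter="\t"):
    """Return (line_number, reason) pairs for lines that do not parse cleanly.

    Each line is parsed on its own, assuming one record per line.
    """
    problems = []
    expected_fields = None
    for line_no, line in enumerate(lines, start=1):
        try:
            row = next(csv.reader([line], delimiter=delimiter, strict=True))
        except csv.Error as exc:
            problems.append((line_no, str(exc)))
            continue
        if expected_fields is None:
            # The first line (the header) defines the expected column count.
            expected_fields = len(row)
        elif len(row) != expected_fields:
            problems.append(
                (line_no, f"expected {expected_fields} fields, got {len(row)}")
            )
    return problems


# Made-up sample in a "name:type" header layout.
sample = [
    "business_id:token\tname:token_seq\n",
    "b1\tBlue Cafe\n",
    'b2\t"Bob" Grill\n',          # stray quotes inside a field
    "b3\tCorner Deli\textra\n",   # one field too many
]
problems = find_malformed_lines(sample)
for line_no, reason in problems:
    print(line_no, reason)
```

To run it against a real file, pass an open file object instead of the list, for example `find_malformed_lines(open(path, encoding="utf-8"))`, where `path` points at the item file under `dataset/yelp` (the exact file name is not shown in the post). The reported line numbers should indicate which rows need their quotes removed or escaped.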