Description
Recently, when trying out your dataset, I keep getting an error.
pip show datasets
```
[INFO|2025-03-25 02:48:08] llamafactory.data.loader:143 >> Loading dataset BUAADreamer/llava-med-zh-instruct-60k...
Loading dataset shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 47.21it/s]
Running tokenizer on dataset (num_proc=16): 0%| | 0/56649 [00:00<?, ? examples/s]
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/cheng/.local/lib/python3.10/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/cheng/.local/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 678, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
  File "/home/cheng/.local/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3476, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/home/cheng/.local/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3338, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/home/cheng/LLaMA-Factory/src/llamafactory/data/processor/supervised.py", line 99, in preprocess_dataset
    input_ids, labels = self._encode_data_example(
  File "/home/cheng/LLaMA-Factory/src/llamafactory/data/processor/supervised.py", line 43, in _encode_data_example
    messages = self.template.mm_plugin.process_messages(prompt + response, images, videos, audios, self.processor)
  File "/home/cheng/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 484, in process_messages
    image_seqlen = (height // processor.patch_size) * (
TypeError: unsupported operand type(s) for //: 'int' and 'NoneType'
"""
```