Does Qwen3 fine-tuning support image input? #7994


Closed · 1 task done
MissShihongHowRU opened this issue May 9, 2025 · 1 comment
Labels: solved (This problem has been already solved)


MissShihongHowRU commented May 9, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info

Does Qwen3 fine-tuning support image input?

The LoRA training config file I am using is as follows:

### model
model_name_or_path: Qwen/Qwen3-4B
image_min_pixels: 200704
image_max_pixels: 1003520

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
freeze_vision_tower: false
freeze_multi_modal_projector: false
freeze_language_model: false

### dataset
dataset: xxx_grounded_classifying_patch_500k  # video: mllm_video_demo
template: qwen3
cutoff_len: 2048
max_samples: 600000
overwrite_cache: False
preprocessing_num_workers: 64

### output
output_dir: saves/qwen3_4b_xxx/lora/sft_grounded_classifying_patch_500k
logging_steps: 10
save_steps: 1000
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 50000

My dataset looks like this:

{
        "messages": [
            {
                "content": "<image>Please locate the object within the region <|box_start|>(161, 160), (717, 653)<|box_end|> in this image and identify the following attributes of it: whether the vehicle's front/rear is visible without obstruction and the type of the vehicle. Output the bbox coordinates using JSON format.",
                "role": "user"
            },
            {
                "content": "```json\n[\n    {\"bbox_2d\": [161, 160, 717, 653], \"label\": \"vehicle\", \"vehicle_type\": \"van\", \"fronttail_visibility\": \"visible\"}\n]\n```",
                "role": "assistant"
            }
        ],
        "images": [
            "/share/dataset/lll/xxx_grounded_classifying_patch_500k/Car102_20230823_153518_1692776118_0.png"
        ]
 }
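
(For context, a dataset in this shape is registered in LLaMA-Factory's data/dataset_info.json with sharegpt formatting and explicit role/content tags. A minimal sketch of what that entry typically looks like; the file name here is an assumption, and the tag names mirror the mllm_demo example shipped with the repo, so verify against your installed version:)

"xxx_grounded_classifying_patch_500k": {
    "file_name": "xxx_grounded_classifying_patch_500k.json",
    "formatting": "sharegpt",
    "columns": {
        "messages": "messages",
        "images": "images"
    },
    "tags": {
        "role_tag": "role",
        "content_tag": "content",
        "user_tag": "user",
        "assistant_tag": "assistant"
    }
}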

Here is the error I ran into:
Running tokenizer on dataset (num_proc=64): 0%| | 0/571730 [00:06<?, ? examples/s]
[rank0]: multiprocess.pool.RemoteTraceback:
[rank0]: """
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/boden/miniconda3/envs/llama_factory_lcg/lib/python3.10/site-packages/multiprocess/pool.py", line 125, in worker
[rank0]: result = (True, func(*args, **kwds))
[rank0]: File "/home/boden/miniconda3/envs/llama_factory_lcg/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 678, in _write_generator_to_queue
[rank0]: for i, result in enumerate(func(**kwargs)):
[rank0]: File "/home/boden/miniconda3/envs/llama_factory_lcg/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3458, in _map_single
[rank0]: batch = apply_function_on_filtered_inputs(
[rank0]: File "/home/boden/miniconda3/envs/llama_factory_lcg/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3320, in apply_function_on_filtered_inputs
[rank0]: processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
[rank0]: File "/home/boden/Dev/lcg/LLaMA-Factory/src/llamafactory/data/processor/supervised.py", line 99, in preprocess_dataset
[rank0]: input_ids, labels = self._encode_data_example(
[rank0]: File "/home/boden/Dev/lcg/LLaMA-Factory/src/llamafactory/data/processor/supervised.py", line 43, in _encode_data_example
[rank0]: messages = self.template.mm_plugin.process_messages(prompt + response, images, videos, audios, self.processor)
[rank0]: File "/home/boden/Dev/lcg/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 395, in process_messages
[rank0]: self._validate_input(processor, images, videos, audios)
[rank0]: File "/home/boden/Dev/lcg/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 155, in _validate_input
[rank0]: raise ValueError(
[rank0]: ValueError: This model does not support image input. Please check whether the correct template is used.
[rank0]: """

[rank0]: The above exception was the direct cause of the following exception:

[rank0]: Traceback (most recent call last):
[rank0]: File "/home/boden/Dev/lcg/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in
[rank0]: launch()
[rank0]: File "/home/boden/Dev/lcg/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
[rank0]: run_exp()
[rank0]: File "/home/boden/Dev/lcg/LLaMA-Factory/src/llamafactory/train/tuner.py", line 110, in run_exp
[rank0]: _training_function(config={"args": args, "callbacks": callbacks})
[rank0]: File "/home/boden/Dev/lcg/LLaMA-Factory/src/llamafactory/train/tuner.py", line 72, in _training_function
[rank0]: run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]: File "/home/boden/Dev/lcg/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 51, in run_sft
[rank0]: dataset_module = get_dataset(template, model_args, data_args, training_args, stage="sft", **tokenizer_module)
[rank0]: File "/home/boden/Dev/lcg/LLaMA-Factory/src/llamafactory/data/loader.py", line 315, in get_dataset
[rank0]: dataset = _get_preprocessed_dataset(
[rank0]: File "/home/boden/Dev/lcg/LLaMA-Factory/src/llamafactory/data/loader.py", line 256, in _get_preprocessed_dataset
[rank0]: dataset = dataset.map(
[rank0]: File "/home/boden/miniconda3/envs/llama_factory_lcg/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 560, in wrapper
[rank0]: out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
[rank0]: File "/home/boden/miniconda3/envs/llama_factory_lcg/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3147, in map
[rank0]: for rank, done, content in iflatmap_unordered(
[rank0]: File "/home/boden/miniconda3/envs/llama_factory_lcg/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 718, in iflatmap_unordered
[rank0]: [async_result.get(timeout=0.05) for async_result in async_results]
[rank0]: File "/home/boden/miniconda3/envs/llama_factory_lcg/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 718, in
[rank0]: [async_result.get(timeout=0.05) for async_result in async_results]
[rank0]: File "/home/boden/miniconda3/envs/llama_factory_lcg/lib/python3.10/site-packages/multiprocess/pool.py", line 774, in get
[rank0]: raise self._value
[rank0]: ValueError: This model does not support image input. Please check whether the correct template is used.
[rank0]:[W509 10:36:53.245591738 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())

Reproduction

Put your message here.

Others

No response

MissShihongHowRU added the bug (Something isn't working) and pending (This problem is yet to be addressed) labels on May 9, 2025
Kuangdd01 (Collaborator) commented:

Qwen3 is a text-only model; it does not support image input.
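
(For image input you would need a vision-language checkpoint together with a template whose mm_plugin accepts images. A minimal sketch of the config lines that would change, assuming a Qwen2.5-VL model; the model ID and the qwen2_vl template name follow LLaMA-Factory's Qwen2-VL support, so verify against your installed version:)

### model
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct  # assumption: any Qwen2.5-VL size should work
image_min_pixels: 200704
image_max_pixels: 1003520

### dataset
template: qwen2_vl  # multimodal template; the qwen3 template is text-only, so its mm_plugin rejects images

With a text-only model such as Qwen3, the freeze_vision_tower, freeze_multi_modal_projector, and freeze_language_model options in the original config have no vision components to act on.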

Kuangdd01 added the solved (This problem has been already solved) label and removed the bug and pending labels on May 9, 2025