Adding add_tokens causes a mismatch between the LoRA parameters and the model parameters #7997


Closed
1 task done
npuNancy opened this issue May 9, 2025 · 1 comment
Labels
solved This problem has been already solved

Comments

@npuNancy

npuNancy commented May 9, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.3.dev0
  • Platform: Linux-6.8.0-57-generic-x86_64-with-glibc2.35
  • Python version: 3.11.0
  • PyTorch version: 2.5.1+cu124 (GPU)
  • Transformers version: 4.49.0.dev0
  • Datasets version: 3.2.0
  • Accelerate version: 1.2.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A100-SXM4-80GB
  • GPU number: 8
  • GPU memory: 79.15GB
  • DeepSpeed version: 0.16.5
  • vLLM version: 0.7.3

Reproduction

Problem description
During LoRA SFT training, if the add_tokens parameter is used:

  1. adapter_config.json contains "modules_to_save": ["lm_head","embed_tokens"]
  2. When running Model+LoRA inference with vLLM, vLLM reports that modules_to_save is unsupported: ValueError: vLLM only supports modules_to_save being None.
  3. When using LLaMA-Factory to run inference or merge the LoRA, the sizes of the LoRA's embed_tokens and lm_head layers no longer match the base model:
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:                                                    
        size mismatch for base_model.model.model.embed_tokens.modules_to_save.default.weight: copying a param with shape torch.Size([128320, 4096]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
        size mismatch for base_model.model.lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([128320, 4096]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
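The 128256 → 128320 jump in the error above is consistent with the embedding matrix being resized after the ten new tokens were added and then padded up to a multiple of 64. A minimal sketch of that arithmetic (the pad_to_multiple_of=64 value is an assumption about what the trainer passes to resize_token_embeddings, not something stated in this issue):

```python
# Sketch: why embed_tokens/lm_head grow from 128256 to 128320 in the checkpoint.
# Assumption: embeddings are resized roughly as
#   model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)
import math

base_vocab = 128256          # Meta-Llama-3.1-8B-Instruct vocabulary size
num_added = 10               # <Role>, </Role>, <Task>, ... from add_tokens
pad_to_multiple_of = 64      # assumed padding granularity

padded_vocab = math.ceil((base_vocab + num_added) / pad_to_multiple_of) * pad_to_multiple_of
print(padded_vocab)  # 128320, matching the shape reported in the error
```

Loading that checkpoint against a base model whose tokenizer was never extended (so its vocabulary is still 128256) then produces exactly the size mismatch shown.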

Everything works fine without the add_tokens parameter.

In earlier versions, using special_token also worked fine.

Related issue: #4139

The config file is attached below:

### model
model_name_or_path: /data4/Models/modelscope/hub/LLM-Research/Meta-Llama-3___1-8B-Instruct
trust_remote_code: true

### method
stage: sft
do_train: true 
finetuning_type: lora
lora_rank: 8
lora_alpha: 16
lora_target: all

# Special tokens to add to the tokenizer; separate multiple special tokens with commas.
add_tokens: "<Role>,</Role>,<Task>,</Task>,<Format>,</Format>,<Rules>,</Rules>,<choices>,</choices>"


### dataset
dataset: xxxxx
template: llama3
cutoff_len: 10000 
max_samples: 1000000 
overwrite_cache: true 
preprocessing_num_workers: 16 

### output
output_dir: saves/stage_3_llama_31_8B/lora/
logging_steps: 100
save_steps: 1000
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2 # gradient accumulation steps
learning_rate: 1.0e-4
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
# predict_with_generate: true ## invokes the metrics to compute similarity
# load_best_model_at_end: true # whether to load the best model after training ends

Others

No response

@npuNancy npuNancy added bug Something isn't working pending This problem is yet to be addressed labels May 9, 2025
@hiyouga
Copy link
Owner

hiyouga commented May 9, 2025

The add_tokens parameter is also required when merging.
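In other words, the export step must re-add the same tokens so the tokenizer and embedding sizes match the checkpoint. A sketch of such a merge config, in the same style as the training config above (adapter_name_or_path and export_dir are placeholder paths, and passing add_tokens at export time assumes the export command accepts the same model arguments as training):

```yaml
### model (merge config, e.g. for llamafactory-cli export)
model_name_or_path: /data4/Models/modelscope/hub/LLM-Research/Meta-Llama-3___1-8B-Instruct
adapter_name_or_path: saves/stage_3_llama_31_8B/lora/   # placeholder: the trained adapter
template: llama3
finetuning_type: lora
trust_remote_code: true

# Re-add the exact same special tokens used during training,
# so embed_tokens/lm_head are resized to the checkpoint's shape before loading.
add_tokens: "<Role>,</Role>,<Task>,</Task>,<Format>,</Format>,<Rules>,</Rules>,<choices>,</choices>"

### export
export_dir: saves/stage_3_llama_31_8B/merged/   # placeholder output path
```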

@hiyouga hiyouga closed this as completed May 9, 2025
@hiyouga hiyouga added solved This problem has been already solved and removed bug Something isn't working pending This problem is yet to be addressed labels May 9, 2025