Adding add_tokens causes a mismatch between the LoRA parameters and the model parameters #7997


Closed
1 task done
npuNancy opened this issue May 9, 2025 · 1 comment
Labels
solved This problem has been already solved

Comments

@npuNancy

npuNancy commented May 9, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.3.dev0
  • Platform: Linux-6.8.0-57-generic-x86_64-with-glibc2.35
  • Python version: 3.11.0
  • PyTorch version: 2.5.1+cu124 (GPU)
  • Transformers version: 4.49.0.dev0
  • Datasets version: 3.2.0
  • Accelerate version: 1.2.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A100-SXM4-80GB
  • GPU number: 8
  • GPU memory: 79.15GB
  • DeepSpeed version: 0.16.5
  • vLLM version: 0.7.3

Reproduction

Problem description
During LoRA SFT training, if the add_tokens parameter is used:

  1. adapter_config.json contains "modules_to_save": ["lm_head","embed_tokens"]
  2. When running Model+LoRA inference with vLLM, vLLM reports that modules_to_save is unsupported: ValueError: vLLM only supports modules_to_save being None.
  3. When using LLaMA-Factory to run inference or merge the LoRA, the sizes of the LoRA's embed_tokens and lm_head layers no longer match the base model:
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:                                                    
        size mismatch for base_model.model.model.embed_tokens.modules_to_save.default.weight: copying a param with shape torch.Size([128320, 4096]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
        size mismatch for base_model.model.lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([128320, 4096]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
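The 128256 → 128320 jump in the error above is consistent with the embedding matrix being resized after the ten new tokens were added and then padded up to a multiple of 64. A minimal sketch of that arithmetic (the pad_to_multiple_of=64 value is an assumption about what the trainer passes to resize_token_embeddings, not something stated in this issue):

```python
# Sketch: why embed_tokens/lm_head grow from 128256 to 128320 in the checkpoint.
# Assumption: embeddings are resized roughly as
#   model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)
import math

base_vocab = 128256          # Meta-Llama-3.1-8B-Instruct vocabulary size
num_added = 10               # <Role>, </Role>, <Task>, ... from add_tokens
pad_to_multiple_of = 64      # assumed padding granularity

padded_vocab = math.ceil((base_vocab + num_added) / pad_to_multiple_of) * pad_to_multiple_of
print(padded_vocab)  # 128320, matching the shape reported in the error
```

Loading that checkpoint against a base model whose tokenizer was never extended (so its vocabulary is still 128256) then produces exactly the size mismatch shown.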

Everything works fine without the add_tokens parameter.

In earlier versions, using special_token also worked fine.

Related issue: #4139

The config file is attached below:

### model
model_name_or_path: /data4/Models/modelscope/hub/LLM-Research/Meta-Llama-3___1-8B-Instruct
trust_remote_code: true

### method
stage: sft
do_train: true 
finetuning_type: lora
lora_rank: 8
lora_alpha: 16
lora_target: all

# Special tokens to add to the tokenizer; separate multiple special tokens with commas.
add_tokens: "<Role>,</Role>,<Task>,</Task>,<Format>,</Format>,<Rules>,</Rules>,<choices>,</choices>"


### dataset
dataset: xxxxx
template: llama3
cutoff_len: 10000 
max_samples: 1000000 
overwrite_cache: true 
preprocessing_num_workers: 16 

### output
output_dir: saves/stage_3_llama_31_8B/lora/
logging_steps: 100
save_steps: 1000
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2 # gradient accumulation steps
learning_rate: 1.0e-4
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
# predict_with_generate: true ## invokes the metrics to compute similarity
# load_best_model_at_end: true # whether to load the best model after training ends

Others

No response

@npuNancy npuNancy added bug Something isn't working pending This problem is yet to be addressed labels May 9, 2025
@hiyouga
Copy link
Owner

hiyouga commented May 9, 2025

The add_tokens parameter is also required when merging.
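In other words, the export step must re-add the same tokens so the tokenizer and embedding sizes match the checkpoint. A sketch of such a merge config, in the same style as the training config above (adapter_name_or_path and export_dir are placeholder paths, and passing add_tokens at export time assumes the export command accepts the same model arguments as training):

```yaml
### model (merge config, e.g. for llamafactory-cli export)
model_name_or_path: /data4/Models/modelscope/hub/LLM-Research/Meta-Llama-3___1-8B-Instruct
adapter_name_or_path: saves/stage_3_llama_31_8B/lora/   # placeholder: the trained adapter
template: llama3
finetuning_type: lora
trust_remote_code: true

# Re-add the exact same special tokens used during training,
# so embed_tokens/lm_head are resized to the checkpoint's shape before loading.
add_tokens: "<Role>,</Role>,<Task>,</Task>,<Format>,</Format>,<Rules>,</Rules>,<choices>,</choices>"

### export
export_dir: saves/stage_3_llama_31_8B/merged/   # placeholder output path
```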

@hiyouga hiyouga closed this as completed May 9, 2025
@hiyouga hiyouga added solved This problem has been already solved and removed bug Something isn't working pending This problem is yet to be addressed labels May 9, 2025