When saving in _save_checkpoint, I get metric_value = metrics[metric_to_check] KeyError: 'eval_loss'. How can I fix this? #7816
Closed
Labels
solved
This problem has been already solved
System Info
llamafactory version: 0.9.2
Reproduction
The error I get is: in _save_checkpoint metric_value = metrics[metric_to_check] KeyError: 'eval_loss'.
My YAML file is:
### model
model_name_or_path: /llama/LLaMA-Factory-main/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 4
lora_alpha: 8
lora_dropout: 0.05
lora_target: "q_proj,v_proj"
gradient_checkpointing: true
### dataset
dataset: identityz
template: deepseek
cutoff_len: 5120
max_samples: 100000
overwrite_cache: true
preprocessing_num_workers: 8
dataloader_num_workers: 2
### output
output_dir: /llama/LLaMA-Factory-main/saves
logging_steps: 500
logging_strategy: "steps"
save_steps: 500
plot_loss: true
save_only_model: false
load_best_model_at_end: true
ddp_find_unused_parameters: false
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
learning_rate: 3.0e-5
num_train_epochs: 2.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
ddp_timeout: 180000000
resume_from_checkpoint: /llama/LLaMA-Factory-main/saves/checkpoint-500
export_device: cpu
dataloader_pin_memory: true
auto_find_batch_size: true
### eval
do_eval: true
eval_dataset: identityzz
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
I am using instruction-style data. When a checkpoint is saved, I get: in _save_checkpoint metric_value = metrics[metric_to_check] KeyError: 'eval_loss'. How can I fix this?
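For context, here is a minimal sketch of the lookup that appears to fail. It assumes the behaviour of the Hugging Face transformers Trainer, where `metric_for_best_model` falls back to "loss" when `load_best_model_at_end` is true and the name is then prefixed with `eval_`. The helper names `resolve_metric_key` and `missing_metric` and the sample metrics dict are hypothetical, written only for illustration; they are not part of LLaMA-Factory or transformers.

```python
from typing import Optional


def resolve_metric_key(config: dict) -> Optional[str]:
    """Return the metrics key that checkpoint saving would look up, or None."""
    name = config.get("metric_for_best_model")
    # Assumed default: "loss" when best-model loading is on and no metric is named.
    if name is None and config.get("load_best_model_at_end", False):
        name = "loss"
    if name is None:
        return None  # nothing to track, so no lookup happens
    # The key is assumed to be normalised to carry the "eval_" prefix.
    return name if name.startswith("eval_") else f"eval_{name}"


def missing_metric(config: dict, metrics: dict) -> Optional[str]:
    """Return the key that would raise KeyError, or None if the lookup is safe."""
    key = resolve_metric_key(config)
    if key is not None and key not in metrics:
        return key
    return None


# Relevant subset of the YAML above.
config = {"load_best_model_at_end": True, "eval_strategy": "steps", "eval_steps": 500}
# Hypothetical metrics dict: an evaluation pass that reported no loss.
metrics_without_loss = {"eval_runtime": 12.3}
print(missing_metric(config, metrics_without_loss))  # eval_loss
```

If this reading is right, the error means the evaluation pass at the save step returned metrics without an `eval_loss` entry. Under the same assumption, setting `load_best_model_at_end: false` (with no `metric_for_best_model`) would skip the lookup entirely, though it would not explain why the loss is missing.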
Others
No response