Insights: hiyouga/LLaMA-Factory
Overview
6 Pull requests merged by 4 people
- [assets] update windows installation (#8042, merged May 13, 2025)
- [model] add seed coder and qwen3 quant models (#8039, merged May 13, 2025)
- [data] fix kimi vl template (#8015, merged May 11, 2025)
- [infer] add video params for vllm infer (#7992, merged May 9, 2025)
- [data] avoid repetitive tool description wrap (#8000, merged May 9, 2025)
- [docs] add GraphGen (#7974, merged May 7, 2025)
3 Pull requests opened by 3 people
- Update llama3_lora_sft.yaml (#7988, opened May 8, 2025)
- feat: add smollm support (#8050, opened May 13, 2025)
- [infer] Modify vllm_infer.py to preprocess in batches to avoid the too-many-open-files error (#8051, opened May 14, 2025)
43 Issues closed by 9 people
- When running llamafactory-cli, the info-level logs are too verbose and interfere with debugging; how can they be suppressed? (#8052, closed May 14, 2025)
- Is LoRA fine-tuning supported for Qwen MoE? (#8047, closed May 13, 2025)
- The specified template does not take effect? (#8044, closed May 13, 2025)
- Is GRPO training supported for Qwen3 MoE? (#8040, closed May 13, 2025)
- SFT Loss Exhibits Step Function Behavior every Epoch, Not Monotonic Decrease (#8037, closed May 12, 2025)
- torch.distributed.elastic.multiprocessing.errors.ChildFailedError during Qwen3-14B training (#7967, closed May 12, 2025)
- RuntimeError: Engine core initialization failed. See root cause above when running Kimi_VL inference with vLLM (#8025, closed May 12, 2025)
- Error when training Gemma3 (#7370, closed May 12, 2025)
- No further output when exporting a quantized model (#8030, closed May 12, 2025)
- After training for text classification, the model returns content outside the training data; is there a way to control this? (#8032, closed May 12, 2025)
- After merging a LoRA fine-tuned Kimi_VL, vLLM inference raises RuntimeError: Engine core initialization failed. See root cause above (#8026, closed May 12, 2025)
- Each saved checkpoint (global step) takes too much space; is there a way to reduce it by skipping unneeded files? (#8027, closed May 12, 2025)
- Output issues with the original Qwen3 model or the model after merging LoRA parameters (#8024, closed May 12, 2025)
- ValueError: `limit_mm_per_prompt` is only supported for multimodal models. when running Kimi_VL inference with vLLM (#8022, closed May 12, 2025)
- Facing an issue while using a custom system message (#8017, closed May 11, 2025)
- Error when loading the model for chat after successful training (#8016, closed May 11, 2025)
- Problems running tests with a Docker deployment (#8010, closed May 11, 2025)
- Qwen3 not stopping generation after LoRA fine-tuning (#7943, closed May 11, 2025)
- ValueError: Processor was not found, please check and update your model file. (#8014, closed May 11, 2025)
- Qwen3-8B model deployment (#8011, closed May 11, 2025)
- ValueError: Processor was not found, please check and update your model file. (#8009, closed May 11, 2025)
- Question about Qwen2.5-VL-7B LoRA fine-tuning (#7991, closed May 10, 2025)
- How to resize_token_embeddings of Qwen2.5-Omni-7B? (#8005, closed May 10, 2025)
- Conflict between the transformers version required by the LLaMA framework and the version required by Qwen (#8006, closed May 10, 2025)
- How to modify the code so that the label part is not tokenized during training while the question part is tokenized? (#7980, closed May 9, 2025)
- flash_attn: fa2 does not change the model's attn_implementation parameter (#7979, closed May 9, 2025)
- Is the format used by mllm_demo.json ShareGPT or Alpaca? (#7982, closed May 9, 2025)
- Adding tokens via add_tokens causes a mismatch between LoRA parameters and model parameters (#7997, closed May 9, 2025)
- bf16 is used for LoRA fine-tuning, but the merged LoRA model is saved in fp32, making FlashAttention unusable (#7998, closed May 9, 2025)
- Error when installing hf-xet, a dependency of LLaMA-Factory (#7999, closed May 9, 2025)
- Loss does not converge when fine-tuning in Docker on a single 4080 or dual 4060 Ti (#8002, closed May 9, 2025)
- The chat feature cannot load the model with the Hugging Face backend (#8001, closed May 9, 2025)
- Error during the PPO training stage when using DeepSpeed (#7989, closed May 9, 2025)
- Does Qwen3 fine-tuning support image input? (#7994, closed May 9, 2025)
- OSError: Failed to load tokenizer. when fine-tuning Kimi (#7984, closed May 8, 2025)
- Error when training Qwen2.5-Omni-3B with BF16 (#7970, closed May 8, 2025)
- After merging a LoRA fine-tuned Qwen3-4B, vLLM inference still outputs the thinking process (#7978, closed May 7, 2025)
- Question about the Qwen3 training data format (#7977, closed May 7, 2025)
- Code downloaded directly from main does not support Qwen3 (#7972, closed May 7, 2025)
23 Issues opened by 22 people
- Support for SmolVLM (#8049, opened May 13, 2025)
- NCCL INFO socketPollConnect: connect returned No route to host (#8045, opened May 13, 2025)
- DataLoader exited unexpectedly after one epoch (#8043, opened May 13, 2025)
- [question] Does the Muon optimizer support DeepSpeed? (#8041, opened May 13, 2025)
- qwen3:30b-a3b SFT with lora-rank 16 is very slow (#8038, opened May 13, 2025)
- invalid conversion from ‘int’ to ‘CUresult’ {aka ‘cudaError_enum’} (#8035, opened May 12, 2025)
- Error when training a model (#8031, opened May 12, 2025)
- Discrepancy in ROUGE/BLEU scores during training vs. final results (#8029, opened May 12, 2025)
- Model saving issue (#8028, opened May 12, 2025)
- Unsloth training error (#8021, opened May 12, 2025)
- Fine-tuning on a single 4080 with bf16: the loss becomes 0 after the second epoch (#8019, opened May 12, 2025)
- Are there plans to support kimi-audio, released at the end of April? (#8013, opened May 11, 2025)
- [eval] Support custom system prompts during model evaluation (#8007, opened May 10, 2025)
- [FSDP LORA] No device index is set for xpu when running FSDP+LORA (#8003, opened May 9, 2025)
- transformers versions above 4.49 are incompatible with Ascend NPU (#7986, opened May 8, 2025)
- The semantics of cutoff_len are confusing (#7985, opened May 8, 2025)
- SFT training code stuck pending (#7983, opened May 8, 2025)
- How to save the tokenized data? (#7981, opened May 7, 2025)
27 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Distributed improvement of Muon implementation (#7808, commented on May 13, 2025, 1 new comment)
- support repetition_penalty and align presence_penalty with OpenAI Client (#7958, commented on May 7, 2025, 0 new comments)
- add Sequence Parallelism (reopened #6506) (#7338, commented on May 8, 2025, 0 new comments)
- This change adds support for Intel Gaudi HPUs (#7275, commented on May 13, 2025, 0 new comments)
- Error when using vllm_infer to run inference on multiple samples at once (#7513, commented on May 13, 2025, 0 new comments)
- 🚨 FAQs | Frequently Asked Questions 🚨 (#4614, commented on May 13, 2025, 0 new comments)
- Garbled responses when loading Qwen2.5-32B-Instruct with huggingface (#7371, commented on May 13, 2025, 0 new comments)
- Streaming training gets stuck at the first step (#7261, commented on May 13, 2025, 0 new comments)
- Are there plans to support LoRAMoE? (#2749, commented on May 13, 2025, 0 new comments)
- Fine-tuning on 310P raises RuntimeError: call aclnnCast failed, detail:EZ9999: Inner Error (#5425, commented on May 13, 2025, 0 new comments)
- Can't Finetune Qwen2.5-VL-3B-Instruct-AWQ Using LoRA and Multiple GPUs (#7963, commented on May 12, 2025, 0 new comments)
- LoRA fine-tuning of deepseek-r1-distill-1.5B on a single 4090 hangs (reliably reproducible); the whole machine freezes and nvidia-smi stops responding (#7661, commented on May 12, 2025, 0 new comments)
- Support for images_root field for multimodal training data (#7814, commented on May 12, 2025, 0 new comments)
- deepseek-moe-16B pre-training issue (#7165, commented on May 12, 2025, 0 new comments)
- Why is the speed per iteration slower when the dataset is large? (#6410, commented on May 12, 2025, 0 new comments)
- Data loading fails when there are many multimodal datasets (#7266, commented on May 12, 2025, 0 new comments)
- SFT training with neat_packing causes a clear drop in model performance metrics (#5426, commented on May 12, 2025, 0 new comments)
- After LoRA fine-tuning ChatGLM3-6B on Alibaba Cloud, clicking chat to load the model raises the following error (#5995, commented on May 12, 2025, 0 new comments)
- RuntimeError: CUDA driver error: invalid argument (#7832, commented on May 11, 2025, 0 new comments)
- The SimPO implementation definitely has a bug, which is reliably reproducible (#7954, commented on May 10, 2025, 0 new comments)
- LLaVA series (7B, 14B) freeze_vision_tower=false bug (#6376, commented on May 10, 2025, 0 new comments)
- Tuned model's processor can't tokenize my text input properly (#7608, commented on May 10, 2025, 0 new comments)
- How to implement weight decay towards the pre-trained model? (#5603, commented on May 9, 2025, 0 new comments)
- Is there an environment configuration to share for the bitsandbytes quantization method? (#7861, commented on May 9, 2025, 0 new comments)
- Is it possible to support microsoft/bitnet-b1.58-2B-4T? (#7775, commented on May 8, 2025, 0 new comments)
- Size mismatch error in the middle of training (#6699, commented on May 8, 2025, 0 new comments)
- How to set up multi-node training on Ascend? (#5360, commented on May 8, 2025, 0 new comments)