Insights: hiyouga/LLaMA-Factory
Overview
6 Pull requests merged by 4 people
- [assets] update windows installation (#8042, merged May 13, 2025)
- [model] add seed coder and qwen3 quant models (#8039, merged May 13, 2025)
- [data] fix kimi vl template (#8015, merged May 11, 2025)
- [infer] add video params for vllm infer (#7992, merged May 9, 2025)
- [data] avoid repetitive tool description wrap (#8000, merged May 9, 2025)
- [docs] add GraphGen (#7974, merged May 7, 2025)
3 Pull requests opened by 3 people
- Update llama3_lora_sft.yaml (#7988, opened May 8, 2025)
- feat: add smollm support (#8050, opened May 13, 2025)
- [infer] Modify vllm_infer.py to preprocess in batches to avoid the too-many-open-files error (#8051, opened May 14, 2025)
43 Issues closed by 9 people
- When running llamafactory-cli, the info-level logs are too verbose and interfere with debugging; how can they be suppressed? (#8052, closed May 14, 2025)
- Is LoRA fine-tuning supported for Qwen MoE? (#8047, closed May 13, 2025)
- The specified template does not take effect? (#8044, closed May 13, 2025)
- Is GRPO training supported for Qwen3 MoE? (#8040, closed May 13, 2025)
- SFT Loss Exhibits Step Function Behavior every Epoch, Not Monotonic Decrease (#8037, closed May 12, 2025)
- torch.distributed.elastic.multiprocessing.errors.ChildFailedError during Qwen3-14B training (#7967, closed May 12, 2025)
- RuntimeError: Engine core initialization failed. See root cause above when running Kimi_VL inference with vLLM (#8025, closed May 12, 2025)
- Error when training Gemma3 (#7370, closed May 12, 2025)
- No further output when exporting a quantized model (#8030, closed May 12, 2025)
- After training for text classification, the model returns content outside the training data; is there a way to control this? (#8032, closed May 12, 2025)
- After merging a LoRA fine-tuned Kimi_VL, vLLM inference raises RuntimeError: Engine core initialization failed. See root cause above (#8026, closed May 12, 2025)
- Each saved checkpoint (global step) takes too much space; is there a way to reduce it by skipping unneeded files? (#8027, closed May 12, 2025)
- Output issues with the original Qwen3 model or the model after merging LoRA parameters (#8024, closed May 12, 2025)
- ValueError: `limit_mm_per_prompt` is only supported for multimodal models. when running Kimi_VL inference with vLLM (#8022, closed May 12, 2025)
- Facing an issue while using a custom system message (#8017, closed May 11, 2025)
- Error when loading the model for chat after successful training (#8016, closed May 11, 2025)
- Problems running tests with a Docker deployment (#8010, closed May 11, 2025)
- Qwen3 not stopping generation after LoRA fine-tuning (#7943, closed May 11, 2025)
- ValueError: Processor was not found, please check and update your model file. (#8014, closed May 11, 2025)
- Qwen3-8B model deployment (#8011, closed May 11, 2025)
- ValueError: Processor was not found, please check and update your model file. (#8009, closed May 11, 2025)
- Question about Qwen2.5-VL-7B LoRA fine-tuning (#7991, closed May 10, 2025)
- How to resize_token_embeddings of Qwen2.5-Omni-7B? (#8005, closed May 10, 2025)
- Conflict between the transformers version required by the LLaMA framework and the version required by Qwen (#8006, closed May 10, 2025)
- How to modify the code so that the label part is not tokenized during training while the question part is tokenized? (#7980, closed May 9, 2025)
- flash_attn: fa2 does not change the model's attn_implementation parameter (#7979, closed May 9, 2025)
- Is the format used by mllm_demo.json ShareGPT or Alpaca? (#7982, closed May 9, 2025)
- Adding tokens via add_tokens causes a mismatch between LoRA parameters and model parameters (#7997, closed May 9, 2025)
- bf16 is used for LoRA fine-tuning, but the merged LoRA model is saved in fp32, making FlashAttention unusable (#7998, closed May 9, 2025)
- Error when installing hf-xet, a dependency of LLaMA-Factory (#7999, closed May 9, 2025)
- Loss does not converge when fine-tuning in Docker on a single 4080 or dual 4060 Ti (#8002, closed May 9, 2025)
- The chat feature cannot load the model with the Hugging Face backend (#8001, closed May 9, 2025)
- Error during the PPO training stage when using DeepSpeed (#7989, closed May 9, 2025)
- Does Qwen3 fine-tuning support image input? (#7994, closed May 9, 2025)
- OSError: Failed to load tokenizer. when fine-tuning Kimi (#7984, closed May 8, 2025)
- Error when training Qwen2.5-Omni-3B with BF16 (#7970, closed May 8, 2025)
- After merging a LoRA fine-tuned Qwen3-4B, vLLM inference still outputs the thinking process (#7978, closed May 7, 2025)
- Question about the Qwen3 training data format (#7977, closed May 7, 2025)
- Code downloaded directly from main does not support Qwen3 (#7972, closed May 7, 2025)
23 Issues opened by 22 people
- Support for SmolVLM (#8049, opened May 13, 2025)
- NCCL INFO socketPollConnect: connect returned No route to host (#8045, opened May 13, 2025)
- DataLoader exited unexpectedly after one epoch (#8043, opened May 13, 2025)
- [question] Does the Muon optimizer support DeepSpeed? (#8041, opened May 13, 2025)
- qwen3:30b-a3b SFT with lora-rank 16 is very slow (#8038, opened May 13, 2025)
- invalid conversion from ‘int’ to ‘CUresult’ {aka ‘cudaError_enum’} (#8035, opened May 12, 2025)
- Error when training a model (#8031, opened May 12, 2025)
- Discrepancy in ROUGE/BLEU scores during training vs. final results (#8029, opened May 12, 2025)
- Model saving issue (#8028, opened May 12, 2025)
- Unsloth training error (#8021, opened May 12, 2025)
- Fine-tuning on a single 4080 with bf16: the loss becomes 0 after the second epoch (#8019, opened May 12, 2025)
- Are there plans to support kimi-audio, released at the end of April? (#8013, opened May 11, 2025)
- [eval] Support custom system prompts during model evaluation (#8007, opened May 10, 2025)
- [FSDP LORA] No device index is set for xpu when running FSDP+LORA (#8003, opened May 9, 2025)
- transformers versions above 4.49 are incompatible with Ascend NPU (#7986, opened May 8, 2025)
- The semantics of cutoff_len are confusing (#7985, opened May 8, 2025)
- SFT training code stuck pending (#7983, opened May 8, 2025)
- How to save the tokenized data? (#7981, opened May 7, 2025)
27 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Distributed improvement of Muon implementation (#7808, commented on May 13, 2025, 1 new comment)
- support repetition_penalty and align presence_penalty with OpenAI Client (#7958, commented on May 7, 2025, 0 new comments)
- add Sequence Parallelism (reopened #6506) (#7338, commented on May 8, 2025, 0 new comments)
- This change adds support for Intel Gaudi HPUs (#7275, commented on May 13, 2025, 0 new comments)
- Error when using vllm_infer to run inference on multiple samples at once (#7513, commented on May 13, 2025, 0 new comments)
- 🚨 FAQs | Frequently Asked Questions 🚨 (#4614, commented on May 13, 2025, 0 new comments)
- Garbled responses when loading Qwen2.5-32B-Instruct with huggingface (#7371, commented on May 13, 2025, 0 new comments)
- Streaming training gets stuck at the first step (#7261, commented on May 13, 2025, 0 new comments)
- Are there plans to support LoRAMoE? (#2749, commented on May 13, 2025, 0 new comments)
- Fine-tuning on 310P raises RuntimeError: call aclnnCast failed, detail:EZ9999: Inner Error (#5425, commented on May 13, 2025, 0 new comments)
- Can't Finetune Qwen2.5-VL-3B-Instruct-AWQ Using LoRA and Multiple GPUs (#7963, commented on May 12, 2025, 0 new comments)
- LoRA fine-tuning of deepseek-r1-distill-1.5B on a single 4090 hangs (reliably reproducible); the whole machine freezes and nvidia-smi stops responding (#7661, commented on May 12, 2025, 0 new comments)
- Support for images_root field for multimodal training data (#7814, commented on May 12, 2025, 0 new comments)
- deepseek-moe-16B pre-training issue (#7165, commented on May 12, 2025, 0 new comments)
- Why is the speed per iteration slower when the dataset is large? (#6410, commented on May 12, 2025, 0 new comments)
- Data loading fails when there are many multimodal datasets (#7266, commented on May 12, 2025, 0 new comments)
- SFT training with neat_packing causes a clear drop in model performance metrics (#5426, commented on May 12, 2025, 0 new comments)
- After LoRA fine-tuning ChatGLM3-6B on Alibaba Cloud, clicking chat to load the model raises the following error (#5995, commented on May 12, 2025, 0 new comments)
- RuntimeError: CUDA driver error: invalid argument (#7832, commented on May 11, 2025, 0 new comments)
- The SimPO implementation definitely has a bug, which is reliably reproducible (#7954, commented on May 10, 2025, 0 new comments)
- LLaVA series (7B, 14B) freeze_vision_tower=false bug (#6376, commented on May 10, 2025, 0 new comments)
- Tuned model's processor can't tokenize my text input properly (#7608, commented on May 10, 2025, 0 new comments)
- How to implement weight decay towards the pre-trained model? (#5603, commented on May 9, 2025, 0 new comments)
- Is there an environment configuration to share for the bitsandbytes quantization method? (#7861, commented on May 9, 2025, 0 new comments)
- Is it possible to support microsoft/bitnet-b1.58-2B-4T? (#7775, commented on May 8, 2025, 0 new comments)
- Size mismatch error in the middle of training (#6699, commented on May 8, 2025, 0 new comments)
- How to set up multi-node training on Ascend? (#5360, commented on May 8, 2025, 0 new comments)