
No grad on visual LoRA layers #193

@tobiapoppi

Description


Hi @2U1, congrats on your great work and thanks for your huge effort in actively maintaining this repo.

I have a problem and can't figure out what's happening:
I am fine-tuning Qwen-2.5-VL-7B with DPO on both image and video data, modifying the DPO training script to also support chosen/rejected pairs on the inputs rather than chosen/rejected pairs on the answers.
For context, here is what happens with each configuration:

- VE frozen, LLM frozen, Merger trainable: everything looks fine.
- VE frozen, LLM frozen, Merger trainable, lora_enabled (only on LLM): everything looks fine.
- VE frozen, LLM frozen, Merger trainable, lora_enabled + vision_lora (on both VE and LLM): after `.backward()`, `self.model.base_model.model.model.language_model.layers[0].self_attn.q_proj.lora_B.default.weight.grad` exists and updates correctly across steps, but `self.model.base_model.model.model.visual.blocks[0].attn.qkv.lora_B.default.weight.grad` is `None` (not zero-valued, it literally has no grad). This holds for every single LoRA param on the vision encoder.
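For reference, this is a minimal, self-contained sketch of the check I run after `.backward()`. The toy module here is just a stand-in for the real Qwen paths: detaching the "visual" branch's output reproduces the exact symptom, where `requires_grad` is `True` everywhere but the grads never reach the visual parameters.

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.visual = nn.Linear(4, 4)         # stands in for visual.blocks[...].attn.qkv.lora_B
        self.language_model = nn.Linear(4, 1)

    def forward(self, x):
        v = self.visual(x).detach()  # a detach() anywhere here kills the grad path to `visual`
        return self.language_model(v)

model = Toy()
model(torch.randn(2, 4)).sum().backward()

# Params that require grad but received none after backward:
missing = [n for n, p in model.named_parameters()
           if p.requires_grad and p.grad is None]
print(missing)  # ['visual.weight', 'visual.bias'] -- same pattern I see on the VE LoRA params
```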

I debugged this: the optimizer correctly contains every LoRA param, and all LoRA params have `requires_grad = True`.

Even if I disable LoRA on the LLM and keep it only on the VE (which is actually what I would like to do), I hit the same problem. I don't understand what's happening here. I am debugging without DeepSpeed enabled.
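One thing I also checked, in case the vision tower's forward was running under `torch.no_grad()` somewhere in the trainer (a plausible cause when the VE is "frozen"): a forward hook on a vision block that inspects whether the block's output is attached to the autograd graph. This is a toy sketch, with a plain `nn.Linear` standing in for `visual.blocks[0]`, simulating that failure mode.

```python
import torch
import torch.nn as nn

block = nn.Linear(4, 4)  # stands in for model...visual.blocks[0]
seen = {}

def fhook(module, args, output):
    # If this is False, the block ran detached (e.g. under no_grad),
    # so no gradient can ever reach its LoRA weights.
    seen["requires_grad"] = output.requires_grad

block.register_forward_hook(fhook)

with torch.no_grad():  # simulating a trainer that "freezes" the VE this way
    block(torch.randn(2, 4))
print(seen)  # {'requires_grad': False}
```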

GPUs: A100-SXM4-40GB
python: 3.11.12
torch: 2.8.0+cu128
torchvision: 0.23.0+cu128
accelerate: 1.10.1
peft: 0.15.2
transformers: 4.56.1

Thanks in advance :)
Great code!
