
Question about layer decision of LoRA #131


Description

@itsnickchiu

Hi, thank you for your great work and detailed explanation!
I’m currently working on a project based on your work and have a few questions.

I noticed a similar discussion in issue #49, but I'd like to follow up with a more specific question.
I'm curious about your decision to apply LoRA only to attn1 in each transformer block and to the convolutional layers, rather than to all attention layers as is typically done in a PEFT configuration.

Was this choice based on qualitative observations, or are there other considerations we might be overlooking?
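
For concreteness, here is roughly how I'd express the two options as PEFT `LoraConfig`s. This is only a hypothetical sketch, not your code: the module names (`attn1`, `to_q`/`to_k`/`to_v`/`to_out.0`, `resnets.N.convN`) and the rank/alpha values assume a diffusers-style UNet and are my own guesses.

```python
from peft import LoraConfig

# Option A: the usual PEFT-style setup -- wrap every attention projection
# (both self-attention attn1 and cross-attention attn2) by matching on the
# projection suffixes alone.
all_attn_config = LoraConfig(
    r=8,
    lora_alpha=8,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
    lora_dropout=0.0,
)

# Option B: how I read the choice in this repo -- restrict LoRA to the
# self-attention (attn1) projections plus the ResNet convolutions. When
# target_modules is a string, PEFT treats it as a regex and full-matches it
# against each module's qualified name; PEFT can wrap nn.Conv2d with LoRA too.
attn1_and_conv_config = LoraConfig(
    r=8,
    lora_alpha=8,
    target_modules=r".*(attn1\.(to_q|to_k|to_v|to_out\.0)|resnets\.\d+\.conv\d)",
    lora_dropout=0.0,
)

# Either config would then be applied with something like:
#   from peft import get_peft_model
#   unet = get_peft_model(unet, attn1_and_conv_config)
```

If I've misread which layers are wrapped, please correct me; I mainly want to understand the reasoning behind limiting the trainable set this way.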

Thanks again!
