-
Good idea! Btw, shouldn't we implement a LoRA extractor in
-
Small update on this: I've been able to convert the diff between the 2 models into a LoRA adapter: https://huggingface.co/ngxson/LoRA-Qwen2.5-Coder-7B-Instruct I haven't tested with infill, will try in a few days. But in the meantime, we also need #11131 to be merged, so LoRA for token embeddings will be supported.
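For context, the core of such a conversion is a truncated SVD of the weight delta between the fine-tuned and base checkpoints. Below is a minimal sketch of the technique, not mergekit's actual code; the `extract_lora` helper, the `rank` value, and the toy tensors are all illustrative:

```python
import torch

def extract_lora(w_base: torch.Tensor, w_ft: torch.Tensor, rank: int = 32):
    """Approximate (w_ft - w_base) with a rank-`rank` LoRA pair (B @ A)."""
    delta = (w_ft - w_base).float()
    # Truncated SVD of the weight delta.
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    u, s, vh = u[:, :rank], s[:rank], vh[:rank, :]
    # Fold the singular values into both factors symmetrically.
    sqrt_s = torch.diag(s.sqrt())
    lora_b = u @ sqrt_s   # shape: (out_features, rank)
    lora_a = sqrt_s @ vh  # shape: (rank, in_features)
    return lora_a, lora_b

# Toy check: the low-rank product should approximate the delta.
base = torch.randn(512, 512)
ft = base + torch.randn(512, 128) @ torch.randn(128, 512) * 0.01
a, b = extract_lora(base, ft, rank=128)
print(torch.dist(ft - base, b @ a))  # small residual
```

A real extractor applies this per weight matrix, and the choice of rank and which modules to include matters a lot for adapter quality.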
-
The author of the Qwen model confirmed that infill capability is only possible with Qwen-Coder (non-Instruct): https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/discussions/2#6731a45e0e39be0605a0df20
This limits that model to `/infill` only, so it cannot be used with `/chat/completions`.
However, we know that the Instruct version is indeed fine-tuned from the non-Instruct one; see the technical report: https://arxiv.org/pdf/2409.12186
To make the model usable with both chat and infill, one solution is to extract the difference between the 2 models into a LoRA adapter. This can be done via something like `mergekit-extract-lora`; we can then set the LoRA scale at runtime (i.e. 0.0 for infill, 1.0 for chat).
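As a rough sketch of what the runtime side could look like against llama-server, assuming the adapter is loaded with `--lora` and that the server build supports a per-request `lora` field (the exact payload shape may differ between versions):

```python
import requests

SERVER = "http://localhost:8080"  # llama-server started with: --lora extracted-adapter.gguf

def infill(prefix: str, suffix: str) -> str:
    # Disable the instruct adapter so the model behaves like the base coder model.
    r = requests.post(f"{SERVER}/infill", json={
        "input_prefix": prefix,
        "input_suffix": suffix,
        "lora": [{"id": 0, "scale": 0.0}],
    })
    return r.json()["content"]

def chat(prompt: str) -> str:
    # Enable the instruct adapter at full strength for chat.
    r = requests.post(f"{SERVER}/v1/chat/completions", json={
        "messages": [{"role": "user", "content": prompt}],
        "lora": [{"id": 0, "scale": 1.0}],
    })
    return r.json()["choices"][0]["message"]["content"]
```

This way a single loaded model serves both roles, with the adapter toggled per request instead of swapping checkpoints.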