Exposing the LoRA merge/export as a library public function #8985
cyanic-selkie
started this conversation in Ideas
Replies: 0 comments
Recently, I've been playing with inference on mobile devices. One of the goals was to minimize the required download size, since that minimizes the friction for acquiring new users.
Naturally, a small base model + quantization + multiple LoRA adapters (one for each task) was the go-to solution. However, this came at a cost to inference speed due to the LoRA overhead.
A good solution, I think, would be to merge the weights at first boot, thus trading storage for speed.
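For context, the merge itself is mathematically simple; here is a minimal numpy sketch of merging a single LoRA adapter into a base weight matrix, assuming the standard LoRA formulation `W' = W + (alpha / rank) * B @ A` (the function name and shapes are illustrative, not an existing API):

```python
import numpy as np

def merge_lora(W, A, B, alpha, rank):
    """Fold a LoRA adapter into the base weight matrix.

    W: (out_dim, in_dim) base weight
    A: (rank, in_dim) LoRA down-projection
    B: (out_dim, rank) LoRA up-projection
    """
    scale = alpha / rank
    # After this, inference uses W_merged alone: no extra
    # matmuls per layer, at the cost of storing merged weights.
    return W + scale * (B @ A)

# Tiny illustrative example with random weights.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 2))
W_merged = merge_lora(W, A, B, alpha=4.0, rank=2)
```

Doing this once at first boot removes the per-layer adapter matmuls from the inference path, which is exactly the storage-for-speed trade described above (quantized weights would need to be dequantized, merged, and requantized, which is what the existing export tool handles).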
So, in keeping with the spirit of "inference on the edge", I think it would be a good idea to expose the export/merge feature through the public API, not just as a binary.