Exposing the LoRA merge/export as a library public function #8985
cyanic-selkie
started this conversation in Ideas
Replies: 0 comments
Recently, I've been playing with inference on mobile devices. One of the goals was to minimize the required download size, since that minimizes the friction for acquiring new users.
Naturally, a small base model + quantization + multiple LoRA adapters (one for each task) was the go-to solution. However, this came at a cost to inference speed due to the LoRA overhead.
A good solution, I think, would be to merge the weights at first boot, thus trading storage for speed.
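For context, the merge itself is mathematically simple; here is a minimal numpy sketch of merging a single LoRA adapter into a base weight matrix, assuming the standard LoRA formulation `W' = W + (alpha / rank) * B @ A` (the function name and shapes are illustrative, not an existing API):

```python
import numpy as np

def merge_lora(W, A, B, alpha, rank):
    """Fold a LoRA adapter into the base weight matrix.

    W: (out_dim, in_dim) base weight
    A: (rank, in_dim) LoRA down-projection
    B: (out_dim, rank) LoRA up-projection
    """
    scale = alpha / rank
    # After this, inference uses W_merged alone: no extra
    # matmuls per layer, at the cost of storing merged weights.
    return W + scale * (B @ A)

# Tiny illustrative example with random weights.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 2))
W_merged = merge_lora(W, A, B, alpha=4.0, rank=2)
```

Doing this once at first boot removes the per-layer adapter matmuls from the inference path, which is exactly the storage-for-speed trade described above (quantized weights would need to be dequantized, merged, and requantized, which is what the existing export tool handles).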
So, in keeping with the spirit of "inference on the edge", I think it would be a good idea to expose the export/merge feature through the public API, not just as a binary.