-
Got hit by the same issue as well. I ran a QLoRA finetune of OpenHermes 2.5 (Mistral) using axolotl, converted the adapter file to GGML using
Got the error:
Do I need to change the
-
Hmm ... this seems to have been resolved by downloading the latest llama.cpp, compiling it, and using that.
-
" the simultaneous use of LoRAs and GPU acceleration is only supported for f16 models"
Given LORA already forces mmap to be disabled, we have full access on the memory.
Why not during load: dequantize any LORA layer to FP16 -> apply lora -> quantize again
For best quality it would be possible to point to a FP16 model to load the raw layer.
Like a on-the-fly combination, so the GPU kernels will not even know it was a LORA.
Having to merge LORAs kind of defeats the purpose of it, that's just a preprocessed finetune ?:)
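To make the idea concrete, here is a minimal C++ sketch of that load-time flow. Everything in it is invented for illustration: `QuantTensor`, `quantize`, `dequantize`, and `apply_lora_inplace` are not llama.cpp APIs, and the single per-tensor scale is a simplification of GGML's per-block quantization formats. The point is only the sequence: dequantize to float, add `(alpha / r) * B * A`, requantize.

```cpp
// Hypothetical sketch of a load-time LoRA merge into a quantized
// tensor. A toy symmetric int8 format stands in for real GGML
// quant blocks; the flow is the same either way.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Toy quantized tensor with a single scale (GGML uses per-block scales).
struct QuantTensor {
    std::vector<int8_t> q;
    float scale;
    int rows, cols;
};

static QuantTensor quantize(const std::vector<float>& w, int rows, int cols) {
    float amax = 1e-8f;  // floor avoids a zero scale for an all-zero tensor
    for (float v : w) amax = std::max(amax, std::fabs(v));
    QuantTensor t;
    t.q.resize(w.size());
    t.scale = amax / 127.0f;
    t.rows  = rows;
    t.cols  = cols;
    for (size_t i = 0; i < w.size(); ++i)
        t.q[i] = (int8_t)std::lround(w[i] / t.scale);
    return t;
}

static std::vector<float> dequantize(const QuantTensor& t) {
    std::vector<float> w(t.q.size());
    for (size_t i = 0; i < w.size(); ++i) w[i] = t.q[i] * t.scale;
    return w;
}

// Merge a LoRA pair into one layer at load time:
// W' = W + (alpha / r) * B * A, computed in float, then requantized,
// so the GPU kernels only ever see a plain quantized weight.
static void apply_lora_inplace(QuantTensor& W,
                               const std::vector<float>& A,  // r x cols
                               const std::vector<float>& B,  // rows x r
                               int r, float alpha) {
    std::vector<float> w = dequantize(W);
    const float scaling = alpha / (float)r;
    for (int i = 0; i < W.rows; ++i)
        for (int j = 0; j < W.cols; ++j) {
            float delta = 0.0f;
            for (int k = 0; k < r; ++k)
                delta += B[i * r + k] * A[k * W.cols + j];
            w[i * W.cols + j] += scaling * delta;
        }
    W = quantize(w, W.rows, W.cols);  // back to the original format
}

int main() {
    // 2x2 base weight and a rank-1 LoRA, purely illustrative numbers.
    QuantTensor W = quantize({0.5f, -0.25f, 0.75f, 1.0f}, 2, 2);
    std::vector<float> A = {0.1f, 0.2f};   // 1 x 2
    std::vector<float> B = {0.3f, -0.4f};  // 2 x 1
    apply_lora_inplace(W, A, B, /*r=*/1, /*alpha=*/1.0f);
    for (float v : dequantize(W)) std::printf("%f\n", v);
    return 0;
}
```

Note that the requantize step round-trips the weights through the quantized format, which loses a little precision; that is exactly why the comment suggests optionally loading the raw layer from an FP16 copy of the model before applying the delta.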