Hey, I am running similar experiments and have the following observations:
Hi, I’m fine-tuning an LLM on my own data using SFTTrainer, bitsandbytes quantization, and peft, with configs along the lines of those listed below. When I convert the model to GGUF for CPU inference, the model's performance drops significantly. Any idea what the problem could be?
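Roughly, the setup is a QLoRA-style configuration like the following (the values here are only illustrative placeholders, not my exact settings):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit bitsandbytes quantization for the base model (illustrative choice).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "<BASE_MODEL_PATH>", quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("<BASE_MODEL_PATH>")

# LoRA adapter config used with peft (illustrative hyperparameters).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# model, tokenizer, and the dataset are then passed to trl's SFTTrainer.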
I do the conversion to GGUF in the following way. First, I merge the trained adapter with the base model. Then the merged model is converted to GGUF using llama.cpp's 'convert.py' script with q8_0 quantization (I tested other quantization types without success, and I also tried converting with Unsloth, again without a positive result).
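The merge step is roughly the following (a minimal sketch with placeholder paths, using peft's merge_and_unload):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in half precision.
base_model = AutoModelForCausalLM.from_pretrained(
    "<BASE_MODEL_PATH>", torch_dtype=torch.float16
)

# Attach the trained LoRA adapter and fold its weights into the base weights.
model = PeftModel.from_pretrained(base_model, "<ADAPTER_MODEL_PATH>")
merged = model.merge_and_unload()

# Save the merged model and tokenizer so convert.py can pick them up.
merged.save_pretrained("<MERGED_MODEL_PATH>")
tokenizer = AutoTokenizer.from_pretrained("<ADAPTER_MODEL_PATH>")
tokenizer.save_pretrained("<MERGED_MODEL_PATH>")

After that, the conversion command is: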
python convert.py <MERGED_MODEL_PATH> \
    --outfile <OUTPUT_MODEL_NAME.gguf> \
    --outtype q8_0 \
    --vocab_dir <ADAPTER_MODEL_PATH>
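For CPU inference I then run the resulting GGUF with llama.cpp, something like this (prompt and token count are just an example):

./main -m <OUTPUT_MODEL_NAME.gguf> -p "Some test prompt" -n 128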