Convert GGUF model to Hugging Face model #3770
-
Hi,
-
I tried. I converted it to a dequantized model, which produced many .bin files with odd names. I tried loading the result, but the model seems to perform very poorly.
-
I don't think there's really a good way to do that currently. Also, if the model is quantized then you're going to lose a lot of quality if you ever quantize it again. You'd also basically have to undo the stuff
-
As of today you should be able to not just load and run inference, but also convert a Mistral or Llama checkpoint over to the Transformers format with the latest Transformers release! Linking the relevant docs here: Transformers docs

Transformers supports conversion from all the major quantisation formats:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
```

To further persist the model checkpoint, you can:

```python
tokenizer.save_pretrained('directory')
model.save_pretrained('directory')
```

We're hoping to extend this functionality based on which architectures the community requests! Please log your requests over on this Transformers issue.
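For a quick sanity check of the converted checkpoint, a minimal sketch of reloading the saved directory and generating a few tokens; the `'directory'` path and the prompt are placeholders carried over from the snippet above, not part of the original answer:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Reload the checkpoint written by save_pretrained() above.
# 'directory' is the same placeholder path used in that snippet.
tokenizer = AutoTokenizer.from_pretrained('directory')
model = AutoModelForCausalLM.from_pretrained('directory', torch_dtype=torch.float16)

# Generate a short completion to confirm the converted weights behave sensibly.
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that loading a GGUF file this way dequantizes the weights, so the saved checkpoint is a full-precision Transformers model rather than a quantized one.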