Description
System Info
I am trying to load a Donut model that was fine-tuned and then quantized to 4-bit. While save_pretrained works fine, when I try to load the quantized model (at quant_path) with
model = VisionEncoderDecoderModel.from_pretrained(quant_path, load_in_4bit=True)
all of the parameters load correctly except decoder.lm_head.weight, which is instead reset. I cannot find the cause of this issue, and it happens both when (1) I load the quantized model and (2) I load the fine-tuned checkpoint with the load_in_4bit argument.
I have tried the same steps with the 'naver-clova-ix/donut-base' model from the Hugging Face Hub and it works fine. Any help would be much appreciated!
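For reference, here is roughly how I check the mismatch (quant_path is a placeholder for my local checkpoint directory; with safe_serialization=False the weights live in pytorch_model.bin):

import torch
from transformers import VisionEncoderDecoderModel

quant_path = "path/to/quantized-donut"  # hypothetical local checkpoint directory

# Raw state dict as saved on disk (legacy pickle format, safe_serialization=False)
ckpt = torch.load(f"{quant_path}/pytorch_model.bin", map_location="cpu")

# Load the same checkpoint in 4-bit
model = VisionEncoderDecoderModel.from_pretrained(quant_path, load_in_4bit=True)

# bitsandbytes typically leaves lm_head unquantized, so a direct comparison works
loaded = model.decoder.lm_head.weight.detach().float().cpu()
saved = ckpt["decoder.lm_head.weight"].float()
print(torch.allclose(loaded, saved))  # False here, i.e. the weight was reset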
Reproduction
from transformers import VisionEncoderDecoderModel

# safe_serialization=True discards lm_head.weight, so use the legacy pickle format
finetuned_model.save_pretrained(finetuned_path, safe_serialization=False)
# Reloading in 4-bit produces a reset decoder.lm_head.weight
model = VisionEncoderDecoderModel.from_pretrained(finetuned_path, load_in_4bit=True)
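As an aside on the safe_serialization comment above: safetensors skips tensors that share storage with another saved tensor, so one way to see whether lm_head is tied to the decoder's input embeddings (a sketch, reusing the same finetuned_model as above) is:

# If these share storage, the head is tied to the embeddings, which is the
# usual reason safetensors drops lm_head.weight on save
emb = finetuned_model.decoder.get_input_embeddings().weight
head = finetuned_model.decoder.lm_head.weight
print(emb.data_ptr() == head.data_ptr())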
Expected behavior
The model is loaded with decoder.lm_head.weight taken from the fine-tuned checkpoint.