-
Hi, I'm running the hf-to-gguf convert script on a LoRA-fused model and running into an issue. The log output looks like this (the chat template line appears truncated):

```
INFO:hf-to-gguf:Loading model: lora_fused_model
'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|> ' }}{% endif %}
```

Would appreciate any help with this issue. Thanks.
-
Which model is this? I want to know if …
-
Thanks. In this case, it seems like `model.embed_tokens.weight` is pre-quantized, since that tensor is in U32 and is accompanied by scales and biases. The convert script does not yet support that, unfortunately. The `model.embed_tokens.weight` tensor would need to be dequantized first by applying `model.embed_tokens.scales` and `model.embed_tokens.biases` to it (not sure how exactly).
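
For what it's worth, U32 packing with accompanying `scales` and `biases` tensors matches MLX's affine quantization format, and `lora_fused_model` happens to be the default output directory of mlx-lm's `fuse` command, so this is plausibly an MLX checkpoint. Below is a minimal NumPy sketch of what the dequantization could look like under that assumption; the 4-bit width, group size of 64, and low-bits-first packing are MLX defaults, not anything confirmed in this thread.

```python
import numpy as np

def dequantize_mlx(w_packed: np.ndarray, scales: np.ndarray, biases: np.ndarray,
                   bits: int = 4, group_size: int = 64) -> np.ndarray:
    """Undo MLX-style affine quantization: w = scale * q + bias, per group.

    Assumes `w_packed` packs `32 // bits` quantized values into each uint32
    along the last axis, and that `scales`/`biases` hold one entry per
    `group_size` weights. All of these layout details are assumptions.
    """
    vals_per_u32 = 32 // bits
    # Unpack the bit fields of each uint32, lowest bits first.
    shifts = np.arange(vals_per_u32) * bits
    q = (w_packed[..., None] >> shifts) & ((1 << bits) - 1)
    q = q.reshape(*w_packed.shape[:-1], -1).astype(np.float32)
    # Apply one scale/bias pair to each group of `group_size` values.
    q = q.reshape(*q.shape[:-1], -1, group_size)
    w = q * scales[..., None] + biases[..., None]
    return w.reshape(*w.shape[:-2], -1)

# Hypothetical usage on the tensors named above:
# embed = dequantize_mlx(tensors["model.embed_tokens.weight"],
#                        tensors["model.embed_tokens.scales"],
#                        tensors["model.embed_tokens.biases"])
```

After replacing `model.embed_tokens.weight` with the dequantized float tensor (and dropping the `.scales`/`.biases` entries) in the saved checkpoint, the convert script should at least get past this tensor. If it really is an MLX model, re-running the fuse step with mlx-lm's de-quantize option (if your version has one) may be the simpler fix.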