Bug: Loading DeepSeek R1T Chimera causes "llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_q_b.weight' has wrong shape; expected 1536, 73728, got 1536, 24576, 1, 1" #383

@Alexey-Akishin

Description

What happened?

I tried loading https://huggingface.co/bullerwins/DeepSeek-R1T-Chimera-GGUF/tree/main/DeepSeek-R1T-Chimera-Q4_K_M and got this error (the same model loads fine with llama.cpp):

llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_q_b.weight' has wrong shape; expected  1536, 73728, got  1536, 24576,     1,     1

Original model: https://huggingface.co/tngtech/DeepSeek-R1T-Chimera

It is a merge of DeepSeek-R1 and DeepSeek-V3 (0324). It is quite well made, bringing together the good qualities of both models, so I am not sure why it fails in ik_llama.cpp. I first tried a version repacked with llama-quantize, then the original quant. I also tried with and without -rtr, and CPU-only with no cache quantization and no flash attention (just specifying ctx-size and the model to load), all with the same outcome unfortunately.
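To confirm what is actually stored in the file, the tensor shape can be dumped with the gguf Python package. This is only a rough sketch (the path is a placeholder for the shard that contains the blk.0 tensors):

from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("path/to/DeepSeek-R1T-Chimera-Q4_K_M-00001-of-NNNNN.gguf")  # placeholder path
for t in reader.tensors:
    if t.name == "blk.0.attn_q_b.weight":
        # Expect to see the 1536 and 24576 dimensions from the error message here.
        print(t.name, t.shape)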

Name and Version

version: 3667 (e3fec17)

What operating system are you seeing the problem on?

No response

Relevant log output

llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = deepseek2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = DeepSeek R1T Chimera Bf16
llama_model_loader: - kv   3:                         general.size_label str              = 256x20B
llama_model_loader: - kv   4:                            general.license str              = mit
llama_model_loader: - kv   5:                   general.base_model.count u32              = 2
llama_model_loader: - kv   6:                  general.base_model.0.name str              = DeepSeek V3 0324
llama_model_loader: - kv   7:               general.base_model.0.version str              = V3-0324
llama_model_loader: - kv   8:          general.base_model.0.organization str              = Deepseek Ai
llama_model_loader: - kv   9:              general.base_model.0.repo_url str              = https://huggingface.co/deepseek-ai/De...
llama_model_loader: - kv  10:                  general.base_model.1.name str              = DeepSeek R1
llama_model_loader: - kv  11:          general.base_model.1.organization str              = Deepseek Ai
llama_model_loader: - kv  12:              general.base_model.1.repo_url str              = https://huggingface.co/deepseek-ai/De...
llama_model_loader: - kv  13:                               general.tags arr[str,1]       = ["text-generation"]
llama_model_loader: - kv  14:                      deepseek2.block_count u32              = 61
llama_model_loader: - kv  15:                   deepseek2.context_length u32              = 163840
llama_model_loader: - kv  16:                 deepseek2.embedding_length u32              = 7168
llama_model_loader: - kv  17:              deepseek2.feed_forward_length u32              = 18432
llama_model_loader: - kv  18:             deepseek2.attention.head_count u32              = 128
llama_model_loader: - kv  19:          deepseek2.attention.head_count_kv u32              = 1
llama_model_loader: - kv  20:                   deepseek2.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  21: deepseek2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  22:                deepseek2.expert_used_count u32              = 8
llama_model_loader: - kv  23:        deepseek2.leading_dense_block_count u32              = 3
llama_model_loader: - kv  24:                       deepseek2.vocab_size u32              = 129280
llama_model_loader: - kv  25:            deepseek2.attention.q_lora_rank u32              = 1536
llama_model_loader: - kv  26:           deepseek2.attention.kv_lora_rank u32              = 512
llama_model_loader: - kv  27:             deepseek2.attention.key_length u32              = 576
llama_model_loader: - kv  28:           deepseek2.attention.value_length u32              = 512
llama_model_loader: - kv  29:         deepseek2.attention.key_length_mla u32              = 192
llama_model_loader: - kv  30:       deepseek2.attention.value_length_mla u32              = 128
llama_model_loader: - kv  31:       deepseek2.expert_feed_forward_length u32              = 2048
llama_model_loader: - kv  32:                     deepseek2.expert_count u32              = 256
llama_model_loader: - kv  33:              deepseek2.expert_shared_count u32              = 1
llama_model_loader: - kv  34:             deepseek2.expert_weights_scale f32              = 2.500000
llama_model_loader: - kv  35:              deepseek2.expert_weights_norm bool             = true
llama_model_loader: - kv  36:               deepseek2.expert_gating_func u32              = 2
llama_model_loader: - kv  37:             deepseek2.rope.dimension_count u32              = 64
llama_model_loader: - kv  38:                deepseek2.rope.scaling.type str              = yarn
llama_model_loader: - kv  39:              deepseek2.rope.scaling.factor f32              = 40.000000
llama_model_loader: - kv  40: deepseek2.rope.scaling.original_context_length u32              = 4096
llama_model_loader: - kv  41: deepseek2.rope.scaling.yarn_log_multiplier f32              = 0.100000
llama_model_loader: - kv  42:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  43:                         tokenizer.ggml.pre str              = deepseek-v3
llama_model_loader: - kv  44:                      tokenizer.ggml.tokens arr[str,129280]  = ["<|begin▁of▁sentence|>", "<�...
llama_model_loader: - kv  45:                  tokenizer.ggml.token_type arr[i32,129280]  = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  46:                      tokenizer.ggml.merges arr[str,127741]  = ["Ġ t", "Ġ a", "i n", "Ġ Ġ", "h e...
llama_model_loader: - kv  47:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  48:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  49:            tokenizer.ggml.padding_token_id u32              = 1
llama_model_loader: - kv  50:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  51:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  52:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
llama_model_loader: - kv  53:               general.quantization_version u32              = 2
llama_model_loader: - kv  54:                          general.file_type u32              = 214
llama_model_loader: - type  f32:  361 tensors
llama_model_loader: - type q5_0:   61 tensors
llama_model_loader: - type q4_K:  467 tensors
llama_model_loader: - type q6_K:   31 tensors
llama_model_loader: - type q4_k_r4:  139 tensors
llama_model_loader: - type q6_k_r4:   27 tensors
llm_load_vocab: special tokens cache size = 818
llm_load_vocab: token to piece cache size = 0.8223 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = deepseek2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 129280
llm_load_print_meta: n_merges         = 127741
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 163840
llm_load_print_meta: n_embd           = 7168
llm_load_print_meta: n_layer          = 61
llm_load_print_meta: n_head           = 128
llm_load_print_meta: n_head_kv        = 1
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_swa_pattern    = 1
llm_load_print_meta: n_embd_head_k    = 576
llm_load_print_meta: n_embd_head_v    = 512
llm_load_print_meta: n_gqa            = 128
llm_load_print_meta: n_embd_k_gqa     = 576
llm_load_print_meta: n_embd_v_gqa     = 512
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 18432
llm_load_print_meta: n_expert         = 256
llm_load_print_meta: n_expert_used    = 8
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = yarn
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 0.025
llm_load_print_meta: n_ctx_orig_yarn  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 671B
llm_load_print_meta: model ftype      = Q4_K_R4
llm_load_print_meta: model params     = 671.026 B
llm_load_print_meta: model size       = 376.710 GiB (4.822 BPW) 
llm_load_print_meta: repeating layers = 375.516 GiB (4.820 BPW, 669.173 B parameters)
llm_load_print_meta: general.name     = DeepSeek R1T Chimera Bf16
llm_load_print_meta: BOS token        = 0 '<|begin▁of▁sentence|>'
llm_load_print_meta: EOS token        = 1 '<|end▁of▁sentence|>'
llm_load_print_meta: PAD token        = 1 '<|end▁of▁sentence|>'
llm_load_print_meta: LF token         = 131 'Ä'
llm_load_print_meta: max token length = 256
llm_load_print_meta: n_layer_dense_lead   = 3
llm_load_print_meta: n_lora_q             = 1536
llm_load_print_meta: n_lora_kv            = 512
llm_load_print_meta: n_ff_exp             = 2048
llm_load_print_meta: n_expert_shared      = 1
llm_load_print_meta: expert_weights_scale = 2.5
llm_load_print_meta: expert_weights_norm  = 1
llm_load_print_meta: expert_gating_func   = sigmoid
llm_load_print_meta: rope_yarn_log_mul    = 0.1000
llm_load_tensors: ggml ctx size =    2.23 MiB
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_q_b.weight' has wrong shape; expected  1536, 73728, got  1536, 24576,     1,     1
llama_load_model_from_file: failed to load model
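For what it is worth, the two sizes in the error line up with the attention metadata printed above (head_count = 128, key_length = 576, key_length_mla = 192). My guess, purely from the arithmetic and not from reading the loader source, is that ik_llama.cpp sizes attn_q_b from key_length while this GGUF stores it according to key_length_mla:

n_head         = 128  # deepseek2.attention.head_count
key_length     = 576  # deepseek2.attention.key_length
key_length_mla = 192  # deepseek2.attention.key_length_mla

print(n_head * key_length)      # 73728 -> the "expected" second dimension
print(n_head * key_length_mla)  # 24576 -> the "got" second dimension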
