Bug: failed to load model 'gemma-3n-E4B-it-Q4_K_M.gguf' #772

@lykamspam

Description

Contact Details

lm-studio download gguf

What happened?

[...]
llama_model_loader: - type f32: 422 tensors
llama_model_loader: - type f16: 108 tensors
llama_model_loader: - type q4_K: 282 tensors
llama_model_loader: - type q6_K: 35 tensors
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma3n'
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'gemma-3n-E4B-it-Q4_K_M.gguf'
main: error: unable to load model

Version

llamafile --version
llamafile v0.9.3

What operating system are you seeing the problem on?

Linux

Relevant log output

Log start
Cmd: ./llava.llamafile -m gemma-3n-E4B-it-Q4_K_M.gguf -p "Pisz do mnie po polsku. Używam metrów i systemu metrycznego." --host 0.0.0.0 -ngl 9999
main: build = 1500 (a30b324)
main: built with cosmocc (GCC) 11.2.0 for x86_64-linux-cosmo
main: seed  = 1751038437
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 39 key-value pairs and 847 tensors from gemma-3n-E4B-it-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma3n
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Gg Hf Gm_Gemma 3n E4B It
llama_model_loader: - kv   3:                           general.finetune str              = 3n-E4B-it
llama_model_loader: - kv   4:                           general.basename str              = gg-hf-gm_gemma
llama_model_loader: - kv   5:                         general.size_label str              = 6.9B
llama_model_loader: - kv   6:                     gemma3n.context_length u32              = 32768
llama_model_loader: - kv   7:                   gemma3n.embedding_length u32              = 2048
llama_model_loader: - kv   8:                        gemma3n.block_count u32              = 35
llama_model_loader: - kv   9:                gemma3n.feed_forward_length u32              = 16384
llama_model_loader: - kv  10:               gemma3n.attention.head_count u32              = 8
llama_model_loader: - kv  11:   gemma3n.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  12:               gemma3n.attention.key_length u32              = 256
llama_model_loader: - kv  13:             gemma3n.attention.value_length u32              = 256
llama_model_loader: - kv  14:                     gemma3n.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  15:           gemma3n.attention.sliding_window u32              = 512
llama_model_loader: - kv  16:            gemma3n.attention.head_count_kv u32              = 2
llama_model_loader: - kv  17:                   gemma3n.altup.active_idx u32              = 0
llama_model_loader: - kv  18:                   gemma3n.altup.num_inputs u32              = 4
llama_model_loader: - kv  19:   gemma3n.embedding_length_per_layer_input u32              = 256
llama_model_loader: - kv  20:         gemma3n.attention.shared_kv_layers f32              = 15.000000
llama_model_loader: - kv  21:          gemma3n.activation_sparsity_scale arr[f32,35]      = [1.644853, 1.644853, 1.644853, 1.6448...
llama_model_loader: - kv  22:   gemma3n.attention.sliding_window_pattern arr[bool,35]     = [true, true, true, true, false, true,...
llama_model_loader: - kv  23:                    tokenizer.chat_template str              = {{ bos_token }}\n{%- if messages[0]['r...
llama_model_loader: - kv  24:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  25:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  26:                      tokenizer.ggml.tokens arr[str,262144]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv  27:                      tokenizer.ggml.scores arr[f32,262144]  = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  28:                  tokenizer.ggml.token_type arr[i32,262144]  = [3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  29:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  30:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  31:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  32:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  33:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  34:               tokenizer.ggml.add_sep_token bool             = false
llama_model_loader: - kv  35:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  36:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  37:               general.quantization_version u32              = 2
llama_model_loader: - kv  38:                          general.file_type u32              = 15
llama_model_loader: - type  f32:  422 tensors
llama_model_loader: - type  f16:  108 tensors
llama_model_loader: - type q4_K:  282 tensors
llama_model_loader: - type q6_K:   35 tensors
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma3n'
llama_load_model_from_file: failed to load model
main: error: unable to load model
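The failure is at metadata-validation time: the file's `general.architecture` key says `gemma3n`, and this llamafile build's bundled llama.cpp has no loader registered for that string, so it aborts before reading any tensors. To confirm what architecture a GGUF file declares without loading it, the header can be parsed directly. Below is a minimal sketch of a GGUF v3 metadata reader; it handles only string-typed values (type 8) and builds a tiny synthetic header for demonstration instead of reading the real 6.9B file, so the helper names here are illustrative, not part of any library.

```python
import io
import struct

def write_min_gguf(arch: str) -> bytes:
    """Build a minimal GGUF v3 header with a single metadata key,
    general.architecture, for demonstration purposes."""
    buf = io.BytesIO()
    buf.write(b"GGUF")                      # magic
    buf.write(struct.pack("<I", 3))         # version 3
    buf.write(struct.pack("<Q", 0))         # tensor count
    buf.write(struct.pack("<Q", 1))         # metadata kv count
    key = b"general.architecture"
    buf.write(struct.pack("<Q", len(key)))  # key length
    buf.write(key)
    buf.write(struct.pack("<I", 8))         # value type 8 = string
    val = arch.encode("utf-8")
    buf.write(struct.pack("<Q", len(val)))  # value length
    buf.write(val)
    return buf.getvalue()

def read_architecture(data: bytes) -> str:
    """Scan GGUF v3 metadata for general.architecture.
    Only string-typed values are handled in this sketch."""
    f = io.BytesIO(data)
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    struct.unpack("<I", f.read(4))          # version (unused here)
    struct.unpack("<Q", f.read(8))          # tensor count (unused here)
    (n_kv,) = struct.unpack("<Q", f.read(8))
    for _ in range(n_kv):
        (klen,) = struct.unpack("<Q", f.read(8))
        key = f.read(klen).decode("utf-8")
        (vtype,) = struct.unpack("<I", f.read(4))
        if vtype != 8:
            raise ValueError("only string values handled in this sketch")
        (vlen,) = struct.unpack("<Q", f.read(8))
        val = f.read(vlen).decode("utf-8")
        if key == "general.architecture":
            return val
    raise KeyError("general.architecture not found")

print(read_architecture(write_min_gguf("gemma3n")))  # gemma3n
```

Run against a real file (e.g. `read_architecture(open("gemma-3n-E4B-it-Q4_K_M.gguf", "rb").read(4096))`) this would report `gemma3n`, matching kv 0 in the dump above; the file itself is fine, the loader just predates that architecture.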
