Open
Description
Contact Details
GGUF file downloaded via LM Studio
What happened?
[...]
llama_model_loader: - type f32: 422 tensors
llama_model_loader: - type f16: 108 tensors
llama_model_loader: - type q4_K: 282 tensors
llama_model_loader: - type q6_K: 35 tensors
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma3n'
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'gemma-3n-E4B-it-Q4_K_M.gguf'
main: error: unable to load model
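The failure above comes down to one line: the loader rejects the architecture name stored in the file. When triaging several such logs, that name can be pulled out mechanically. The following is only a sketch; the function name `find_unknown_architecture` is hypothetical, and the pattern assumes the exact message wording shown in this log.

```python
import re
from typing import Optional

# Matches the loader message as it appears in the log above.
_UNKNOWN_ARCH = re.compile(r"unknown model architecture: '([^']+)'")


def find_unknown_architecture(log_text: str) -> Optional[str]:
    """Return the rejected architecture name, or None if the log has no such error."""
    match = _UNKNOWN_ARCH.search(log_text)
    return match.group(1) if match else None
```

Passing the captured stderr text of a failed run to `find_unknown_architecture` returns the quoted name from the error line, if that line is present.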
Version
llamafile --version
llamafile v0.9.3
What operating system are you seeing the problem on?
Linux
Relevant log output
Log start
Cmd: ./llava.llamafile -m gemma-3n-E4B-it-Q4_K_M.gguf -p "Pisz do mnie po polsku. Używam metrów i systemu metrycznego." --host 0.0.0.0 -ngl 9999
main: build = 1500 (a30b324)
main: built with cosmocc (GCC) 11.2.0 for x86_64-linux-cosmo
main: seed = 1751038437
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 39 key-value pairs and 847 tensors from gemma-3n-E4B-it-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = gemma3n
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Gg Hf Gm_Gemma 3n E4B It
llama_model_loader: - kv 3: general.finetune str = 3n-E4B-it
llama_model_loader: - kv 4: general.basename str = gg-hf-gm_gemma
llama_model_loader: - kv 5: general.size_label str = 6.9B
llama_model_loader: - kv 6: gemma3n.context_length u32 = 32768
llama_model_loader: - kv 7: gemma3n.embedding_length u32 = 2048
llama_model_loader: - kv 8: gemma3n.block_count u32 = 35
llama_model_loader: - kv 9: gemma3n.feed_forward_length u32 = 16384
llama_model_loader: - kv 10: gemma3n.attention.head_count u32 = 8
llama_model_loader: - kv 11: gemma3n.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 12: gemma3n.attention.key_length u32 = 256
llama_model_loader: - kv 13: gemma3n.attention.value_length u32 = 256
llama_model_loader: - kv 14: gemma3n.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 15: gemma3n.attention.sliding_window u32 = 512
llama_model_loader: - kv 16: gemma3n.attention.head_count_kv u32 = 2
llama_model_loader: - kv 17: gemma3n.altup.active_idx u32 = 0
llama_model_loader: - kv 18: gemma3n.altup.num_inputs u32 = 4
llama_model_loader: - kv 19: gemma3n.embedding_length_per_layer_input u32 = 256
llama_model_loader: - kv 20: gemma3n.attention.shared_kv_layers f32 = 15.000000
llama_model_loader: - kv 21: gemma3n.activation_sparsity_scale arr[f32,35] = [1.644853, 1.644853, 1.644853, 1.6448...
llama_model_loader: - kv 22: gemma3n.attention.sliding_window_pattern arr[bool,35] = [true, true, true, true, false, true,...
llama_model_loader: - kv 23: tokenizer.chat_template str = {{ bos_token }}\n{%- if messages[0]['r...
llama_model_loader: - kv 24: tokenizer.ggml.model str = llama
llama_model_loader: - kv 25: tokenizer.ggml.pre str = default
llama_model_loader: - kv 26: tokenizer.ggml.tokens arr[str,262144] = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv 27: tokenizer.ggml.scores arr[f32,262144] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 28: tokenizer.ggml.token_type arr[i32,262144] = [3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv 29: tokenizer.ggml.bos_token_id u32 = 2
llama_model_loader: - kv 30: tokenizer.ggml.eos_token_id u32 = 1
llama_model_loader: - kv 31: tokenizer.ggml.unknown_token_id u32 = 3
llama_model_loader: - kv 32: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 33: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 34: tokenizer.ggml.add_sep_token bool = false
llama_model_loader: - kv 35: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 36: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 37: general.quantization_version u32 = 2
llama_model_loader: - kv 38: general.file_type u32 = 15
llama_model_loader: - type f32: 422 tensors
llama_model_loader: - type f16: 108 tensors
llama_model_loader: - type q4_K: 282 tensors
llama_model_loader: - type q6_K: 35 tensors
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma3n'
llama_load_model_from_file: failed to load model
main: error: unable to load model
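The log shows that the loader reads `general.architecture` from the GGUF metadata and then refuses the value `gemma3n`. To check what a file declares before handing it to a runtime, the metadata header can be read directly. Below is a minimal sketch, assuming a little-endian GGUF file of version 2 or 3 (the log reports V3) laid out as in the published GGUF specification; the function name `read_architecture` and the helpers are hypothetical and are not part of llamafile or llama.cpp.

```python
import struct
from typing import Any, BinaryIO, Optional

GGUF_MAGIC = b"GGUF"

# Fixed-size scalar value types (GGUF type id -> struct format), little-endian.
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    """Read one fixed-size value described by a struct format string."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    """Read one metadata value of the given GGUF type id."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_architecture(f: BinaryIO) -> Optional[str]:
    """Return the value of `general.architecture`, or None if the key is absent."""
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        # Assumption: only v2+ headers (64-bit counts and lengths) are handled.
        raise ValueError(f"unsupported GGUF version {version}")
    _read(f, "<Q")  # tensor count, not needed here
    kv_count = _read(f, "<Q")
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "general.architecture":
            return value
    return None
```

To use it, open the model in binary mode, for example `open("gemma-3n-E4B-it-Q4_K_M.gguf", "rb")`, and pass the file object to `read_architecture`. Going by the metadata dump above, where `general.architecture` is the first key, the function should return early without walking the large tokenizer arrays.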
torchss