Heap-based Buffer Over-read in llama_model_load

Summary

A heap-based buffer over-read vulnerability in the GGUF file parser allows a remote attacker to cause a Denial of Service (DoS) by providing a maliciously crafted model file. The vulnerability is triggered when the application attempts to load a model that has a vocabulary size smaller than the hardcoded default ID for special tokens (e.g., Begin of Sentence), leading to an out-of-bounds read and crash.

Details

In llama-cli, the print_info function is called after loading the model. This function attempts to access the text for special_bos_id without validating if the ID is within the bounds of the vocabulary vector.

void llama_vocab::impl::print_info() const {
    LLAMA_LOG_INFO("%s: vocab type       = %s\n",     __func__, type_name().c_str());
    LLAMA_LOG_INFO("%s: n_vocab          = %u\n",     __func__, vocab.n_tokens());
    LLAMA_LOG_INFO("%s: n_merges         = %u\n",     __func__, (uint32_t) bpe_ranks.size());

    // special tokens
    if (special_bos_id  != LLAMA_TOKEN_NULL)    { LLAMA_LOG_INFO( "%s: BOS token        = %d '%s'\n", __func__, special_bos_id,     id_to_token[special_bos_id].text.c_str() );  }

The special_bos_id (Begin of Sentence ID) is initialized with a default value of 1.

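In the PoC, the vocabulary contains a single token, so id_to_token.size() is 1 and the only valid index is 0. Because special_bos_id keeps its default of 1 and std::vector::operator[] performs no bounds check, id_to_token[1] reads past the end of the heap allocation. AddressSanitizer reports this as a heap-buffer-overflow; in a non-instrumented build the read can cause a segmentation fault.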
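The missing check can be illustrated with a minimal, self-contained sketch. This is not the upstream fix: token_id, TOKEN_NULL, token_data, token_in_range and describe_special_token are hypothetical stand-ins chosen for illustration, and the sketch only assumes that the vocabulary is a std::vector indexed by token ID, as in the excerpt above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the llama.cpp types; illustrative only.
using token_id = std::int32_t;
constexpr token_id TOKEN_NULL = -1;

struct token_data {
    std::string text;
};

// True only when `id` can safely be used to index `id_to_token`.
bool token_in_range(const std::vector<token_data> & id_to_token, token_id id) {
    return id >= 0 && static_cast<std::size_t>(id) < id_to_token.size();
}

// Builds the text a print_info-style log line would show for a special token,
// without ever indexing past the end of the vocabulary.
std::string describe_special_token(const std::vector<token_data> & id_to_token,
                                   const char * label, token_id id) {
    if (id == TOKEN_NULL) {
        return std::string(label) + " token = (none)";
    }
    if (!token_in_range(id_to_token, id)) {
        // This is the case the PoC hits: id == 1 while the vocabulary has size 1.
        return std::string(label) + " token = " + std::to_string(id) + " (out of range)";
    }
    return std::string(label) + " token = " + std::to_string(id) + " '" +
           id_to_token[static_cast<std::size_t>(id)].text + "'";
}
```

With a one-entry vocabulary, describe_special_token(vocab, "BOS", 1) takes the out-of-range branch instead of dereferencing id_to_token[1]. Using id_to_token.at(id) would also avoid the over-read, but it throws std::out_of_range, which the caller would then need to handle.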

PoC

https://drive.google.com/file/d/17utcgl2AmEVpVgOShqe6gEadUxV7-_bQ/view?usp=drive_link

./build/bin/llama-cli -m vocab_overflow.gguf

Output:

register_device: registered device CPU (Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz)
load_backend: failed to find ggml_backend_init in /home/llama.cpp/build/bin/libggml-cpu.so
build: 5618 (1f63e75f) with cc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 for x86_64-linux-gnu (debug)
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 10 key-value pairs and 1 tensors from examples/gguf/malicious.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                          llama.block_count u32              = 2
llama_model_loader: - kv   2:                       llama.context_length u32              = 1024
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   5:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   6:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   7:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv   8:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv   9:                      tokenizer.ggml.tokens arr[str,1]       = ["<unk>"]
llama_model_loader: - type  f32:    1 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = all F32 (guessed)
print_info: file size   = 0.00 MiB (0.00 BPW) 
load: SPM vocabulary, but newline token not found: _Map_base::at! Using special_pad_id instead.
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 0
load: token to piece cache size = 0.0000 MB
print_info: arch             = llama
print_info: vocab_only       = 0
print_info: n_ctx_train      = 1024
print_info: n_embd           = 4096
print_info: n_layer          = 2
print_info: n_head           = 32
print_info: n_head_kv        = 32
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 1
print_info: n_embd_k_gqa     = 4096
print_info: n_embd_v_gqa     = 4096
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 11008
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 1024
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = ?B
print_info: model params     = 4611686.02 T
print_info: general.name     = n/a
print_info: vocab type       = SPM
print_info: n_vocab          = 1
print_info: n_merges         = 0
=================================================================
==734216==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x604000050e78 at pc 0x7fba119d8db6 bp 0x7fffa3d98010 sp 0x7fffa3d98000
READ of size 8 at 0x604000050e78 thread T0
    #0 0x7fba119d8db5 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_data() const /usr/include/c++/9/bits/basic_string.h:187
    #1 0x7fba119d1a90 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::c_str() const /usr/include/c++/9/bits/basic_string.h:2301
    #2 0x7fba11f8c110 in llama_vocab::impl::print_info() const /home/llama.cpp/src/llama-vocab.cpp:2771
    #3 0x7fba11f8ddef in llama_vocab::print_info() const /home/llama.cpp/src/llama-vocab.cpp:3112
    #4 0x7fba11efa4b4 in llama_model::print_info() const /home/llama.cpp/src/llama-model.cpp:4442
    #5 0x7fba11dd9a53 in llama_model_load /home/llama.cpp/src/llama.cpp:119
    #6 0x7fba11dda5bc in llama_model_load_from_file_impl /home/llama.cpp/src/llama.cpp:217
    #7 0x7fba11dda7d7 in llama_model_load_from_file /home/llama.cpp/src/llama.cpp:244
    #8 0x55849daa6d17 in common_init_from_params(common_params&) /home/llama.cpp/common/common.cpp:892
    #9 0x55849d933d0a in main /home/llama.cpp/tools/main/main.cpp:140
    #10 0x7fba0f45c082 in __libc_start_main ../csu/libc-start.c:308
    #11 0x55849d9331bd in _start (/home/llama.cpp/build/bin/llama-cli+0x801bd)

0x604000050e78 is located 0 bytes to the right of 40-byte region [0x604000050e50,0x604000050e78)
allocated by thread T0 here:
    #0 0x7fba12203947 in operator new(unsigned long) (/lib/x86_64-linux-gnu/libasan.so.5+0x10f947)
    #1 0x7fba11fa6153 in __gnu_cxx::new_allocator<llama_vocab::token_data>::allocate(unsigned long, void const*) /usr/include/c++/9/ext/new_allocator.h:114
    #2 0x7fba11fa4156 in std::allocator_traits<std::allocator<llama_vocab::token_data> >::allocate(std::allocator<llama_vocab::token_data>&, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:444
    #3 0x7fba11fa0e97 in std::_Vector_base<llama_vocab::token_data, std::allocator<llama_vocab::token_data> >::_M_allocate(unsigned long) /usr/include/c++/9/bits/stl_vector.h:343
    #4 0x7fba11f9bce8 in std::vector<llama_vocab::token_data, std::allocator<llama_vocab::token_data> >::_M_default_append(unsigned long) /usr/include/c++/9/bits/vector.tcc:635
    #5 0x7fba11f976a0 in std::vector<llama_vocab::token_data, std::allocator<llama_vocab::token_data> >::resize(unsigned long) /usr/include/c++/9/bits/stl_vector.h:937
    #6 0x7fba11f8624e in llama_vocab::impl::load(llama_model_loader&, LLM_KV const&) /home/llama.cpp/src/llama-vocab.cpp:1704
    #7 0x7fba11f8c84d in llama_vocab::load(llama_model_loader&, LLM_KV const&) /home/llama.cpp/src/llama-vocab.cpp:2803
    #8 0x7fba11ecea42 in llama_model::load_vocab(llama_model_loader&) /home/llama.cpp/src/llama-model.cpp:1464
    #9 0x7fba11dd9a2b in llama_model_load /home/llama.cpp/src/llama.cpp:113
    #10 0x7fba11dda5bc in llama_model_load_from_file_impl /home/llama.cpp/src/llama.cpp:217
    #11 0x7fba11dda7d7 in llama_model_load_from_file /home/llama.cpp/src/llama.cpp:244
    #12 0x55849daa6d17 in common_init_from_params(common_params&) /home/llama.cpp/common/common.cpp:892
    #13 0x55849d933d0a in main /home/llama.cpp/tools/main/main.cpp:140
    #14 0x7fba0f45c082 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/include/c++/9/bits/basic_string.h:187 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_data() const
Shadow bytes around the buggy address:
  0x0c0880002170: fa fa fd fd fd fd fd fa fa fa 00 00 00 00 07 fa
  0x0c0880002180: fa fa 00 00 00 00 00 fa fa fa fd fd fd fd fd fa
  0x0c0880002190: fa fa 00 00 00 00 07 fa fa fa fd fd fd fd fd fd
  0x0c08800021a0: fa fa fd fd fd fd fd fd fa fa fd fd fd fd fd fa
  0x0c08800021b0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
=>0x0c08800021c0: fa fa fd fd fd fd fd fa fa fa 00 00 00 00 00[fa]
  0x0c08800021d0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c08800021e0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c08800021f0: fa fa fd fd fd fd fd fa fa fa 00 00 00 00 00 fa
  0x0c0880002200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0880002210: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==734216==ABORTING

Impact

The out-of-bounds heap read can cause a segmentation fault, crashing llama.cpp when it loads the crafted model file (DoS).
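One way to turn the crash into an ordinary load error is to validate special token IDs against the vocabulary size as soon as the vocabulary is loaded. The sketch below is an assumption-laden illustration, not the project's patch: validate_special_token, token_id and TOKEN_NULL are hypothetical names, and it assumes the caller treats a thrown exception as a failed model load.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; illustrative only.
using token_id = std::int32_t;
constexpr token_id TOKEN_NULL = -1;

// Throws if a special token ID is set but does not index into a vocabulary
// of `n_tokens` entries. An unset ID (TOKEN_NULL) is accepted.
void validate_special_token(const char * name, token_id id, std::size_t n_tokens) {
    if (id == TOKEN_NULL) {
        return;
    }
    if (id < 0 || static_cast<std::size_t>(id) >= n_tokens) {
        throw std::runtime_error(std::string("invalid ") + name + " token id " +
                                 std::to_string(id) + " for vocabulary of size " +
                                 std::to_string(n_tokens));
    }
}
```

For the PoC file, validate_special_token("BOS", 1, 1) would throw before any code path indexes the vocabulary with the default ID. Rejecting the file up front covers every later use of the ID, not only the logging call shown in the stack trace.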

Weaknesses

Out-of-bounds Read