Summary
An attacker-supplied GGUF model vocabulary can trigger a buffer overflow in llama.cpp’s vocabulary-loading code. Specifically, the helper _try_copy in llama.cpp/src/vocab.cpp: llama_vocab::impl::token_to_piece() casts a very large size_t token length into an int32_t, causing the length check (if (length < (int32_t)size)) to be bypassed. As a result, memcpy is still called with that oversized size, letting a malicious model overwrite memory beyond the intended buffer. This can lead to arbitrary memory corruption and potential code execution.
Details
The vulnerability lies in the function:
llama.cpp/src/vocab.cpp
llama_vocab::impl::token_to_piece(llama_token token,
                                  char * buf,
                                  int32_t length,
                                  int32_t lstrip,
                                  bool special) const
Specifically, the inline helper _try_copy performs a signed comparison against a potentially oversized size_t without handling cases where the size_t value exceeds INT32_MAX. When that happens, the cast to int32_t can wrap into a negative value, causing the length check to be bypassed and leading to an unchecked memcpy.
// File: llama.cpp/src/vocab.cpp (around line 2570)
auto _try_copy = [=](const char * token, size_t size) -> int32_t {
    // 1) Skip up to `lstrip` leading spaces in the token string.
    for (int32_t i = 0; i < lstrip && size && *token == ' '; ++i) {
        token++;
        size--;
    }
    // 2) Bound check (VULNERABLE):
    //    - `length` is the maximum number of bytes the caller promised `buf` can hold (signed int32_t).
    //    - `size` is the unsigned token length (size_t). If size > INT32_MAX, casting to int32_t overflows
    //      and produces a negative value.
    if (length < (int32_t) size) {
        // Intention: return a negative error code when the token is too large to fit.
        // But when size > INT32_MAX:
        //   (int32_t)size becomes a negative integer (e.g. size_t=2,147,483,648 → (int32_t)=−2,147,483,648).
        //   Then (length < negative) is always false, so this branch is skipped.
        return -(int32_t) size;
    }
    // 3) Unchecked memcpy (VULNERABLE):
    //    At this point, even if `size` is far larger than `length`, the code will reach this memcpy,
    //    because the prior check falsely evaluated to false when (int32_t)size wrapped negative.
    //    This copies `size` bytes into `buf`, overrunning the buffer whenever size > length.
    memcpy(buf, token, size);
    // 4) Return the number of bytes copied (signed).
    //    Note: this cast also overflows if size > INT32_MAX, but the overflow has already happened.
    return (int32_t) size;
};
Why This Check Fails for Extremely Large Tokens:
- Unsigned size vs. signed length:
  - size is size_t (e.g., 64-bit on most platforms).
  - length is int32_t (maximum positive value = 2,147,483,647).
- Cast overflow:
  - If token_text.size() > INT32_MAX, then (int32_t) size keeps only the low 32 bits (two’s-complement). For sizes from 2,147,483,648 up to 4,294,967,295 the result is negative. For example:
      size_t size = 2,147,483,648   // one more than INT32_MAX
      (int32_t)size → −2,147,483,648
  - Still larger sizes can wrap to either a negative or a small positive value, depending on their low 32 bits; in both cases the value being compared no longer reflects the real token length.
  - With a negative result, the comparison if (length < (int32_t) size) becomes effectively if (small_positive < large_negative), which is always false for a non-negative length.
- Unchecked memcpy:
  - Because the bound check is bypassed, the code executes memcpy(buf, token, size).
  - Even though buf only has room for length bytes, memcpy uses the full (very large) size, causing a buffer overflow to the tune of billions of bytes.
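The bypass can be reproduced in isolation. The following is a minimal sketch, not the llama.cpp code itself: `vulnerable_check_rejects` and `narrowed` are hypothetical helpers that mirror only the comparison and the cast from `_try_copy`. It assumes a two's-complement platform where the out-of-range cast wraps (guaranteed since C++20, implementation-defined but near-universal before that).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the narrowing cast on its own.
// For sizes in [2^31, 2^32) the result is negative.
inline int32_t narrowed(size_t size) {
    return static_cast<int32_t>(size);
}

// Hypothetical helper: mirrors only the bound check from _try_copy.
// Returns true when the check takes the "token too large" error branch,
// false when execution would fall through to memcpy.
inline bool vulnerable_check_rejects(int32_t length, size_t size) {
    return length < narrowed(size);
}
```

With `length = 16`, a 17-byte token is rejected as intended. A token of 2,147,483,648 bytes is not rejected: `narrowed` returns a negative value, the comparison is false, and the real helper would go on to call memcpy with the full size.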
Callers and Code Paths
Any “token → string” conversion can overflow if token_text.size() > INT32_MAX. Notable call sites include:
- Model loading (each GGUF token string passes through token_to_piece())
- Detokenization (llama_vocab::impl::detokenize(...))
- Grammar routines (llama_grammar_apply_impl, llama_grammar_accept_impl)
- Sampling & infill (llama_sampler_infill_apply, etc.)
- Public API (llama_token_to_piece(...))
As soon as llama.cpp loads the oversized token, it will crash with a buffer overflow in _try_copy().

Impact
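Vulnerability Type
- Buffer overflow caused by the signed/unsigned cast in _try_copy().
Attack Vector
- A malicious GGUF model containing a token whose token_text.size() exceeds INT32_MAX.
- When the model is loaded, the oversized size_t bypasses the length check and triggers an unchecked memcpy.
Affected Component
- llama_vocab::impl::token_to_piece(), which is invoked by:
  - Model loading and detokenization
  - Grammar routines (llama_grammar_apply_impl, llama_grammar_accept_impl)
  - Sampling & infill (llama_sampler_infill_apply, etc.)
  - Public API (llama_token_to_piece())
Severity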
Consequences
Who Is Impacted
Mitigation & Recommendations
- Patch _try_copy so that length and size are compared in an unsigned context, which ensures that size values above INT32_MAX cannot bypass the bound check.
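One way such a patch could look is sketched below. This is an illustration under stated assumptions, not the upstream fix: `try_copy_checked` is a hypothetical standalone function rather than the lambda in vocab.cpp, and both the handling of a negative `length` as zero capacity and the clamping of the reported size to INT32_MAX are choices made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical standalone rewrite of the copy step with the comparison
// done in size_t. Returns the number of bytes copied, or a negative value
// when the token does not fit into `buf`.
inline int32_t try_copy_checked(char * buf, int32_t length, const char * token, size_t size) {
    // Assumption for this sketch: a negative length means "no room at all".
    const size_t capacity = length > 0 ? static_cast<size_t>(length) : 0;

    if (size > capacity) {
        // Clamp so that the negated value still fits in int32_t
        // (a choice made for this sketch).
        const size_t limit    = static_cast<size_t>(INT32_MAX);
        const size_t reported = size < limit ? size : limit;
        return -static_cast<int32_t>(reported);
    }

    // Here size <= capacity <= INT32_MAX, so the copy stays inside buf
    // and the cast below cannot wrap.
    std::memcpy(buf, token, size);
    return static_cast<int32_t>(size);
}
```

Because the comparison happens before any narrowing, a size of 2,147,483,648 is rejected in the same way as a token that is one byte too long, and memcpy is only reached when the token fits.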