Buffer Overflow in llama.cpp via Malicious GGUF Model – Exploitable via Vocabulary Loading (`llama_vocab::impl::token_to_piece`)

High
ggerganov published GHSA-8wwf-w4qm-gpqr Jun 14, 2025

Package

llama.cpp

Affected versions

All versions before b5662

Patched versions

>= b5662

Description

Summary

An attacker-supplied GGUF model vocabulary can trigger a buffer overflow in llama.cpp's vocabulary-loading code. Specifically, the helper _try_copy in llama.cpp/src/vocab.cpp: llama_vocab::impl::token_to_piece() casts a very large size_t token length to int32_t, which can turn it negative and bypass the length check (if (length < (int32_t)size)). memcpy is then still called with the original, oversized size, letting a malicious model write past the end of the destination buffer. This can lead to memory corruption and potentially code execution.

Details

The vulnerability lies in the function:

llama.cpp/src/vocab.cpp
  llama_vocab::impl::token_to_piece(llama_token token,
                                    char * buf,
                                    int32_t length,
                                    int32_t lstrip,
                                    bool special) const

Specifically, the inline helper _try_copy compares the signed length against size only after casting size (a size_t) to int32_t, and it does not handle the case where size exceeds INT32_MAX. In that case the cast can produce a negative value, so the length check is bypassed and memcpy runs with the original, unchecked size.

// File: llama.cpp/src/vocab.cpp (around line 2570)

auto _try_copy = [=](const char * token, size_t size) -> int32_t {
    // 1) Skip up to `lstrip` leading spaces in the token string.
    for (int32_t i = 0; i < lstrip && size && *token == ' '; ++i) {
        token++;
        size--;
    }

    // 2) Bound check (VULNERABLE):
    //    - `length` is the maximum number of bytes the caller promised `buf` can hold (signed int32_t).
    //    - `size` is the unsigned token length (size_t). If size > INT32_MAX, casting to int32_t overflows
    //      and produces a negative value.
    if (length < (int32_t) size) {
        // Intention: return a negative error code when the token is too large to fit.
        // But when size > INT32_MAX:
        //    (int32_t)size becomes a negative integer (e.g. size_t=2,147,483,648 → (int32_t)=−2,147,483,648).
        //    Then (length < negative) is always false, so this branch is skipped.
        return -(int32_t) size;
    }

    // 3) Unchecked memcpy (VULNERABLE):
    //    At this point, even if `size` is far larger than `length`, the code will reach this memcpy,
    //    because the prior check falsely evaluated to false when (int32_t)size wrapped negative.
    //    This copies `size` bytes into `buf`, overrunning the buffer whenever size > length.
    memcpy(buf, token, size);

    // 4) Return the number of bytes copied (signed).
    //    Note: this cast also overflows if size > INT32_MAX, but the overflow has already happened.
    return (int32_t) size;
};

Why This Check Fails for Extremely Large Tokens:

  1. Unsigned size vs. Signed length:
    • size is size_t (e.g., 64-bit on most platforms).
    • length is int32_t (maximum positive value = 2,147,483,647).
  2. Cast Overflow:
    • If token_text.size() > INT32_MAX, then (int32_t) size no longer equals size. For sizes from 2^31 up to 2^32 − 1 it wraps to a negative value (two's complement). For example:
        size_t size = 2147483648;   // one more than INT32_MAX
        (int32_t) size              // → −2147483648
    • The comparison if (length < (int32_t) size) then becomes effectively if (small_positive < large_negative), which is always false.
  3. Unchecked memcpy
    • Because the bound check is bypassed, the code executes memcpy(buf, token, size).
    • Even though buf only has room for length bytes, memcpy uses the full (very large) size, so the write can run billions of bytes past the end of the buffer.
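
The three points above can be reproduced in isolation without performing any copy. The following is a minimal sketch, not code from llama.cpp: the function names vulnerable_rejects and unsigned_rejects are hypothetical, and the first one only mirrors the comparison quoted from _try_copy. It assumes the usual wrapping behaviour for out-of-range conversions to int32_t (guaranteed since C++20, and what mainstream compilers do before that).

```cpp
#include <cstddef>
#include <cstdint>

// Mirrors the comparison in the vulnerable helper.
// Returns true when the helper would reject the token as "too large".
bool vulnerable_rejects(int32_t length, size_t size) {
    // The cast narrows size; values above INT32_MAX can become negative.
    return length < (int32_t) size;
}

// Same check performed in the unsigned domain, as proposed in the advisory.
// Assumption: length is non-negative (a negative length would wrap to a huge value).
bool unsigned_rejects(int32_t length, size_t size) {
    return (size_t) length < size;
}
```

With a 16-byte buffer and a size of INT32_MAX + 1, the first function reports "fits" (the bug), while the second correctly reports "too large".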

Callers and Code Paths

Any “token → string” conversion can overflow if token_text.size() > INT32_MAX. Notable call sites include:

  • Model loading (each GGUF token string passes through token_to_piece())
  • Detokenization (llama_vocab::impl::detokenize(...))
  • Grammar routines (llama_grammar_apply_impl, llama_grammar_accept_impl)
  • Sampling & infill (llama_sampler_infill_apply, etc.)
  • Public API (llama_token_to_piece(...))

As soon as llama.cpp converts the oversized token to a string, the memcpy in _try_copy() overflows the destination buffer, which typically crashes the process.
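
All of these call sites rely on the same return-value contract: a non-negative result is the number of bytes written, and a negative result is the negated size the caller should allocate before retrying. The sketch below illustrates that caller-side pattern with a self-contained stand-in. Both copy_piece and piece_to_string are hypothetical names introduced for illustration and are not llama.cpp functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for a token-to-piece call: copies `text` into `buf`
// if it fits, otherwise returns the negated required size.
int32_t copy_piece(const std::string & text, char * buf, int32_t length) {
    if (length < 0 || text.size() > (size_t) length) {
        // Only meaningful while text.size() <= INT32_MAX.
        return -(int32_t) text.size();
    }
    std::memcpy(buf, text.data(), text.size());
    return (int32_t) text.size();
}

// Caller-side pattern: try a small buffer, grow to the reported size, retry.
std::string piece_to_string(const std::string & text) {
    std::string out(8, '\0');
    int32_t n = copy_piece(text, &out[0], (int32_t) out.size());
    if (n < 0) {
        out.resize((size_t) -n);
        n = copy_piece(text, &out[0], (int32_t) out.size());
    }
    out.resize(n > 0 ? (size_t) n : 0);
    return out;
}
```

Because every caller trusts the helper to enforce the bound, a single faulty comparison inside it is enough to expose all of the paths listed above.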

Impact

  • Vulnerability Type

    • Buffer overflow caused by a narrowing unsigned-to-signed conversion (size_t to int32_t) in _try_copy().
  • Attack Vector

    • A malicious GGUF model file containing a vocabulary entry whose token_text.size() exceeds INT32_MAX.
    • As soon as llama.cpp attempts any “token → string” conversion (e.g., during model load, detokenization, grammar checks, or sampling), the oversized size_t bypasses the length check and triggers an unchecked memcpy.
  • Affected Component

    • llama_vocab::impl::token_to_piece(), which is invoked by:
      • Grammar routines (llama_grammar_apply_impl, llama_grammar_accept_impl)
      • Sampling/infill code (llama_sampler_infill_apply, etc.)
      • The public API (llama_token_to_piece())
  • Severity

    • High – a single malicious token in a GGUF file can corrupt memory and potentially hijack control flow.
  • Consequences

    • Arbitrary Memory Corruption
      • Overwrites heap or stack data, leading to application instability or crashes.
    • Remote Code Execution (RCE)
      • By corrupting adjacent heap metadata, return addresses, or vtable pointers, an attacker can redirect execution flow.
    • Denial of Service (DoS)
      • Immediate crash under sanitizers (ASAN) or undefined behavior in production binaries.
    • Information Disclosure
      • Overwritten memory might reveal sensitive data or internal pointers.
  • Who Is Impacted

    • Any application or service that uses llama.cpp to load GGUF models from untrusted sources.
    • Inference servers, chatbots, or pipelines that dynamically ingest external model files are all at risk.
  • Mitigation & Recommendations

    • Required Patch
      • Modify _try_copy so that length and size are compared in an unsigned context, for example:
        if ((size_t)length < size) {
            return -(int32_t)size;
        }
      • This change ensures size values above INT32_MAX cannot bypass the bound check.
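
The unsigned comparison above closes the bypass, but the returned -(int32_t)size is still not meaningful once size exceeds INT32_MAX, and a negative length would wrap to a huge unsigned value. The following is a hedged sketch of a stricter variant, not the upstream patch: the free function try_copy_checked and its choice to throw std::length_error are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>

// Hypothetical hardened copy step (illustrative only, not the upstream fix).
int32_t try_copy_checked(char * buf, int32_t length, const char * token, size_t size) {
    // Reject sizes that the int32_t return value cannot represent at all.
    if (size > (size_t) std::numeric_limits<int32_t>::max()) {
        throw std::length_error("token size exceeds int32_t range");
    }
    // A negative length means there is no usable buffer; otherwise compare unsigned.
    if (length < 0 || (size_t) length < size) {
        return -(int32_t) size;
    }
    std::memcpy(buf, token, size);
    return (int32_t) size;
}
```

Checking the representable range first keeps both the comparison and the return value well defined for every input, at the cost of introducing an error path that callers have to handle.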

Severity

High

CVSS overall score

8.8 / 10

CVSS v3 base metrics

Attack vector
Network
Attack complexity
Low
Privileges required
None
User interaction
Required
Scope
Unchanged
Confidentiality
High
Integrity
High
Availability
High

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
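
For readers who want to see how a base score follows from this vector, here is a minimal sketch of the CVSS v3.1 base-score arithmetic for the scope-unchanged case. The numeric weights are the ones the CVSS v3.1 specification assigns to each metric value. The function names are hypothetical, and the code covers only this vector's case rather than being a general CVSS calculator.

```cpp
#include <algorithm>
#include <cstdint>
#include <cmath>

// Roundup as described in the CVSS v3.1 specification: the smallest value
// with one decimal place that is >= x, computed via integers to avoid
// floating-point artifacts.
double cvss_roundup(double x) {
    const int64_t i = std::llround(x * 100000.0);
    if (i % 10000 == 0) {
        return (double) i / 100000.0;
    }
    return (double) (i / 10000 + 1) / 10.0;
}

// Base score for Scope: Unchanged.
double base_score_scope_unchanged(double av, double ac, double pr, double ui,
                                  double c, double i, double a) {
    const double iss            = 1.0 - (1.0 - c) * (1.0 - i) * (1.0 - a);
    const double impact         = 6.42 * iss;
    const double exploitability = 8.22 * av * ac * pr * ui;
    if (impact <= 0.0) {
        return 0.0;
    }
    return cvss_roundup(std::min(impact + exploitability, 10.0));
}

// Weights for AV:N, AC:L, PR:N, UI:R and C:H, I:H, A:H.
double advisory_base_score() {
    return base_score_scope_unchanged(0.85, 0.77, 0.85, 0.62, 0.56, 0.56, 0.56);
}
```

For this vector the impact term is about 5.87 and the exploitability term about 2.84, and their sum rounds up to 8.8, which falls in the High band (7.0 to 8.9).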

CVE ID

CVE-2025-49847

Weaknesses

Improper Restriction of Operations within the Bounds of a Memory Buffer

The product performs operations on a memory buffer, but it can read from or write to a memory location that is outside of the intended boundary of the buffer.

Signed to Unsigned Conversion Error

The product uses a signed primitive and performs a cast to an unsigned primitive, which can produce an unexpected value if the value of the signed primitive can not be represented using an unsigned primitive.

Credits