Misc. bug: llama-server embedding endpoint returns vectors with just null values after a while #14812

@ngladitz

Description

Name and Version

/opt/homebrew/bin/llama-server --version
version: 5920 (d9b6910)
built with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

llama-server

Command line

/opt/homebrew/bin/llama-server -m Qwen3-Embedding-8B-Q4_K_M.gguf --alias Qwen3-embedding --embedding --pooling last -ub 8192 --verbose-prompt --offline  -c 40960 --no-mmap --mlock --port 9008

Problem description & steps to reproduce

The server generates working embeddings for a while (hours, possibly days, at roughly one embedding request per minute), but eventually the returned embedding vectors contain only null elements. I see no errors or other indicators in the log output when this happens, and I need to restart the server to recover.

When the server is in the error state (I omitted the repetitive middle of the vector in the response):

% curl -X POST http://localhost:9008/v1/embeddings \
     -H "Content-Type: application/json" \
     -d '{"input": "test"}'

{"model":"gpt-3.5-turbo","object":"list","usage":{"prompt_tokens":2,"total_tokens":2},"data":[{"embedding":[null, ... ,null],"index":0,"object":"embedding"}]}%

Repeating the query after process restart:

% curl -X POST http://localhost:9008/v1/embeddings \
     -H "Content-Type: application/json" \
     -d '{"input": "test"}'

{"model":"gpt-3.5-turbo","object":"list","usage":{"prompt_tokens":2,"total_tokens":2},"data":[{"embedding":[0.027558811008930206, ... ,0.021016428247094154],"index":0,"object":"embedding"}]}%

I am currently unsure how to reproduce / reduce this or how to come up with a usable test case.
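
As a starting point for a test case, here is a minimal sketch of a check that tells the two responses above apart. It is not part of llama.cpp: the helper names `embedding_is_valid` and `fetch_embedding` are hypothetical, and the URL and request body are simply the ones from the curl command in this report. Run periodically, it could at least record when the server first enters the bad state.

```python
import json
import math
import urllib.request


def embedding_is_valid(response: dict) -> bool:
    """Return True only if every embedding element is a finite number."""
    data = response.get("data", [])
    if not data:
        return False
    for item in data:
        vector = item.get("embedding") or []
        if not vector:
            return False
        for value in vector:
            # JSON null decodes to None; bool is rejected because it subclasses int.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                return False
            # Also reject NaN / infinity in case they ever reach the client.
            if not math.isfinite(value):
                return False
    return True


def fetch_embedding(url: str = "http://localhost:9008/v1/embeddings",
                    text: str = "test") -> dict:
    """POST the same request as the curl command in this report (assumed URL and port)."""
    body = json.dumps({"input": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return json.load(reply)
```

Calling `embedding_is_valid(fetch_embedding())` from a cron job or a simple loop, and logging a timestamp the first time it returns `False`, might help narrow down which request or how much uptime precedes the failure.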

First Bad Commit

No response

Relevant log output
