Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Currently, logprobs exposed via the OpenAI-style server wrapper let users see the probabilities of generated tokens, which is great for research and debugging. However, there is no straightforward way at the native C API / CLI level to access the full list of candidate tokens and their probabilities, especially before sampling decisions (e.g., top-K or nucleus sampling) are applied; a sketch of the current workaround appears below.
Having direct access to these candidate distributions would:
- Enable confidence-based stopping criteria
- Facilitate custom sampling / selective decoding in application code
- Provide better transparency into internal generation decisions
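For reference, this is roughly what application code has to do by hand today: pull the raw logits with llama_get_logits and softmax them itself. A minimal sketch, assuming a context that has already decoded a batch; the helper name candidate_probs is mine, and note that llama_n_vocab has taken different argument types across llama.cpp versions, so the exact signature may differ:

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

#include "llama.h"

// Sketch: rebuild the candidate distribution for the last decoded position
// by hand. Assumes ctx/model come from a successful llama_decode elsewhere.
static std::vector<std::pair<llama_token, float>> candidate_probs(
        llama_context * ctx, const llama_model * model, int top_k) {
    const float * logits  = llama_get_logits(ctx); // one raw logit per vocab entry
    const int     n_vocab = llama_n_vocab(model);
    top_k = std::min(top_k, n_vocab);

    // softmax over the full vocabulary (max-subtraction for numerical stability)
    const float max_logit = *std::max_element(logits, logits + n_vocab);
    std::vector<std::pair<llama_token, float>> probs(n_vocab);
    double sum = 0.0;
    for (int i = 0; i < n_vocab; ++i) {
        const double p = std::exp(logits[i] - max_logit);
        probs[i] = { (llama_token) i, (float) p };
        sum += p;
    }
    for (auto & tp : probs) {
        tp.second = (float) (tp.second / sum);
    }

    // keep only the top_k most probable candidates
    std::partial_sort(probs.begin(), probs.begin() + top_k, probs.end(),
                      [](const std::pair<llama_token, float> & a,
                         const std::pair<llama_token, float> & b) {
                          return a.second > b.second;
                      });
    probs.resize(top_k);
    return probs;
}
```

Every embedder currently has to rewrite some version of this boilerplate; a first-class API would remove it and guarantee the reported distribution matches exactly what the sampler saw.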
Motivation
This feature would empower developers to:
- Inspect model confidence before outputting tokens
- Implement advanced sampling like dynamic beam filtering
- Write more explainable LLM-based systems
While logprobs support exists in the wrapper, exposing candidate distributions natively would ensure broader accessibility (via the CLI, the C API, or other bindings).
Possible Implementation
- Extend llama_sample_token / llama_sample_token_greedy (or create variants) to return a struct containing (a rough sketch follows after this list):
  - token_id
  - logit (or prob after softmax)
  - is_selected flag
- Add equivalent CLI flags (e.g. --print-topk 10)
- Expose the functionality in Python/C bindings consistent with high-level logprobs usage
- Benchmark to ensure no significant inference slowdowns when the feature is inactive
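As a concrete starting point, here is one possible shape for the struct and variant, sketched next to the existing llama_token_data type that llama.h already defines (llama_token_data already carries id/logit/p; the is_selected field and the llama_sample_token_with_candidates name below are hypothetical, not existing API):

```cpp
#include <cstddef>

#include "llama.h"

// Hypothetical extension of the existing llama_token_data
// (which already holds: llama_token id; float logit; float p;).
// The is_selected flag is new and marks the token the sampler chose.
typedef struct llama_token_data_ext {
    llama_token id;          // token_id
    float       logit;       // raw logit
    float       p;           // probability after softmax
    bool        is_selected; // true for the sampled token
} llama_token_data_ext;

// Hypothetical variant: sample as usual, but also fill `out` with up to
// `n_out` candidates (sorted by probability) so callers can inspect the
// exact distribution the sampling decision was made from. Returns the
// sampled token, mirroring the existing llama_sample_token contract.
llama_token llama_sample_token_with_candidates(
        struct llama_context * ctx,
        llama_token_data_ext * out,
        size_t                 n_out);
```

Returning candidates through a caller-provided buffer keeps the hot path allocation-free, which should help with the "no slowdown when inactive" requirement: passing out = NULL could skip the extra work entirely.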