Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Currently, logprobs exposed via the OpenAI-style server wrapper let users see the probabilities of generated tokens, which is great for research and debugging. However, there is no straightforward way at the native C API / CLI level to access the full list of candidate tokens and their probabilities, especially before sampling decisions (e.g., top-K or nucleus sampling) are applied; a sketch of the current workaround appears below.
Having direct access to these candidate distributions would:
- Enable confidence-based stopping criteria
- Facilitate custom sampling / selective decoding in application code
- Provide better transparency into internal generation decisions
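For reference, this is roughly what application code has to do by hand today: pull the raw logits with llama_get_logits and softmax them itself. A minimal sketch, assuming a context that has already decoded a batch; the helper name candidate_probs is mine, and note that llama_n_vocab has taken different argument types across llama.cpp versions, so the exact signature may differ:

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

#include "llama.h"

// Sketch: rebuild the candidate distribution for the last decoded position
// by hand. Assumes ctx/model come from a successful llama_decode elsewhere.
static std::vector<std::pair<llama_token, float>> candidate_probs(
        llama_context * ctx, const llama_model * model, int top_k) {
    const float * logits  = llama_get_logits(ctx); // one raw logit per vocab entry
    const int     n_vocab = llama_n_vocab(model);
    top_k = std::min(top_k, n_vocab);

    // softmax over the full vocabulary (max-subtraction for numerical stability)
    const float max_logit = *std::max_element(logits, logits + n_vocab);
    std::vector<std::pair<llama_token, float>> probs(n_vocab);
    double sum = 0.0;
    for (int i = 0; i < n_vocab; ++i) {
        const double p = std::exp(logits[i] - max_logit);
        probs[i] = { (llama_token) i, (float) p };
        sum += p;
    }
    for (auto & tp : probs) {
        tp.second = (float) (tp.second / sum);
    }

    // keep only the top_k most probable candidates
    std::partial_sort(probs.begin(), probs.begin() + top_k, probs.end(),
                      [](const std::pair<llama_token, float> & a,
                         const std::pair<llama_token, float> & b) {
                          return a.second > b.second;
                      });
    probs.resize(top_k);
    return probs;
}
```

Every embedder currently has to rewrite some version of this boilerplate; a first-class API would remove it and guarantee the reported distribution matches exactly what the sampler saw.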
Motivation
This feature would empower developers to:
- Inspect model confidence before outputting tokens
- Implement advanced sampling like dynamic beam filtering
- Write more explainable LLM-based systems
While logprobs support exists in the wrapper, exposing candidate distributions natively would ensure broader accessibility (via the CLI, the C API, or other bindings).
Possible Implementation
- Extend llama_sample_token / llama_sample_token_greedy (or create variants) to return a struct containing (a rough sketch follows after this list):
  - token_id
  - logit (or prob after softmax)
  - is_selected flag
- Add equivalent CLI flags (e.g. --print-topk 10)
- Expose the functionality in Python/C bindings consistent with high-level logprobs usage
- Benchmark to ensure no significant inference slowdowns when the feature is inactive
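As a concrete starting point, here is one possible shape for the struct and variant, sketched next to the existing llama_token_data type that llama.h already defines (llama_token_data already carries id/logit/p; the is_selected field and the llama_sample_token_with_candidates name below are hypothetical, not existing API):

```cpp
#include <cstddef>

#include "llama.h"

// Hypothetical extension of the existing llama_token_data
// (which already holds: llama_token id; float logit; float p;).
// The is_selected flag is new and marks the token the sampler chose.
typedef struct llama_token_data_ext {
    llama_token id;          // token_id
    float       logit;       // raw logit
    float       p;           // probability after softmax
    bool        is_selected; // true for the sampled token
} llama_token_data_ext;

// Hypothetical variant: sample as usual, but also fill `out` with up to
// `n_out` candidates (sorted by probability) so callers can inspect the
// exact distribution the sampling decision was made from. Returns the
// sampled token, mirroring the existing llama_sample_token contract.
llama_token llama_sample_token_with_candidates(
        struct llama_context * ctx,
        llama_token_data_ext * out,
        size_t                 n_out);
```

Returning candidates through a caller-provided buffer keeps the hot path allocation-free, which should help with the "no slowdown when inactive" requirement: passing out = NULL could skip the extra work entirely.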