[RFC] common, server : add top-a sampler #5612
Open
+41
−5
Top-A sampling is an interesting sampling technique that behaves like a dynamic Min-P. More details here: https://github.com/BlinkDL/RWKV-LM/tree/4cb363e5aa31978d801a47bc89d28e927ab6912e#the-top-a-sampling-method
When the most likely token has a high probability, this behaves like min-p with a high P. When the most likely token has a low probability, this behaves like min-p with a low P. The `a` parameter sets the cutoff point: tokens with probability below a*P^2 are discarded, where P is the probability of the most likely token.

This commit adds top-a sampling to llama.cpp and server.cpp, which improves compatibility with clients like AI Horde. Because this sampler works very similarly to min-p, adding it is mostly free in terms of actual code changes.
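
For reviewers who want the rule in code form, here is a minimal standalone sketch of the a*P^2 cutoff described above. It is not the code from this PR: `TokenProb`, `top_a_filter`, and the `min_keep` parameter are hypothetical names chosen for illustration, and the sketch assumes the probabilities are already softmax-normalized and that `a <= 0` means "disabled".

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical token record for illustration; not llama.cpp's actual type.
struct TokenProb {
    int   id;
    float p; // probability after softmax (assumed already normalized)
};

// Keep only tokens with p >= a * p_max^2, where p_max is the probability of
// the most likely token. At least min_keep tokens are always retained.
// Illustrative sketch: the name and signature are assumptions.
std::vector<TokenProb> top_a_filter(std::vector<TokenProb> cands, float a, std::size_t min_keep = 1) {
    if (a <= 0.0f || cands.empty()) {
        return cands; // assumption: a <= 0 disables the sampler
    }

    // Sort by probability, most likely token first.
    std::sort(cands.begin(), cands.end(),
              [](const TokenProb & x, const TokenProb & y) { return x.p > y.p; });

    const float p_max     = cands.front().p;
    const float threshold = a * p_max * p_max;

    // Count how many leading tokens survive the cutoff.
    std::size_t keep = 0;
    while (keep < cands.size() && (cands[keep].p >= threshold || keep < min_keep)) {
        ++keep;
    }

    cands.resize(keep); // survivors are not renormalized here
    return cands;
}
```

Since min-p keeps tokens with probability at least p*P, the top-a cutoff a*P^2 corresponds to min-p with an effective p of a*P. With a = 0.2, a top token at P = 0.9 gives an effective min-p of 0.18, while a top token at P = 0.3 gives 0.06, which is the "dynamic Min-P" behavior described above.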