How to ensure unadulterated topk=1 sampling? #7590
-
I'm interested only in the model's outputs, without any randomness, repetition penalties or other alterations. With PyTorch I'd call model() and then topk(). With llama.cpp it seems I have to use this "ubersampler", whose parameters and behaviors encompass a variety of sampling strategies. To avoid most of this, one approach is to change the sampler order.

By taking nucleus sampling and everything else out of the picture to the extent possible, it seems assured at that point that only repetition penalties remain. After looking around, I hope I've found the combination of parameters that disables all of the repetition penalties. But let's say I'm using llama-cpp-python, where you must use the "ubersampler."
What about |
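As a sanity check on the idea that `top_k=1` makes the rest of the chain irrelevant, the truncation stages can be simulated on toy logits. This is a pure-Python sketch mimicking the *shape* of llama.cpp's default top-k → top-p order, not its actual implementation; temperature is omitted because it only rescales surviving logits and never removes candidates.

```python
import math

def surviving_candidates(logits, top_k=40, top_p=0.95):
    """Candidates left after top-k and top-p truncation (toy sketch,
    not llama.cpp's real code)."""
    # top-k: keep the k highest logits, sorted descending
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = order[:top_k]

    # softmax over the survivors
    mx = max(logits[i] for i in kept)
    exps = [math.exp(logits[i] - mx) for i in kept]
    total = sum(exps)

    # top-p (nucleus): smallest prefix whose cumulative probability >= top_p
    nucleus, cum = [], 0.0
    for i, e in zip(kept, exps):
        nucleus.append(i)
        cum += e / total
        if cum >= top_p:
            break
    return nucleus

logits = [0.1, 2.5, -1.0, 0.7]
print(surviving_candidates(logits, top_k=1, top_p=0.99))  # only the argmax: [1]
```

With `top_k=1` the single survivor already carries probability 1.0, so any `top_p` is satisfied immediately and there is nothing left for `min_p` or `temperature` to perturb.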
Replies: 1 comment
-
After playing around a bit from llama-cpp-python, `top_k=1` seems to pre-empt attempts to perturb with `temperature`, `min_p` and `top_p`, which is consistent with the default sampler order.

So it seems indeed that with the ubersampler all I really have to worry about is penalties. `repeat_penalty` can be set to 1.0 to disable it, and `penalize_nl` shouldn't matter because there's no repetition penalty. Also `tfs` is disabled by default, as is mirostat, which seems to replace the rest of the chain altogether if enabled.

One question remains: even though `min_p` and `top_p` seem to specify probabilities (0-1), `temperature` (par…
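The `repeat_penalty=1.0` claim can be checked against the classic CTRL-style repetition penalty, which is (as commonly described for llama.cpp; this sketch is not its source code) to divide positive logits of recently seen tokens by the penalty and multiply negative ones, so a penalty of exactly 1.0 is an identity:

```python
def apply_repeat_penalty(logits, recent_tokens, penalty):
    """CTRL-style repetition penalty sketch: divide positive logits of
    recently seen tokens by `penalty`, multiply negative ones, pushing
    repeated tokens toward lower probability."""
    out = list(logits)
    for t in set(recent_tokens):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

logits = [1.5, -0.5, 3.0]
# penalty of exactly 1.0 leaves every logit untouched -> effectively disabled
assert apply_repeat_penalty(logits, [0, 1], 1.0) == logits
# penalty > 1.0 pushes recently seen tokens toward lower probability
print(apply_repeat_penalty(logits, [0, 1], 1.2))  # [1.25, -0.6, 3.0]
```

This also shows why `penalize_nl` is moot at `repeat_penalty=1.0`: it only controls whether the newline token is exempted from a penalty that, at 1.0, does nothing.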