How to ensure unadulterated topk=1 sampling? #7590
-
I'm interested only in the model's outputs, without any randomness, repetition penalties or other alterations. With PyTorch I'd call model() and then topk(). With llama.cpp it seems I have to use this "ubersampler", whose parameters and behaviors encompass a variety of sampling strategies. To avoid most of this, one approach is to change the sampler order.

By taking nucleus sampling and everything else out of the picture to the extent possible, it seems assured at that point that only repetition penalties remain. After looking around, I hope I've found the combination of parameters that disables all of the repetition penalties. But let's say I'm using llama-cpp-python, where you must use the "ubersampler."
What about |
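As a sanity check on the idea that `top_k=1` makes the rest of the chain irrelevant, the truncation stages can be simulated on toy logits. This is a pure-Python sketch mimicking the *shape* of llama.cpp's default top-k → top-p order, not its actual implementation; temperature is omitted because it only rescales surviving logits and never removes candidates.

```python
import math

def surviving_candidates(logits, top_k=40, top_p=0.95):
    """Candidates left after top-k and top-p truncation (toy sketch,
    not llama.cpp's real code)."""
    # top-k: keep the k highest logits, sorted descending
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = order[:top_k]

    # softmax over the survivors
    mx = max(logits[i] for i in kept)
    exps = [math.exp(logits[i] - mx) for i in kept]
    total = sum(exps)

    # top-p (nucleus): smallest prefix whose cumulative probability >= top_p
    nucleus, cum = [], 0.0
    for i, e in zip(kept, exps):
        nucleus.append(i)
        cum += e / total
        if cum >= top_p:
            break
    return nucleus

logits = [0.1, 2.5, -1.0, 0.7]
print(surviving_candidates(logits, top_k=1, top_p=0.99))  # only the argmax: [1]
```

With `top_k=1` the single survivor already carries probability 1.0, so any `top_p` is satisfied immediately and there is nothing left for `min_p` or `temperature` to perturb.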
Replies: 1 comment
-
After playing around a bit from llama-cpp-python, `top_k=1` seems to pre-empt attempts to perturb with `temperature`, `min_p` and `top_p`, which is consistent with the default sampler order.

So it seems indeed that with the ubersampler all I really have to worry about is penalties. `repeat_penalty` can be set to 1.0 to disable it, and `penalize_nl` shouldn't matter because there's no repetition penalty. Also `tfs` is disabled by default, as is mirostat, which seems to replace the rest of the chain altogether if enabled.

One question remains: even though `min_p` and `top_p` seem to specify probabilities (0-1), `temperature` (par…
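The `repeat_penalty=1.0` claim can be checked against the classic CTRL-style repetition penalty, which is (as commonly described for llama.cpp; this sketch is not its source code) to divide positive logits of recently seen tokens by the penalty and multiply negative ones, so a penalty of exactly 1.0 is an identity:

```python
def apply_repeat_penalty(logits, recent_tokens, penalty):
    """CTRL-style repetition penalty sketch: divide positive logits of
    recently seen tokens by `penalty`, multiply negative ones, pushing
    repeated tokens toward lower probability."""
    out = list(logits)
    for t in set(recent_tokens):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

logits = [1.5, -0.5, 3.0]
# penalty of exactly 1.0 leaves every logit untouched -> effectively disabled
assert apply_repeat_penalty(logits, [0, 1], 1.0) == logits
# penalty > 1.0 pushes recently seen tokens toward lower probability
print(apply_repeat_penalty(logits, [0, 1], 1.2))  # [1.25, -0.6, 3.0]
```

This also shows why `penalize_nl` is moot at `repeat_penalty=1.0`: it only controls whether the newline token is exempted from a penalty that, at 1.0, does nothing.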