A sequence repetition penalty sampler - how would you expect it to work? #2581
Closed · KerfuffleV2 started this conversation in Ideas
The existing repetition and frequency/presence penalty samplers have their uses, but one thing they don't really help with is stopping the LLM from repeating a sequence of tokens it has already generated, or one from the prompt. Adding a way to penalize repeating sequences seems like it would be pretty useful.
Just for example, say the context currently contains the token ids `1, 2, 3, 4, 1, 2, 3`. If the LLM generates token `4` at this point, it will repeat the sequence `1, 2, 3, 4`, which already exists in the context. So we could penalize `4` to prevent this and try to force the LLM to take a new path. (There are different ways to approach the penalty itself: a flat penalty, a penalty based on the length of the sequence that would be continued, etc.) Implementing this is pretty simple.

We probably also want to allow fuzzy matching, allowing a match with
`1, 2, 3, 4, 1, 0, 3` to still match like the first example and penalize `4`, because `1, 0, 3, 4` would be sufficiently similar to `1, 2, 3, 4`. This is also pretty straightforward: you can give the matching algorithm credits for failed matches. When two tokens don't match, count them as a match anyway if there are credits remaining, and decrement the credits; if no credits remain, abort and start trying to match from a new position.

Now here is where it gets really complicated. Suppose that, in addition to 1:1 fuzzy matching, you want to let a credit apply to more than one consecutive non-matching token. I really haven't figured out a good way to do this, or even exactly what results I'd expect. The first issue is: if the match fails, do we merge consecutive non-matching tokens in the "needle" or in the "haystack"? Also, greedily consuming non-matching tokens like that can potentially be worse than doing nothing: you want to consume just enough non-matching tokens that the next iteration results in a match.
Why should llama.cpp people care about this question? Well, if I can figure out a good answer I intend to write a C version of this sampler and contribute it to the project.
I have some prototype Python code that... does something.
Using these parameters: minimum match length 3, failed match credits 1, merging of up to 2 consecutive non-matching tokens allowed.
With `1, 6, 6, 3, 1, 2, 3, 4, 1, 2, 3`, this is what I get:

`1, 6, 6, 3, {1, 2, 3, 4}, [@1, 2, 3]`
`1, 6, 6, 3, {1, 2, 3}, @4, [1, 2, 3]`
`1, 6, 6, {3, 1, 2, 3}, [@4, 1, 2, 3]`
`{1, 6, 6, 3}, @1, 2, 3, 4, [1, 2, 3]`
Curly braces surround the "haystack" part of the match, `@` marks the token that would be penalized as continuing a sequence, and square brackets surround the "needle" part of the match. (I don't know if there's a better way to refer to it: the needle is the end of the context, where the next generated token might complete a sequence from all of the generated/prompt tokens.)

Then with `1, 2, 3, 1, 2, 3, 4, 1, 6, 6, 3`:

`1, 2, 3, {1, 2, 3}, @4, [1, 6, 6, 3]`
`{1, 2, 3}, @1, 2, 3, 4, [1, 6, 6, 3]`
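As an aside on what happens after matching: a length-scaled penalty (one of the options mentioned earlier) could be applied to the logits like this. The `scale` parameter and the dict-based logits are purely illustrative choices, not part of any existing API:

```python
def apply_seq_penalty(logits, matched_lengths, scale=0.5):
    """Subtract a penalty proportional to the matched sequence length.

    logits: token id -> logit. matched_lengths: token id -> length of the
    longest already-seen sequence that the token would continue. Linear
    scaling is a placeholder; a real sampler might instead use a scheme
    closer to the existing repetition penalty sampler.
    """
    out = dict(logits)
    for tok, length in matched_lengths.items():
        if tok in out:
            out[tok] -= scale * length
    return out
```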
Here is some horrendous Python code:
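(The original snippet isn't reproduced here; what follows is my own sketch of the core matching loop, reconstructed from the description, so details such as the greedy skip choice may differ from the actual prototype.)

```python
def fuzzy_suffix_match(toks, hs_end, credits=1, max_merge=2):
    """Match the context suffix (needle) against the haystack ending just
    before index hs_end, walking in reverse. On a mismatch, one credit can
    either merge up to max_merge consecutive tokens on the needle or the
    haystack side (whichever skip is shorter), or count the mismatch as a
    plain 1:1 fuzzy match. Returns the matched length; toks[hs_end] would
    be the token to penalize. Sketch only: the skips are greedy, so this
    may consume more non-matching tokens than is optimal.
    """
    ni, hi = len(toks) - 1, hs_end - 1  # needle / haystack cursors
    c, matched = credits, 0
    while ni >= 0 and hi >= 0:
        if toks[ni] == toks[hi]:
            matched += 1
            ni -= 1
            hi -= 1
            continue
        if c == 0:
            break
        c -= 1
        # Shortest skip that would let the match resume on each side.
        skip_n = next((d for d in range(1, max_merge + 1)
                       if ni - d >= 0 and toks[ni - d] == toks[hi]), None)
        skip_h = next((d for d in range(1, max_merge + 1)
                       if hi - d >= 0 and toks[hi - d] == toks[ni]), None)
        if skip_n is not None and (skip_h is None or skip_n <= skip_h):
            ni -= skip_n                 # merge needle tokens
        elif skip_h is not None:
            hi -= skip_h                 # merge haystack tokens
        else:
            matched += 1                 # plain 1:1 fuzzy match
            ni -= 1
            hi -= 1
    return matched
```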
If it's not clear what it's doing: each iteration of the outer loop decrements where the "haystack" starts, and we match in reverse order, decrementing toward the start of the sequence. Matching against the "needle" always starts from the very end of the context; matching the haystack starts from the penultimate token (since you'd want `1, 1, 1, 1` to match). If there's no match and there are credits remaining, it tries to find the distance to a token that can match, on both the needle side and the haystack side, and chooses the shorter one if both exist. I.e., if we could merge 2 needle tokens to get a match or 1 haystack token to get a match, the latter gets chosen.

Any discussion or tips on known algorithms for performing this kind of matching in a sane way are very welcome. Maybe a version without the merging of consecutive non-matches would still be good enough to be worth implementing?