A sequence repetition penalty sampler - how would you expect it to work? #2581

KerfuffleV2 · 2023-08-10T22:13:34Z

KerfuffleV2
Aug 10, 2023
Collaborator

The existing repetition and frequency/presence penalty samplers have their use but one thing they don't really help with is stopping the LLM from repeating a sequence of tokens it's already generated or from the prompt. It seems like adding a way to penalize repeating sequences would be pretty useful.

Just for example, say we have token ids 1, 2, 3, 4, 1, 2, 3 in the context currently. If the LLM generates token 4 at this point, it will repeat the sequence 1, 2, 3, 4 which already exists in the context. So we could penalize 4 to prevent this and to try to force the LLM to take a new path (there are different ways to approach this like a flat penalty, a penalty based on the length of the sequence it would be continuing, etc). Implementing this is pretty simple.

We probably also want to allow fuzzy matching. Allowing, a match with 1, 2, 3, 4, 1, 0, 3 to still match like the first example and penalize 4 because 1, 0, 3, 4 would be sufficiently similar to 1, 2, 3, 4. This is also pretty straightforward. You basically can give the matching algo credits for failed matches. If the tokens don't match, you count as a match anyway if there are credits remaining and decrement the credits. If there are no credits, then you can abort and start trying to match from a new position.

Now where it gets really complicated: Suppose in addition to allowing 1:1 fuzzy matching you might want to let the credit apply to more than one consecutive non-matching token. I really haven't figured out a good way to do this or even exactly what results I'd expect. The first issue is if the match fails do we merge consecutive non-matching tokens in the "needle" or the "haystack"? Also, greedily consuming non-matching tokens like that can potentially be worse than doing nothing. You want to consume just enough non-matching tokens such that the next iteration results in a match.

Why should llama.cpp people care about this question? Well, if I can figure out a good answer I intend to write a C version of this sampler and contribute it to the project.

I have some prototype Python code that... does something.

Using these parameters - minimum match length 3, failed match credits 1, allow merging up to 2 consecutive non-matching tokens:

With 1, 6, 6, 3, 1, 2, 3, 4, 1, 2, 3, this is what I get:

1, 6, 6, 3, {1, 2, 3, 4}, [@1, 2, 3]
1, 6, 6, 3, {1, 2, 3}, @4, [1, 2, 3]
1, 6, 6, {3, 1, 2, 3}, [@4, 1, 2, 3]
{1, 6, 6, 3}, @1, 2, 3, 4, [1, 2, 3]

Curly braces around the "haystack" part of the patch, @ in front of the token that would be penalized as continuing a sequence and square braces around the "needle" part of the match. (Don't know if there's a better way to refer to it: this would be the end of the context where the next generated token might complete a sequence from all generated/prompt tokens).

Then with 1, 2, 3, 1, 2, 3, 4, 1, 6, 6, 3:

1, 2, 3, {1, 2, 3}, @4, [1, 6, 6, 3]
{1, 2, 3}, @1, 2, 3, 4, [1, 6, 6, 3]

Here is some horrendous Python code:

def m4(l, min_len = 3, tolerance = 0, merge = 0):
  print(f'({min_len}) -- {l}\n')
  llen = len(l)
  if llen < min_len + 1:
    return []
  sstart = llen - 2
  result = []
  while sstart >= min_len - 1:
    currtol = tolerance
    nidx = llen - 1
    hidx = sstart
    sstart -= 1
    good = 0
    print('===========')
    while nidx >= 0 and hidx >= 0:
      ok = l[nidx] == l[hidx]
      print(f'  [{good:>2}] {nidx:>2}, {hidx:>2}  |  {l[nidx]} {("==" if ok else "!=")} {l[hidx]}')
      if ok:
        seekn = 1
        seekh = 1
      elif currtol < 1:
        print('  !!!')
        break
      elif merge > 0:
        currtol -= 1
        snidx = nidx - 1
        shidx = hidx - 1
        foundn = None
        foundh = None
        if hidx > 0:
          for dist in range(1, merge + 2):
            if nidx - dist < 0 or nidx - dist <= hidx - 1 :
              break
            if l[nidx - dist] == l[hidx - 1]:
              foundn = dist
        if nidx > 0:
          for dist in range(1, merge + 2):
            if hidx - dist < 0:
              break
            if l[hidx - dist] == l[nidx - 1]:
              foundh = dist
        print(f'   ?? seek - n: {foundn or "!"}, h: {foundh or "!"}')
        if foundn is not None and foundh is not None:
          if foundn < foundh:
            seekn = foundn
            seekh = 1
          else:
            seekn = 1
            seekh = foundh
        else:
          seekn = 1 if foundn is None else foundn
          seekh = 1 if foundh is None else foundh
      good += 1
      if good >= min_len:
        rn = l[nidx - (seekn - 1):]
        rh = l[hidx - (seekh - 1):sstart + 2]
        result.append((rh, l[sstart + 2], rn))
        print('  ^^^^ H:', rh, '@', l[sstart + 2])
        print('  ^^^^ N:', rn)
        print()
      nidx -= seekn
      hidx -= seekh
  return result

result = m4([1, 6, 6, 3, 1, 2, 3, 4, 1, 2, 3], 3, 1, 1)
print(f'Got: {result}')
print('\n\n###################\n')
result = m4([1, 2, 3, 1, 2, 3, 4, 1, 6, 6, 3], 3, 1, 1)
print(f'Got: {result}')

If it's not clear what it's doing, each iteration of the outer loop decrements where the "haystack" starts (and we match in reverse order, decrementing toward the start of the sequence). Matching vs the "needle" always starts from the every end. Matching the haystack starts from the penultimate token (since you'd want to match 1, 1, 1, 1). If there's no match and there are credits remaining, then it tries to find the distance to to a token that can match with both the needle and the haystack and chooses the shortest one if both exist. I.E. If we can merge 2 needle tokens to get a match or 1 haystack token to get a match, the latter will get chosen.

Any discussion or tips for known algorithms to perform this kind of matching in a sane way are very welcome. Maybe a version without the merging consecutive non-matches would still be good enough to be worth implementing?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

A sequence repetition penalty sampler - how would you expect it to work? #2581

Uh oh!

{{title}}

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

A sequence repetition penalty sampler - how would you expect it to work? #2581

Uh oh!

KerfuffleV2 Aug 10, 2023 Collaborator

Replies: 0 comments

KerfuffleV2
Aug 10, 2023
Collaborator