Choose best answer based on perplexity with parallel generation? #3533
-
I have a practical use case where I want my LLM to generate multiple responses to the same prompt and choose the best one. Given the latest developments, is it possible to do all this while approximately keeping the same generation time as if I only made one sampling call? I'm thinking of generating 3-4 answers in parallel and calculating the perplexity of all of them without making any extra call. Are there any other ideas? If it's possible, is there an example that already does this? I'm convinced that if it's possible, other people are also highly interested.

UPDATE: I have a very long system prompt and I have already tried different sampling strategies (e.g. mirostat), but I couldn't find a better combo than just using the default configs. For example, mirostat tries really hard to follow the beginning of the system prompt, but it ignores later text where I specifically say not to generate the type of responses it generates. For some reason, if I just run it with the default configs, it "listens" to the whole system prompt and is usually better than changing top_p, top_k, tfs, etc. with values that should theoretically work better.
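To be concrete about the selection step I have in mind, here is a rough sketch (Python, not working llama.cpp code; it assumes I can somehow get the log-probability of each sampled token back for every candidate, and the `candidates` data below is made up):

```python
import math

def self_perplexity(token_logprobs):
    """exp of the mean negative log-likelihood of a candidate's own sampled
    tokens -- the "perplexity" of the generation under the model that produced it."""
    if not token_logprobs:
        return float("inf")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def pick_best(candidates):
    """candidates: list of (text, per-token logprobs) pairs. Picks the one
    whose own tokens were, on average, the least surprising."""
    return min(candidates, key=lambda c: self_perplexity(c[1]))

# Made-up output of a parallel run with 3 candidates:
candidates = [
    ("answer A", [-0.2, -1.1, -0.4]),
    ("answer B", [-0.9, -0.3, -2.5]),
    ("answer C", [-0.1, -0.2, -0.3]),
]
best_text, _ = pick_best(candidates)
print(best_text)  # "answer C" has the lowest average negative log-likelihood
```

The open question is whether the parallel decoding path can hand me those per-token log-probabilities without any extra forward passes.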
-
I had this idea as well, but I was told that perplexity calculation is based on an external text source and is computationally expensive, and therefore wouldn't be worth it, at least without optimizations. I don't know how true that is.
Right, you feed it wikitext and (from what I know) perplexity is based on how accurately it predicts what is actually in that text. When you're just generating stuff, though, you don't have a reference to compare the tokens it predicts with, so you can't say whether it got the right answer or not.
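That standard evaluation is basically just the following (a sketch; `reference_logprobs` stands for the model's log-probability of each actual token in the reference text, e.g. wikitext, which is exactly the thing you don't have for a free-form generation):

```python
import math

def reference_perplexity(reference_logprobs):
    """exp of the mean negative log-likelihood of the *known* next tokens
    in a reference corpus. Without a reference there is no "right answer"
    to score against."""
    return math.exp(-sum(reference_logprobs) / len(reference_logprobs))
```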
@Mihaiii
So you probably can't really do this with perplexity in that sense. However, you could look at the code for the mirostat samplers: they calculate a surprise value for the token they picked and also keep track of `mu`, which gets updated based o…
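For reference, the core of that bookkeeping is small; here is a sketch of the idea (not the actual llama.cpp code; `TAU` and `ETA` are the usual mirostat defaults, 5.0 and 0.1, as far as I remember). Averaging the same per-token surprise over a whole candidate would give you a reference-free score you could compare across parallel generations:

```python
import math

TAU = 5.0  # target surprise
ETA = 0.1  # learning rate for the mu update

def mirostat_step(mu, sampled_token_prob, tau=TAU, eta=ETA):
    """Surprise of the token that was actually sampled, plus the feedback
    update that nudges mu toward the target surprise tau."""
    surprise = -math.log2(sampled_token_prob)
    mu = mu - eta * (surprise - tau)
    return mu, surprise

def mean_surprise(sampled_token_probs):
    """Average surprise (in bits) over a candidate's own tokens -- a
    reference-free way to rank parallel generations."""
    return sum(-math.log2(p) for p in sampled_token_probs) / len(sampled_token_probs)
```

A lower mean surprise gives essentially the same ranking as a lower "self-perplexity" of the generation, just measured in bits.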