Choose best answer based on perplexity with parallel generation? #3533
-
I have a practical use case where I want my LLM to generate multiple responses to the same prompt and choose the best one. Given the latest developments, is it possible to do all this while approximately keeping the same generation time as if I only made one sampling call? I'm thinking of generating 3-4 answers in parallel and calculating the perplexity of all of them without making any extra call. Are there any other ideas? If it's possible, is there an example that already does this? I'm convinced that if it's possible, other people are also highly interested.

UPDATE: I have a very long system prompt and I have already tried different sampling strategies (e.g. mirostat), but I couldn't find a better combo than just using the default configs. For example, mirostat tries really hard to follow the beginning of the system prompt, but it ignores later text where I specifically say not to generate the type of responses it generates. For some reason, if I just run it with the default configs, it "listens" to the whole system prompt and is usually better than changing top_p, top_k, tfs, etc. with values that should theoretically work better.
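To be concrete about the selection step I have in mind, here is a rough sketch (Python, not working llama.cpp code; it assumes I can somehow get the log-probability of each sampled token back for every candidate, and the `candidates` data below is made up):

```python
import math

def self_perplexity(token_logprobs):
    """exp of the mean negative log-likelihood of a candidate's own sampled
    tokens -- the "perplexity" of the generation under the model that produced it."""
    if not token_logprobs:
        return float("inf")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def pick_best(candidates):
    """candidates: list of (text, per-token logprobs) pairs. Picks the one
    whose own tokens were, on average, the least surprising."""
    return min(candidates, key=lambda c: self_perplexity(c[1]))

# Made-up output of a parallel run with 3 candidates:
candidates = [
    ("answer A", [-0.2, -1.1, -0.4]),
    ("answer B", [-0.9, -0.3, -2.5]),
    ("answer C", [-0.1, -0.2, -0.3]),
]
best_text, _ = pick_best(candidates)
print(best_text)  # "answer C" has the lowest average negative log-likelihood
```

The open question is whether the parallel decoding path can hand me those per-token log-probabilities without any extra forward passes.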
-
I had this idea as well, but I was told that perplexity calculation is based on an external text source and is computationally expensive, and therefore wouldn't be worth it, at least without optimizations. I don't know how true that is.
Right, you feed it wikitext and (from what I know) perplexity is based on how accurately it predicts what is actually in that text. When you're just generating stuff, though, you don't have a reference to compare the tokens it predicts with, so you can't say whether it got the right answer or not.
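That standard evaluation is basically just the following (a sketch; `reference_logprobs` stands for the model's log-probability of each actual token in the reference text, e.g. wikitext, which is exactly the thing you don't have for a free-form generation):

```python
import math

def reference_perplexity(reference_logprobs):
    """exp of the mean negative log-likelihood of the *known* next tokens
    in a reference corpus. Without a reference there is no "right answer"
    to score against."""
    return math.exp(-sum(reference_logprobs) / len(reference_logprobs))
```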
@Mihaiii
So you probably can't really do this with perplexity in that sense. However, you could look at the code for the mirostat samplers: they calculate a surprise value for the token they picked and also keep track of `mu`, which gets updated based o…
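For reference, the core of that bookkeeping is small; here is a sketch of the idea (not the actual llama.cpp code; `TAU` and `ETA` are the usual mirostat defaults, 5.0 and 0.1, as far as I remember). Averaging the same per-token surprise over a whole candidate would give you a reference-free score you could compare across parallel generations:

```python
import math

TAU = 5.0  # target surprise
ETA = 0.1  # learning rate for the mu update

def mirostat_step(mu, sampled_token_prob, tau=TAU, eta=ETA):
    """Surprise of the token that was actually sampled, plus the feedback
    update that nudges mu toward the target surprise tau."""
    surprise = -math.log2(sampled_token_prob)
    mu = mu - eta * (surprise - tau)
    return mu, surprise

def mean_surprise(sampled_token_probs):
    """Average surprise (in bits) over a candidate's own tokens -- a
    reference-free way to rank parallel generations."""
    return sum(-math.log2(p) for p in sampled_token_probs) / len(sampled_token_probs)
```

A lower mean surprise gives essentially the same ranking as a lower "self-perplexity" of the generation, just measured in bits.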