How to handle rewinding with the grammar sampler? #3211
-
Sorry to bug you with the mention (anyone else who knows the answer is welcome to reply as well). I've been adding backtracking support in my seqrep project (over here: #2593), and if there's a reasonable way to accomplish it, I'd like to support grammar sampling too. By rewind, I mean undoing some number of tokens and restarting generation from an earlier point. The naive way to handle it would probably be to reset the grammar sampler state somehow and then feed it tokens from the very beginning up to the rewind point. That is likely to be pretty slow, though. Another possible approach would be to save the grammar state at each step and just reload it when rewinding back to that point. That also might be kind of slow and memory-intensive. Any ideas for a better approach?
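
For illustration, a minimal sketch of the naive "reset and replay" version, assuming the grammar calls in llama.h (`llama_grammar_free`, `llama_grammar_accept_token`); `rebuild_grammar()` here is just a hypothetical stand-in for however the grammar was originally constructed (e.g. via `llama_grammar_init` from the parsed rules):

```cpp
#include <vector>
#include "llama.h"

// Hypothetical helper: re-create a fresh grammar from the original parsed
// rules. The details depend on how the grammar was built in the first place.
struct llama_grammar * rebuild_grammar();

// Naive rewind: throw away the current grammar state and replay every kept
// token from the start so the grammar ends up where it would have been.
static struct llama_grammar * rewind_grammar_naive(
        struct llama_context           * ctx,
        struct llama_grammar           * grammar,
        const std::vector<llama_token> & kept_tokens) {
    llama_grammar_free(grammar);
    struct llama_grammar * fresh = rebuild_grammar();
    for (llama_token tok : kept_tokens) {
        llama_grammar_accept_token(ctx, fresh, tok); // O(n) replay, slow for long outputs
    }
    return fresh;
}
```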
-
Hey, yeah, #2593 looks really neat! Both of those approaches seem valid. I suppose you could also blend the two and checkpoint the grammar state every N tokens.

For the copying, I don't know if you saw there's now a `llama_grammar_copy` API that would support that. Currently it copies the rules and has to relocate pointers, but we've discussed that we can avoid this with a shared reference. Once that's done, `llama_grammar_copy` will just be copying the stacks, which is done multiple times on each sample anyway. So I believe it should not really be too bad in terms of run time; memory, not sure.
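
To make the blended idea concrete, here's a rough sketch of checkpointing every N tokens with `llama_grammar_copy` and replaying only the tail on a rewind. The `grammar_checkpointer` wrapper and the token history it keeps are made up for illustration; it assumes the existing `llama_grammar_copy`, `llama_grammar_accept_token`, and `llama_grammar_free` calls:

```cpp
#include <cstddef>
#include <vector>
#include "llama.h"

// Illustrative wrapper (not part of llama.cpp): checkpoint the grammar every
// `interval` accepted tokens so a rewind only replays at most interval-1
// tokens from the nearest checkpoint instead of the whole sequence.
struct grammar_checkpointer {
    struct llama_context * ctx;
    struct llama_grammar * grammar;                    // live grammar state
    std::vector<llama_token>            history;       // all accepted tokens
    std::vector<struct llama_grammar *> checkpoints;   // one copy per `interval` tokens
    size_t interval;

    grammar_checkpointer(struct llama_context * ctx, struct llama_grammar * g, size_t n)
        : ctx(ctx), grammar(g), interval(n) {
        // checkpoint 0 corresponds to "no tokens accepted yet"
        checkpoints.push_back(llama_grammar_copy(grammar));
    }

    ~grammar_checkpointer() {
        for (auto * g : checkpoints) llama_grammar_free(g);
        llama_grammar_free(grammar);
    }

    // feed one sampled token to the grammar and checkpoint on the interval
    void accept(llama_token tok) {
        llama_grammar_accept_token(ctx, grammar, tok);
        history.push_back(tok);
        if (history.size() % interval == 0) {
            checkpoints.push_back(llama_grammar_copy(grammar));
        }
    }

    // rewind so that only the first `n_keep` tokens remain accepted
    void rewind(size_t n_keep) {
        // drop checkpoints taken after the rewind point
        const size_t cp_idx = n_keep / interval;   // nearest checkpoint at or before n_keep
        while (checkpoints.size() > cp_idx + 1) {
            llama_grammar_free(checkpoints.back());
            checkpoints.pop_back();
        }
        // restore from that checkpoint and replay the few remaining tokens
        llama_grammar_free(grammar);
        grammar = llama_grammar_copy(checkpoints[cp_idx]);
        for (size_t i = cp_idx * interval; i < n_keep; ++i) {
            llama_grammar_accept_token(ctx, grammar, history[i]);
        }
        history.resize(n_keep);
    }
};
```

Worst case a rewind replays interval-1 tokens through the grammar, and memory grows by one grammar copy per interval tokens, so the interval just trades replay time against memory.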