Hello! I am very impressed by the grammar feature here, and I am looking to extend the capabilities of the grammar-based token filter. As I understand it, the LLM produces a list of possible next tokens at each step, and the grammar is used to filter them before one is randomly selected. I was wondering whether it is at all feasible to inject my own arbitrarily complicated program at this layer: for example, restricting token generation to semantically valid next tokens, or restricting tokens during diff-edits to files for agentic models. I understand this is unlikely to be a feature intended to be exposed to most users, so I assume some amount of forking & hackery is required to get it to work, which is fine. I would be very grateful for any pointers into the codebase showing where token filtering takes place, where the grammar evaluator lives, etc. Alternatively, if you think Guidance would already support all of my possible needs, I will just go experiment with that instead. Thank you for your time!
There is an interface that you can implement to add custom samplers. See llama.cpp/include/llama.h, lines 1193 to 1209 at commit 8960efd, as well as:
https://github.com/ggml-org/llama.cpp/blob/master/src/llama-sampling.h
https://github.com/ggml-org/llama.cpp/blob/master/src/llama-sampling.cpp
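
To make that concrete, below is a rough sketch of a custom sampler that vetoes candidates outside an allowed set by forcing their logits to -INFINITY, the same mechanism the grammar sampler uses. It assumes the `llama_sampler_i` layout from llama.h around that commit, which may have drifted in your checkout; the `my_filter_*` names and the allowed-set logic are placeholders for your own program, not part of the library.

```cpp
#include <cmath>
#include <unordered_set>

#include "llama.h"

// Per-sampler state: the set of tokens your external program currently allows.
struct my_filter_ctx {
    std::unordered_set<llama_token> allowed;
};

static const char * my_filter_name(const struct llama_sampler * /*smpl*/) {
    return "my-semantic-filter";
}

// Runs once per decoding step on the candidate array -- the same layer the
// grammar sampler operates on. Vetoed candidates get a logit of -INFINITY.
static void my_filter_apply(struct llama_sampler * smpl, llama_token_data_array * cur_p) {
    const auto * ctx = (const my_filter_ctx *) smpl->ctx;
    for (size_t i = 0; i < cur_p->size; ++i) {
        if (ctx->allowed.count(cur_p->data[i].id) == 0) {
            cur_p->data[i].logit = -INFINITY;
        }
    }
    cur_p->sorted = false; // logits changed, so any prior ordering is stale
}

// Runs after a token is actually sampled. This is where the grammar sampler
// advances its parser; feed the token to your own validator here and
// recompute ctx->allowed for the next step.
static void my_filter_accept(struct llama_sampler * smpl, llama_token token) {
    auto * ctx = (my_filter_ctx *) smpl->ctx;
    (void) ctx; (void) token; // ... your program goes here ...
}

static struct llama_sampler * my_filter_clone(const struct llama_sampler * smpl) {
    return new llama_sampler {
        /* .iface = */ smpl->iface,
        /* .ctx   = */ new my_filter_ctx(*(const my_filter_ctx *) smpl->ctx),
    };
}

static void my_filter_free(struct llama_sampler * smpl) {
    delete (my_filter_ctx *) smpl->ctx;
}

static const struct llama_sampler_i my_filter_iface = {
    /* .name   = */ my_filter_name,
    /* .accept = */ my_filter_accept,
    /* .apply  = */ my_filter_apply,
    /* .reset  = */ nullptr, // optional
    /* .clone  = */ my_filter_clone,
    /* .free   = */ my_filter_free,
};

struct llama_sampler * my_filter_init() {
    // The public llama_sampler struct is just { iface, ctx }, so it can be
    // constructed directly; newer checkouts also expose a helper for this.
    return new llama_sampler {
        /* .iface = */ &my_filter_iface,
        /* .ctx   = */ new my_filter_ctx(),
    };
}
```

You would then attach it to a sampler chain ahead of the final picker, the same slot where the built-in grammar sampler is normally added:

```cpp
struct llama_sampler * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());
llama_sampler_chain_add(chain, my_filter_init());
llama_sampler_chain_add(chain, llama_sampler_init_dist(LLAMA_DEFAULT_SEED));
```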