-
Hi! (Context: I'm looking to integrate this library: https://github.com/noamgat/lm-format-enforcer with llama.cpp.) I want to customize the sampling procedure used by llama.cpp in order to add a postprocessing step on the logits that enforces certain properties of the output token stream. For example, Hugging Face transformers has an optional `logits_processor` parameter when calling `generate()`. Is this achievable with llama.cpp? There are a number of ways to approach this, and I wonder if any API is open to do such a thing.
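For reference, this is roughly what that looks like on the transformers side (a minimal sketch; the model choice and the `BanTokenProcessor` class are illustrative, not part of lm-format-enforcer):

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessor,
    LogitsProcessorList,
)

class BanTokenProcessor(LogitsProcessor):
    """Sets the score of one banned token id to -inf on every decode step."""
    def __init__(self, banned_token_id: int):
        self.banned_token_id = banned_token_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        scores[:, self.banned_token_id] = float("-inf")
        return scores

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Hello", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=10,
    # The processor gets to rewrite the logits before each sampling step.
    logits_processor=LogitsProcessorList([BanTokenProcessor(tokenizer.eos_token_id)]),
)
print(tokenizer.decode(out[0]))
```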
-
Have you considered using a custom GBNF grammar?
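For example, grammars can be passed through the Python bindings as well (a sketch, assuming llama-cpp-python and a local GGUF model at `./model.gguf`; the toy grammar just restricts the output to "yes" or "no"):

```python
from llama_cpp import Llama, LlamaGrammar

# GBNF grammar that constrains the output to a single "yes" or "no".
grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')

llm = Llama(model_path="./model.gguf")
result = llm("Is the sky blue? Answer:", grammar=grammar, max_tokens=4)
print(result["choices"][0]["text"])
```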
-
You can just fetch the logits and do whatever you want with them before you call the sampling functions (or you can handle the sampling yourself, either from scratch or by copying the existing sampling functions). Check out: https://github.com/ggerganov/llama.cpp/blob/004797f6ac135383f8c1d1f5bd415ddee2f79318/llama.h#L478-L483
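To illustrate the idea (a self-contained numpy sketch of the approach, not llama.cpp's actual sampling code; in llama.cpp itself you would read the buffer returned by `llama_get_logits` and then call the sampling functions):

```python
import numpy as np

def sample_with_mask(logits: np.ndarray, allowed_token_ids: list[int]) -> int:
    """Mask every token outside the allowed set to -inf, then sample
    from the softmax of what remains."""
    masked = np.full_like(logits, -np.inf)
    masked[allowed_token_ids] = logits[allowed_token_ids]
    # Numerically stable softmax over the masked logits; exp(-inf) == 0,
    # so disallowed tokens get zero probability.
    masked -= masked[allowed_token_ids].max()
    probs = np.exp(masked)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# Toy demonstration with a fake 8-token vocabulary.
logits = np.random.randn(8).astype(np.float32)
print(sample_with_mask(logits, allowed_token_ids=[2, 5]))
```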
-
Update: llama.cpp's Python bindings ( https://github.com/abetlen/llama-cpp-python ) have a `LogitsProcessor` interface that fits my use case, since I was using the Python bindings anyway. This allowed me to create a very straightforward integration between my library and llama-cpp-python:
https://github.com/noamgat/lm-format-enforcer/blob/main/samples/colab_llamacpppython_integration.ipynb
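For anyone landing here, the shape of that interface looks roughly like this (a hedged sketch, assuming a local GGUF model at `./model.gguf`; `ban_eos` is a hypothetical processor written for illustration, not part of lm-format-enforcer):

```python
import numpy as np
from llama_cpp import Llama, LogitsProcessorList

llm = Llama(model_path="./model.gguf")

def ban_eos(input_ids: np.ndarray, scores: np.ndarray) -> np.ndarray:
    # A llama-cpp-python logits processor is just a callable that takes the
    # token ids generated so far plus the raw scores for the next token,
    # and returns the (possibly modified) scores.
    scores[llm.token_eos()] = -np.inf
    return scores

result = llm(
    "Q: Name a color. A:",
    max_tokens=8,
    logits_processor=LogitsProcessorList([ban_eos]),
)
print(result["choices"][0]["text"])
```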