-
Hi! (Context: I'm looking to integrate this library: https://github.com/noamgat/lm-format-enforcer with llama.cpp.) I want to customize the sampling procedure used by llama.cpp in order to add a postprocessing step on the logits that enforces certain properties of the output token stream. For example, Hugging Face transformers has an optional `logits_processor` parameter when calling `generate()`. Is this achievable with llama.cpp? There are a number of ways to approach this, and I wonder if any API is open to do such a thing.
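For reference, this is roughly what that looks like on the transformers side (a minimal sketch; the model choice and the `BanTokenProcessor` class are illustrative, not part of lm-format-enforcer):

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessor,
    LogitsProcessorList,
)

class BanTokenProcessor(LogitsProcessor):
    """Sets the score of one banned token id to -inf on every decode step."""
    def __init__(self, banned_token_id: int):
        self.banned_token_id = banned_token_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        scores[:, self.banned_token_id] = float("-inf")
        return scores

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Hello", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=10,
    # The processor gets to rewrite the logits before each sampling step.
    logits_processor=LogitsProcessorList([BanTokenProcessor(tokenizer.eos_token_id)]),
)
print(tokenizer.decode(out[0]))
```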
-
Have you considered using a custom GBNF grammar?
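For example, grammars can be passed through the Python bindings as well (a sketch, assuming llama-cpp-python and a local GGUF model at `./model.gguf`; the toy grammar just restricts the output to "yes" or "no"):

```python
from llama_cpp import Llama, LlamaGrammar

# GBNF grammar that constrains the output to a single "yes" or "no".
grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')

llm = Llama(model_path="./model.gguf")
result = llm("Is the sky blue? Answer:", grammar=grammar, max_tokens=4)
print(result["choices"][0]["text"])
```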
-
You can just fetch the logits and do whatever you want with them before you call the sampling functions (or you can handle the sampling yourself, either from scratch or by copying the existing sampling functions). Check out: https://github.com/ggerganov/llama.cpp/blob/004797f6ac135383f8c1d1f5bd415ddee2f79318/llama.h#L478-L483
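To illustrate the idea (a self-contained numpy sketch of the approach, not llama.cpp's actual sampling code; in llama.cpp itself you would read the buffer returned by `llama_get_logits` and then call the sampling functions):

```python
import numpy as np

def sample_with_mask(logits: np.ndarray, allowed_token_ids: list[int]) -> int:
    """Mask every token outside the allowed set to -inf, then sample
    from the softmax of what remains."""
    masked = np.full_like(logits, -np.inf)
    masked[allowed_token_ids] = logits[allowed_token_ids]
    # Numerically stable softmax over the masked logits; exp(-inf) == 0,
    # so disallowed tokens get zero probability.
    masked -= masked[allowed_token_ids].max()
    probs = np.exp(masked)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# Toy demonstration with a fake 8-token vocabulary.
logits = np.random.randn(8).astype(np.float32)
print(sample_with_mask(logits, allowed_token_ids=[2, 5]))
```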
-
Update: llama.cpp's Python bindings ( https://github.com/abetlen/llama-cpp-python ) have a `LogitsProcessor` interface that fits my use case, since I was using the Python bindings anyway. This allowed me to create a very straightforward integration between my library and llama-cpp-python:
https://github.com/noamgat/lm-format-enforcer/blob/main/samples/colab_llamacpppython_integration.ipynb
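For anyone landing here, the shape of that interface looks roughly like this (a hedged sketch, assuming a local GGUF model at `./model.gguf`; `ban_eos` is a hypothetical processor written for illustration, not part of lm-format-enforcer):

```python
import numpy as np
from llama_cpp import Llama, LogitsProcessorList

llm = Llama(model_path="./model.gguf")

def ban_eos(input_ids: np.ndarray, scores: np.ndarray) -> np.ndarray:
    # A llama-cpp-python logits processor is just a callable that takes the
    # token ids generated so far plus the raw scores for the next token,
    # and returns the (possibly modified) scores.
    scores[llm.token_eos()] = -np.inf
    return scores

result = llm(
    "Q: Name a color. A:",
    max_tokens=8,
    logits_processor=LogitsProcessorList([ban_eos]),
)
print(result["choices"][0]["text"])
```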