Hello! I am very impressed by the grammar feature here, and I am looking to extend the capabilities of the grammar-based token filter. As I understand it, the LLM produces a list of possible next tokens at each step, and the grammar is used to filter them before one is randomly selected. I was wondering whether it is at all feasible to inject my own arbitrarily complicated program at this layer: for example, restricting token generation to semantically valid next tokens, or restricting tokens during diff-edits to files for agentic models. I understand this is unlikely to be a feature intended to be exposed to most users, so I assume some amount of forking & hackery is required to get it to work, which is fine. I would be very grateful for any pointers into the codebase showing where token filtering takes place, where the grammar evaluator lives, etc. Alternatively, if you think Guidance would already support all of my possible needs, I will just go experiment with that instead. Thank you for your time!
There is an interface that you can implement to add custom samplers. See llama.cpp/include/llama.h, lines 1193 to 1209 at commit 8960efd, as well as:
https://github.com/ggml-org/llama.cpp/blob/master/src/llama-sampling.h
https://github.com/ggml-org/llama.cpp/blob/master/src/llama-sampling.cpp
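
To make that concrete, below is a rough sketch of a custom sampler that vetoes candidates outside an allowed set by forcing their logits to -INFINITY, the same mechanism the grammar sampler uses. It assumes the `llama_sampler_i` layout from llama.h around that commit, which may have drifted in your checkout; the `my_filter_*` names and the allowed-set logic are placeholders for your own program, not part of the library.

```cpp
#include <cmath>
#include <unordered_set>

#include "llama.h"

// Per-sampler state: the set of tokens your external program currently allows.
struct my_filter_ctx {
    std::unordered_set<llama_token> allowed;
};

static const char * my_filter_name(const struct llama_sampler * /*smpl*/) {
    return "my-semantic-filter";
}

// Runs once per decoding step on the candidate array -- the same layer the
// grammar sampler operates on. Vetoed candidates get a logit of -INFINITY.
static void my_filter_apply(struct llama_sampler * smpl, llama_token_data_array * cur_p) {
    const auto * ctx = (const my_filter_ctx *) smpl->ctx;
    for (size_t i = 0; i < cur_p->size; ++i) {
        if (ctx->allowed.count(cur_p->data[i].id) == 0) {
            cur_p->data[i].logit = -INFINITY;
        }
    }
    cur_p->sorted = false; // logits changed, so any prior ordering is stale
}

// Runs after a token is actually sampled. This is where the grammar sampler
// advances its parser; feed the token to your own validator here and
// recompute ctx->allowed for the next step.
static void my_filter_accept(struct llama_sampler * smpl, llama_token token) {
    auto * ctx = (my_filter_ctx *) smpl->ctx;
    (void) ctx; (void) token; // ... your program goes here ...
}

static struct llama_sampler * my_filter_clone(const struct llama_sampler * smpl) {
    return new llama_sampler {
        /* .iface = */ smpl->iface,
        /* .ctx   = */ new my_filter_ctx(*(const my_filter_ctx *) smpl->ctx),
    };
}

static void my_filter_free(struct llama_sampler * smpl) {
    delete (my_filter_ctx *) smpl->ctx;
}

static const struct llama_sampler_i my_filter_iface = {
    /* .name   = */ my_filter_name,
    /* .accept = */ my_filter_accept,
    /* .apply  = */ my_filter_apply,
    /* .reset  = */ nullptr, // optional
    /* .clone  = */ my_filter_clone,
    /* .free   = */ my_filter_free,
};

struct llama_sampler * my_filter_init() {
    // The public llama_sampler struct is just { iface, ctx }, so it can be
    // constructed directly; newer checkouts also expose a helper for this.
    return new llama_sampler {
        /* .iface = */ &my_filter_iface,
        /* .ctx   = */ new my_filter_ctx(),
    };
}
```

You would then attach it to a sampler chain ahead of the final picker, the same slot where the built-in grammar sampler is normally added:

```cpp
struct llama_sampler * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());
llama_sampler_chain_add(chain, my_filter_init());
llama_sampler_chain_add(chain, llama_sampler_init_dist(LLAMA_DEFAULT_SEED));
```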