I'm trying to build a Rust integration with llama.cpp, and everything works except that the logit for the BOS token (the second value, with index 1) is always abnormally high. This causes it to always be picked with greedy sampling, which means the model can never complete anything, because BOS is always empty. I know I could probably exclude this token manually or use some higher-quality sampling, but I've been using simple.cpp as a reference, and I see no code in there that looks like it accounts for this sort of issue. Is it known why/how this happens, and what is the best way to prevent or work around it?
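(For anyone landing here: the "manually exclude this token" workaround mentioned in the question can be sketched as masking the banned token out before taking the argmax. This is a minimal stand-alone sketch, not the real llama.cpp sampling API, and the BOS token id of 1 is an assumption; real code should query it from the model's vocabulary.)

```rust
// Greedy (argmax) sampling that skips a set of banned token ids.
// `logits` is the raw logit vector for the next token; `banned` is a
// list of token ids to exclude (e.g. BOS, assumed here to be id 1).
fn greedy_sample(logits: &[f32], banned: &[usize]) -> usize {
    logits
        .iter()
        .enumerate()
        .filter(|(i, _)| !banned.contains(i)) // drop banned token ids
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap()) // highest logit wins
        .map(|(i, _)| i)
        .expect("logits must be non-empty")
}

fn main() {
    let logits = [0.1_f32, 9.9, 3.0];
    // With BOS (id 1) banned, the next-best token (id 2) is picked.
    assert_eq!(greedy_sample(&logits, &[1]), 2);
    // With nothing banned, the abnormally high BOS logit wins.
    assert_eq!(greedy_sample(&logits, &[]), 1);
    println!("ok");
}
```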
Replies: 1 comment
I found the issue. I was constructing a `llama_context_params` manually, and left all of the parameters whose purpose I didn't know at zero. It turns out that some of those parameters have non-zero default values, so setting them to zero confused llama.cpp. The solution was to use `llama_context_default_params()` and then only set the individual fields that I actually need to change.
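The general pattern above ("start from the library's defaults, then override only what you need") can be sketched in plain Rust. This uses a hypothetical stand-in struct, not the real llama.cpp bindings; the field names and default values here are illustrative assumptions, chosen only to show why zero-initializing an opaque params struct is fragile.

```rust
// Hypothetical stand-in for llama_context_params. The real struct has
// many more fields, and their defaults can change between llama.cpp
// versions -- which is exactly why hand-zeroing them is fragile.
#[derive(Debug, Clone)]
struct ContextParams {
    n_ctx: u32,          // context window size
    rope_freq_base: f32, // example of a default that is NOT zero
    embeddings: bool,
}

impl ContextParams {
    // Plays the role of llama_context_default_params(): the library's
    // sanctioned way to obtain a fully valid baseline configuration.
    fn default_params() -> Self {
        ContextParams {
            n_ctx: 512,
            rope_freq_base: 10000.0,
            embeddings: false,
        }
    }
}

fn main() {
    // Fragile: zeroing fields you don't understand produces invalid
    // values (a RoPE base of 0.0 here) that the library may not catch.
    let zeroed = ContextParams { n_ctx: 0, rope_freq_base: 0.0, embeddings: false };
    assert_eq!(zeroed.rope_freq_base, 0.0);

    // Robust: take the defaults, then override only the fields you need.
    let mut params = ContextParams::default_params();
    params.n_ctx = 2048;
    assert_eq!(params.n_ctx, 2048);
    assert_eq!(params.rope_freq_base, 10000.0); // sane default preserved
    println!("ok");
}
```

With the real bindings the shape is the same: call `llama_context_default_params()` (via FFI or a wrapper crate), mutate the one or two fields you care about, and pass the result on.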