I've been trying to figure out the reason for the following code and comment in `examples/main/main.cpp`:

```cpp
while ((n_remain != 0 && !is_antiprompt) || params.interactive) {
    // predict
    if (!embd.empty()) {
        // Note: (n_ctx - 4) here is to match the logic for commandline prompt handling via
        // --prompt or --file which uses the same value.
        int max_embd_size = n_ctx - 4;
```

I've looked in `common/common.cpp` and at the parsing of the command-line arguments, but I've failed to see the reason for this subtraction. Hopefully someone is able to explain this to me. Thanks!
Answered by ggerganov on May 9, 2024.
Answer selected by danbev.
It refers to this code:
https://github.com/ggerganov/llama.cpp/blob/07cd41d0965829463eff73eda3348aedbfd3a444/examples/main/main.cpp#L291-L296
The `n_ctx - 4` is arbitrary: the goal is to leave at least some context for the generation, because if the prompt fills the entire context, then we can't generate new tokens.