
Question about server.cpp: loading prompt tokens, batch_view shrink, sampled token #6449

Answered by ggerganov
TD-Sky asked this question in Q&A

However, it won't call llama_sampling_sample if the last token whose logits flag is set to true is not in the batch_view:

I don't think this can ever happen since we always process the entire batch (in chunks/views of n_batch):

https://github.com/ggerganov/llama.cpp/blob/4399f13fb9462cd06f3f154d0aee738425000fea/examples/server/server.cpp#L2033-L2037

...

https://github.com/ggerganov/llama.cpp/blob/4399f13fb9462cd06f3f154d0aee738425000fea/examples/server/server.cpp#L2066-L2079

Each token with logits == true should fall in one of the batch views and will be processed.
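
For context, the chunked decode loop referenced above looks roughly like the sketch below. This is a simplified paraphrase of the linked server.cpp revision, not the exact code; the llama_batch field order and the error handling (server.cpp retries with a smaller n_batch on failure) are assumptions here.

```cpp
#include <algorithm>
#include "llama.h"

// Sketch of the chunked decode loop: the whole batch is walked in windows of
// at most n_batch tokens, and each window is decoded as a "view" that simply
// offsets the pointers of the original llama_batch. Because the loop covers
// every index from 0 to batch.n_tokens, any token whose logits flag is true
// lands in exactly one view and is processed.
static bool decode_in_views(llama_context * ctx, const llama_batch & batch, int32_t n_batch) {
    for (int32_t i = 0; i < batch.n_tokens; i += n_batch) {
        const int32_t n_tokens = std::min(n_batch, batch.n_tokens - i);

        llama_batch batch_view = {
            n_tokens,
            batch.token    + i,
            nullptr,               // no embedding input
            batch.pos      + i,
            batch.n_seq_id + i,
            batch.seq_id   + i,
            batch.logits   + i,    // per-token logits flags travel with the view
            0, 0, 0,               // legacy fields, unused here
        };

        if (llama_decode(ctx, batch_view) != 0) {
            // server.cpp retries with a smaller n_batch; this sketch just bails
            return false;
        }
    }
    return true;
}
```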
