How does llama.cpp do batch processing? #4371
Unanswered
xyzhang626 asked this question in Q&A
Replies: 1 comment
-
If you just want to do batch decoding, see https://github.com/ggerganov/llama.cpp/tree/master/examples/parallel
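The parallel example linked above feeds several sequences through a single decode call. A minimal conceptual sketch in Python (hypothetical names, not the actual llama.cpp C API) of the idea: tokens from all sequences are flattened into one n_tokens dimension, and each token carries its own position and sequence id as metadata.

```python
# Conceptual sketch (hypothetical, not the real llama.cpp API): several
# sequences are flattened into one batch; each token records its position
# within its own sequence and which sequence it belongs to.
from dataclasses import dataclass, field

@dataclass
class Batch:
    token: list = field(default_factory=list)   # token ids, length n_tokens
    pos: list = field(default_factory=list)     # position within its sequence
    seq_id: list = field(default_factory=list)  # owning sequence of each token

def batch_add(batch, token, pos, seq_id):
    batch.token.append(token)
    batch.pos.append(pos)
    batch.seq_id.append(seq_id)

# Two independent prompts decoded in one forward pass:
prompts = {0: [101, 102, 103], 1: [201, 202]}
batch = Batch()
for sid, toks in prompts.items():
    for p, t in enumerate(toks):
        batch_add(batch, t, p, sid)

# Both sequences share a single token dimension: n_tokens = 5
print(len(batch.token))
```

This is why the compute graph only needs an n_tokens dimension: the "batch" is the union of all sequences' tokens, distinguished by their seq_id rather than by a separate tensor axis.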
-
Hi guys,
I'm new to llama.cpp and ggml, and I want to understand how the code does batch processing.
I saw lines like
ggml_reshape_3d(ctx0, Kcur, n_embd_head, n_head_kv, n_tokens)
in build_llama, where no batch dimension appears. Could you help me understand how the model runs a forward pass with batched input? That would help me a lot, thanks in advance! @ggerganov
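One way to see why the reshape in build_llama needs no separate batch axis: n_tokens already spans every sequence in the batch, and cross-sequence interference is prevented by masking attention to same-sequence tokens only. A rough illustrative sketch (hypothetical helper, not llama.cpp code), assuming causal attention within each sequence:

```python
# Hypothetical illustration: an attention mask for a flattened batch where
# token i may attend to token j only if they share a seq_id and j is not
# in i's future (causal within each sequence).
def build_mask(pos, seq_id):
    n = len(pos)
    # mask[i][j] is True when token i may attend to token j
    return [[seq_id[i] == seq_id[j] and pos[j] <= pos[i] for j in range(n)]
            for i in range(n)]

pos    = [0, 1, 2, 0, 1]   # positions within each sequence
seq_id = [0, 0, 0, 1, 1]   # two sequences flattened into n_tokens = 5
mask = build_mask(pos, seq_id)
print(mask[1])  # token (seq 0, pos 1): attends only to seq-0 tokens at pos <= 1
# -> [True, True, False, False, False]
```

So the forward pass treats the whole batch as one long token dimension, and the mask (together with per-sequence KV-cache bookkeeping) recovers the per-sequence semantics that an explicit batch dimension would otherwise provide.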