How does llama.cpp do batch processing? #4371
Unanswered
xyzhang626 asked this question in Q&A
Replies: 1 comment
-
If you just want to do batch decoding, see https://github.com/ggerganov/llama.cpp/tree/master/examples/parallel
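The parallel example linked above feeds several sequences through a single decode call. A minimal conceptual sketch in Python (hypothetical names, not the actual llama.cpp C API) of the idea: tokens from all sequences are flattened into one n_tokens dimension, and each token carries its own position and sequence id as metadata.

```python
# Conceptual sketch (hypothetical, not the real llama.cpp API): several
# sequences are flattened into one batch; each token records its position
# within its own sequence and which sequence it belongs to.
from dataclasses import dataclass, field

@dataclass
class Batch:
    token: list = field(default_factory=list)   # token ids, length n_tokens
    pos: list = field(default_factory=list)     # position within its sequence
    seq_id: list = field(default_factory=list)  # owning sequence of each token

def batch_add(batch, token, pos, seq_id):
    batch.token.append(token)
    batch.pos.append(pos)
    batch.seq_id.append(seq_id)

# Two independent prompts decoded in one forward pass:
prompts = {0: [101, 102, 103], 1: [201, 202]}
batch = Batch()
for sid, toks in prompts.items():
    for p, t in enumerate(toks):
        batch_add(batch, t, p, sid)

# Both sequences share a single token dimension: n_tokens = 5
print(len(batch.token))
```

This is why the compute graph only needs an n_tokens dimension: the "batch" is the union of all sequences' tokens, distinguished by their seq_id rather than by a separate tensor axis.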
-
Hi guys,
I'm new to llama.cpp and ggml, and I want to understand how the code does batch processing.
I saw lines like
ggml_reshape_3d(ctx0, Kcur, n_embd_head, n_head_kv, n_tokens)
in build_llama, where no batch dimension appears. Could you help me understand how the model runs a forward pass with batched input? That would help me a lot, thanks in advance! @ggerganov
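One way to see why the reshape in build_llama needs no separate batch axis: n_tokens already spans every sequence in the batch, and cross-sequence interference is prevented by masking attention to same-sequence tokens only. A rough illustrative sketch (hypothetical helper, not llama.cpp code), assuming causal attention within each sequence:

```python
# Hypothetical illustration: an attention mask for a flattened batch where
# token i may attend to token j only if they share a seq_id and j is not
# in i's future (causal within each sequence).
def build_mask(pos, seq_id):
    n = len(pos)
    # mask[i][j] is True when token i may attend to token j
    return [[seq_id[i] == seq_id[j] and pos[j] <= pos[i] for j in range(n)]
            for i in range(n)]

pos    = [0, 1, 2, 0, 1]   # positions within each sequence
seq_id = [0, 0, 0, 1, 1]   # two sequences flattened into n_tokens = 5
mask = build_mask(pos, seq_id)
print(mask[1])  # token (seq 0, pos 1): attends only to seq-0 tokens at pos <= 1
# -> [True, True, False, False, False]
```

So the forward pass treats the whole batch as one long token dimension, and the mask (together with per-sequence KV-cache bookkeeping) recovers the per-sequence semantics that an explicit batch dimension would otherwise provide.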