Does llama-parallel execute llama_decode in serial for np=2 or more? #10017
Hi,
The output returns:
Replies: 3 comments 1 reply
Thank you for the response.
Thank you.
Parallel sequences are stacked into the same batch (as long as they fit in it), so a single `llama_decode` can process an arbitrary number of parallel sequences.

The `printf` is displayed out of order because all other logs are printed asynchronously in a different thread, so it is difficult to analyze the output. Change the `printf` to the `LLAMA_LOG` function for correct output.
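To make the batching point concrete, here is a minimal toy sketch of how tokens from several sequences could share one batch, each tagged with a sequence id, so that one decode call covers all of them. Everything in it (`ToyEntry`, `ToyBatch`, `toy_batch_add`, `toy_decode`, `toy_stack_sequences`) is a hypothetical stand-in written for illustration. It is not the llama.cpp API and does not run a model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for one batch entry: a token tagged with its sequence.
struct ToyEntry {
    int32_t token;   // token id
    int32_t pos;     // position within its own sequence
    int32_t seq_id;  // which parallel sequence the token belongs to
};

// Hypothetical batch with a fixed capacity (assumed limit per decode call).
struct ToyBatch {
    std::size_t capacity;
    std::vector<ToyEntry> entries;
};

// Append one token for a given sequence; returns false if the batch is full.
bool toy_batch_add(ToyBatch & batch, int32_t token, int32_t pos, int32_t seq_id) {
    if (batch.entries.size() >= batch.capacity) {
        return false;
    }
    batch.entries.push_back({token, pos, seq_id});
    return true;
}

// Stand-in for a single decode call: one pass over every entry,
// reporting how many tokens were handled for each sequence.
std::map<int32_t, int> toy_decode(const ToyBatch & batch) {
    assert(batch.entries.size() <= batch.capacity);
    std::map<int32_t, int> tokens_per_seq;
    for (const ToyEntry & e : batch.entries) {
        tokens_per_seq[e.seq_id] += 1;
    }
    return tokens_per_seq;
}

// Stack the next token of each of n_parallel sequences into one batch.
// Sequences that do not fit are left out and would wait for a later call.
ToyBatch toy_stack_sequences(int n_parallel, std::size_t capacity) {
    ToyBatch batch{capacity, {}};
    for (int s = 0; s < n_parallel; ++s) {
        toy_batch_add(batch, /*token=*/100 + s, /*pos=*/0, /*seq_id=*/s);
    }
    return batch;
}
```

With two parallel sequences, `toy_stack_sequences(2, 8)` yields a batch of two entries with sequence ids 0 and 1, and a single `toy_decode` call reports one token for each. The sequences are not decoded one after another in separate calls.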