Does llama-parallel execute llama_decode serially for np=2 or more? #10017

Closed Answered by ggerganov
abhishek-rn asked this question in Q&A

Parallel sequences are stacked into the same batch (as long as they fit in it), so a single llama_decode call can process an arbitrary number of parallel sequences.

The printf output appears out of order because all other logs are printed asynchronously from a different thread, which makes the output hard to analyze. Replace the printf with the LLAMA_LOG function to get correctly ordered output.

Replies: 3 comments, 1 reply

Answer selected by abhishek-rn
@ggerganov
Category: Q&A
Labels: None yet
2 participants