
llama-parallel is processing only one request when run with various parameters #8388

Answered by skprasadu
skprasadu asked this question in Q&A

I think I figured it out. This is the right command:

./llama-parallel --prompt "where is bangalore\nwho is lord krishna" --parallel 2 --cont-batching --sequences 2 -ntg 1024 -npp 1024,1024 -ngl 35

The key point is that the number of sequences has to equal the number of prompts that need to be processed.

Hope it will be useful to others.
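For anyone who wants to generalize this, here is a minimal shell sketch (my own, not from the thread) that keeps --parallel and --sequences equal to the number of prompts in the --prompt string. It assumes the prompts are separated by a literal "\n", as in the command above; the other flags from the original command (such as -ntg and -npp) can be appended as needed.

```bash
#!/usr/bin/env bash
# Sketch only, not from the original post: derive the sequence count from the
# prompt string so --parallel and --sequences always match the prompt count.
# Assumes prompts are joined with a literal "\n" inside the --prompt value.

PROMPTS="where is bangalore\nwho is lord krishna"

# Count the literal "\n" separators and add one to get the number of prompts.
NUM_SEPARATORS=$(printf '%s' "$PROMPTS" | grep -oF '\n' | wc -l)
NUM_PROMPTS=$((NUM_SEPARATORS + 1))

./llama-parallel \
  --prompt "$PROMPTS" \
  --parallel "$NUM_PROMPTS" \
  --cont-batching \
  --sequences "$NUM_PROMPTS" \
  -ngl 35
```

This just keeps the sequence count and the prompt count in lockstep, which is the point made above.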

Replies: 1 comment 1 reply

Answer selected by ggerganov