Parallel sampling in processing the batch of tokens? #11882
Unanswered
whitezhang asked this question in Q&A
Replies: 1 comment 2 replies
-
The requests are already processed in parallel; there is nothing extra necessary to enable this.
-
When I use the following command to start the server:
I found that the time cost for each query is quite high. I checked the code and found that it processes each slot serially. Is it possible to make this parallel? I can work on this if it is feasible, or is there something else I haven't considered?
Here's a simplified conceptual code snippet of the change I have in mind: