Replies: 1 comment
-
The parallel processing doesn't work with the latest code. I wonder if there was a point at which it was working. During handling of the above exception, another exception occurred: Traceback (most recent call last):
-
Hello,
I am currently using llama.cpp, and I have encountered an issue with running tasks in parallel. When I attempt to run 4 separate tasks in parallel, I notice a significant decrease in speed compared to running a single task: it takes about 2 minutes for 4 questions with codellama-13b-instruct.Q2_K.gguf.
(In fact, it looked like only 3 of them were processed in parallel.)
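For context, here is roughly how I drive the server. This is a minimal sketch assuming llama.cpp's HTTP server example and its /completion endpoint; the launch flags, host, port, and prompts below are placeholders, not my exact setup:

```python
# Minimal sketch: issue 4 requests in parallel against llama.cpp's
# HTTP server example. Assumes the server was started with parallel
# slots enabled, e.g.:
#   ./server -m codellama-13b-instruct.Q2_K.gguf -c 4096 -np 4 -cb
# (-np sets the number of slots, -cb enables continuous batching; as I
# understand it, the shared context -c is split across the slots.)
import concurrent.futures
import json
import time
import urllib.request

SERVER = "http://127.0.0.1:8080/completion"  # placeholder host/port

def ask(prompt: str) -> str:
    payload = json.dumps({"prompt": prompt, "n_predict": 128}).encode()
    req = urllib.request.Request(
        SERVER, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["content"]

prompts = [f"Question {i}: explain binary search." for i in range(4)]

start = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    answers = list(pool.map(ask, prompts))
print(f"4 parallel requests took {time.time() - start:.1f}s")
```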
Here is a description of my working environment and measurements:
Single-task runtime: the average time from request to completion of generation was 29 s.
The prompt eval speed was 52.884 tokens/second.
The eval speed was 6.498 tokens/second.
Parallel-task runtime: 10 requests were submitted. The first one was handled on its own; once it had been completely answered, the next three questions were processed and answered in parallel. For the final six questions, the connections were closed.
Note that, judging from the server output, only questions 2 to 4 appear to have been processed in parallel, whereas the first question was not.
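To check which requests actually overlap, I use a small timing harness along these lines. Again a sketch: the endpoint, payload, and prompts are the same assumptions as in the block above.

```python
# Record per-request start/end timestamps to see which of the 10
# requests actually overlap on the server.
import concurrent.futures
import json
import time
import urllib.request

SERVER = "http://127.0.0.1:8080/completion"  # placeholder host/port

def timed_ask(i: int):
    t0 = time.time()
    payload = json.dumps(
        {"prompt": f"Question {i}: what is a mutex?", "n_predict": 64}
    ).encode()
    req = urllib.request.Request(
        SERVER, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=300) as resp:
            resp.read()
        status = "ok"
    except Exception as e:  # e.g. the server closed the connection
        status = f"failed: {e}"
    return i, t0, time.time(), status

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    for i, t0, t1, status in pool.map(timed_ask, range(10)):
        print(f"req {i}: start={t0:.1f}s end={t1:.1f}s ({status})")
```

If at most four start/end windows overlap, that would match 4 slots; I would have expected the requests beyond the slot count to queue, which is why the closed connections for the last six surprise me.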
Could there be an issue with the parallel processing feature of the program, or are there configuration settings I might be missing for optimal parallel execution? I would appreciate your guidance in resolving this performance issue.
Thank you
Best regards