Errors encountered during API calls while running DeepSeek R1:671b in multi-GPU mode (RTX 4090 * 2) #968
Closed
lililolo0927
started this conversation in
General
Replies: 1 comment 7 replies
-
Seems like a cudagraph error? Try updating the codebase and using DeepSeek-V3-Chat-multi-gpu-marlin.yaml instead; it should give better performance.
KTransformers uses the CPU to compute the expert layers, so if the VRAM on a single card is enough, adding GPUs will not increase speed.
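For context on why VRAM usage stays low: the expert layers are placed on the CPU by the optimize-rule YAML that KTransformers loads at startup. The fragment below is an illustrative sketch in the style of those rule files, not an exact copy of the shipped DeepSeek-V3-Chat*.yaml — class names, keys, and regexes may differ between versions, so check the rule files in your own checkout:

```yaml
# Illustrative sketch of a KTransformers optimize rule; exact keys/classes may differ by version.
- match:
    name: "^model\\.layers\\..*\\.mlp\\.experts$"   # the MoE expert modules (the bulk of the 671B weights)
  replace:
    class: ktransformers.operators.experts.KTransformersExperts
    kwargs:
      generate_device: "cpu"        # experts are evaluated on the CPU during decode
      generate_op: "KExpertsCPU"
      out_device: "cuda:0"          # expert outputs are copied back to the GPU
  recursive: False
- match:
    name: "^model\\.layers\\..*\\.self_attn$"       # attention stays on the GPU(s)
  replace:
    class: ktransformers.operators.attention.KDeepseekV2Attention
    kwargs:
      generate_device: "cuda:0"
```

The multi-GPU variants of these rule files differ mainly in splitting the GPU-resident layers across cuda:0 and cuda:1 by layer index; the experts stay on the CPU either way, which is why adding a second card does not speed up decoding and why VRAM consumption looks capped.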
-
Hello
I am using a container in which KTransformers 0.2.1 runs as an API server (following https://github.com/ubergarm/r1-ktransformers-guide).
I launched the server to run DeepSeek-R1:671b (Q4) on two GPUs (RTX 4090) with the following command:
Before adding the multi-GPU option, requests completed successfully with the following curl command:
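A minimal request of that form, assuming the server exposes the OpenAI-compatible /v1/chat/completions endpoint (the port 10002 and the model name are placeholders, not the exact values used here), would look like:

```bash
# Illustrative request only -- endpoint path, port, and model name are assumptions,
# not the exact command from the original report.
curl http://localhost:10002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "DeepSeek-R1",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 64
      }'
```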
After adding the multi-GPU option, nvidia-smi shows the processes distributed across both GPUs (as shown in the capture above), but the following error keeps occurring on the client side,
and the error below appears on the server side.
Why does this happen, and what should I do to use both GPUs?
In addition, my second question: as shown in the attached captures (the server running on a single GPU and on two GPUs, respectively), the GPUs are never fully utilized, yet generation is still very slow (I had assumed the slowdown was due to a lack of VRAM). Why is only a limited, almost fixed amount of VRAM consumed when running the model?
Thank you in advance.