When trying to scale to two nodes, I get "Error ignored in is_in_the_same_node: [../third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:133] " #6585
carljones3000
announced in
Q&A
Replies: 0 comments
Hi, I'm trying to test scaling to two nodes.
I have output from two tests here. Both of them lead to "Error ignored in is_in_the_same_node: [../third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:133] "
Test 1:
This test tries to run on two nodes with four 3090 GPUs each, 8 GPUs total.
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen1.5-32B-Chat --tensor-parallel-size 8 --max-model-len 12591 --swap-space 0
Test 2:
This test tries to run on two nodes with one 3090 GPU each, 2 GPUs total.
python -m vllm.entrypoints.openai.api_server --model facebook/opt-13b --tensor-parallel-size 2
Any help would be appreciated.
-Carl