sock.c:344 UCX ERROR recv(fd=47) failed: Connection reset by peer #8911
Unanswered
smallriver666 asked this question in Q&A
Replies: 1 comment
can you pls add |
I encountered this problem when submitting tasks across multiple nodes. I have one management node and five compute nodes. Nodes 2 to 5 are the same type of server; node 6 is a different server from the other four. Tasks submit successfully on nodes 2, 3, 4, and 5, but the following error occurs when node 6 is added (I am using Open MPI 4.1.0).


This is the script I used to submit the task: