Train a model on a Slurm cluster #1861
mrbeann asked this question in Community | Q&A
-
Hi @mrbeann, I don't think torchrun works well with Slurm. To deal with this kind of issue, we provide our own launcher for the Slurm platform. Please refer to https://colossalai.org/docs/basics/launch_colossalai#launch-with-slurm and give it a try.
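For context, a Slurm-aware launch works differently from torchrun. torchrun spawns the worker processes itself, whereas under Slurm `srun` starts one process per task and each process learns its identity from environment variables that Slurm sets, such as `SLURM_PROCID`, `SLURM_NTASKS` (older alias `SLURM_NPROCS`) and `SLURM_LOCALID`. The sketch below is a hypothetical helper (`slurm_to_torch_env` is not part of ColossalAI and is not the launcher linked above). It maps those variables to the `RANK`/`WORLD_SIZE`/`LOCAL_RANK`/`MASTER_ADDR`/`MASTER_PORT` variables that PyTorch's `env://` initialization reads. It assumes the caller already knows the master node's hostname and a free port.

```python
import os
from typing import Dict, Mapping, Optional


def slurm_to_torch_env(host: str, port: int = 29500,
                       env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: translate per-task Slurm variables into the
    variables used by torch.distributed's env:// initialization."""
    env = os.environ if env is None else env
    if "SLURM_PROCID" not in env:
        # srun sets SLURM_PROCID for every task it starts
        raise RuntimeError("SLURM_PROCID is not set; start each training process with srun")
    rank = int(env["SLURM_PROCID"])  # global rank of this task
    # SLURM_NPROCS is the older, backwards-compatible name for SLURM_NTASKS
    world_size = int(env.get("SLURM_NTASKS", env.get("SLURM_NPROCS", "1")))
    local_rank = int(env.get("SLURM_LOCALID", "0"))  # task index on this node
    return {
        "RANK": str(rank),
        "WORLD_SIZE": str(world_size),
        "LOCAL_RANK": str(local_rank),
        "MASTER_ADDR": host,
        "MASTER_PORT": str(port),
    }
```

In a training script this could be applied with `os.environ.update(slurm_to_torch_env(args.host, args.port))` before calling `torch.distributed.init_process_group(backend="nccl", init_method="env://")`, with every process started by `srun` rather than by torchrun. Here `args.host` and `args.port` are assumed command-line arguments.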
-
I'm trying to use ColossalAI on a cluster managed by Slurm. I first open an interactive shell with a command like
srun --pty /bin/bash
and then try the starter example. However, it raises the following error. Does anyone have an idea what causes this?