Using ColossalAI's native fp16 hangs in the clip_grad_norm_fp32 function #2253
Unanswered
yhcc asked this question in Community | Q&A
The function below hangs when it is called by FP16Optimizer (training runs for a while before hanging, but it always hangs at the same iteration). I am using pipeline parallelism + tensor parallelism. Are there any guesses as to what might cause this, so that I can try to debug it?

ColossalAI/colossalai/utils/common.py, line 279 at 8897b8f

Replies: 1 comment
- This is probably caused by this bug: #2255