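In DDP training with one forward pass and two backward passes, why are the gradients from the first backward pass averaged across GPUs, while the gradients from the second backward pass are not? #9802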

Unanswered
gftd asked this question in General

Replies: 1 comment

0 replies
1 participant