How to adjust learning rate and decay steps for parallel training in Pytorch backend? #4716
Unanswered
kechangming asked this question in Q&A
When using PyTorch for parallel training (e.g., on 4 GPUs), how should the learning rate, decay_step, and step be set so that the loss decreases with a trend similar to single-GPU training?

Replies: 1 comment

-
The batch size in the parallel training case indicates the batch size per GPU. Precisely reproducing the learning curve of single-GPU training is not trivial.
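Since the configured batch size is per GPU, the effective (global) batch size grows with the number of ranks. One common heuristic for approximating the single-GPU schedule, though not an exact reproduction and not official guidance from this project, is the linear scaling rule: multiply the learning rate by the number of GPUs and divide the step-based schedule by the same factor. The sketch below illustrates that heuristic; the function name `scale_for_data_parallel` and the parameter names `start_lr`, `decay_steps`, and `numb_steps` are illustrative assumptions, not an existing API.

```python
# A minimal sketch of the linear scaling heuristic for data-parallel training.
# All names here are illustrative; adapt them to your own training config.

def scale_for_data_parallel(start_lr: float,
                            decay_steps: int,
                            numb_steps: int,
                            n_gpus: int) -> dict:
    """Suggest multi-GPU hyperparameters from a single-GPU baseline.

    Assumes the configured batch size is per GPU, so the effective
    (global) batch size is n_gpus times larger than in the single-GPU run.
    """
    return {
        # Linear scaling rule: larger effective batch -> proportionally
        # larger learning rate (often combined with a short warmup).
        "start_lr": start_lr * n_gpus,
        # Each optimizer step now consumes n_gpus times more data, so
        # fewer steps cover the same number of samples.
        "decay_steps": max(1, decay_steps // n_gpus),
        "numb_steps": max(1, numb_steps // n_gpus),
    }

if __name__ == "__main__":
    # Example: a single-GPU baseline moved to 4 GPUs.
    single_gpu = {"start_lr": 1e-3, "decay_steps": 5000, "numb_steps": 1_000_000}
    print(scale_for_data_parallel(**single_gpu, n_gpus=4))
    # -> {'start_lr': 0.004, 'decay_steps': 1250, 'numb_steps': 250000}
```

Even with this scaling, gradient averaging across ranks and per-GPU batch statistics differ from the single-GPU run, so the loss curves will match only approximately, as noted above.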