How to adjust learning rate and decay steps for parallel training in Pytorch backend? #4716
Unanswered
kechangming asked this question in Q&A
When using PyTorch for parallel training (e.g., on 4 GPUs), how should the learning rate, decay_step, and step be set so that the loss decreases with a trend similar to single-GPU training?

Replies: 1 comment

-
The batch size in the parallel training case indicates the batch size per GPU. Precisely reproducing the learning curve of single-GPU training is not trivial.
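Since the configured batch size is per GPU, the effective (global) batch size grows with the number of ranks. One common heuristic for approximating the single-GPU schedule, though not an exact reproduction and not official guidance from this project, is the linear scaling rule: multiply the learning rate by the number of GPUs and divide the step-based schedule by the same factor. The sketch below illustrates that heuristic; the function name `scale_for_data_parallel` and the parameter names `start_lr`, `decay_steps`, and `numb_steps` are illustrative assumptions, not an existing API.

```python
# A minimal sketch of the linear scaling heuristic for data-parallel training.
# All names here are illustrative; adapt them to your own training config.

def scale_for_data_parallel(start_lr: float,
                            decay_steps: int,
                            numb_steps: int,
                            n_gpus: int) -> dict:
    """Suggest multi-GPU hyperparameters from a single-GPU baseline.

    Assumes the configured batch size is per GPU, so the effective
    (global) batch size is n_gpus times larger than in the single-GPU run.
    """
    return {
        # Linear scaling rule: larger effective batch -> proportionally
        # larger learning rate (often combined with a short warmup).
        "start_lr": start_lr * n_gpus,
        # Each optimizer step now consumes n_gpus times more data, so
        # fewer steps cover the same number of samples.
        "decay_steps": max(1, decay_steps // n_gpus),
        "numb_steps": max(1, numb_steps // n_gpus),
    }

if __name__ == "__main__":
    # Example: a single-GPU baseline moved to 4 GPUs.
    single_gpu = {"start_lr": 1e-3, "decay_steps": 5000, "numb_steps": 1_000_000}
    print(scale_for_data_parallel(**single_gpu, n_gpus=4))
    # -> {'start_lr': 0.004, 'decay_steps': 1250, 'numb_steps': 250000}
```

Even with this scaling, gradient averaging across ranks and per-GPU batch statistics differ from the single-GPU run, so the loss curves will match only approximately, as noted above.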