Replies: 1 comment
Hi @tnq177, DeepSpeed has an API that is close to what you want but would require some tweaks: there is a function called set_train_batch_size at https://github.com/microsoft/DeepSpeed/blob/7ce371b139521b1ebbf052f0496b1a16397c1d19/deepspeed/runtime/engine.py#L422, which changes the global batch size during training (see how it is called in https://github.com/microsoft/Megatron-DeepSpeed/blob/c685fb5e4973864ab0d0ad30e55edc014e151ca5/megatron/training.py#L939), BUT it does so by changing the gradient accumulation steps (gas), not the micro batch size (the actual batch size on each GPU). Since you want to improve computational efficiency, you would want the opposite: keep gas the same and change the micro batch size. We currently don't have a use case at hand, so it would be great if you could first try manually adding a DeepSpeed function and see if it works; if so, we'd really appreciate a PR contribution from you :)
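For reference, a minimal sketch of how the existing set_train_batch_size API could be called mid-training. The model, config values, and new batch size below are placeholders, and this keeps DeepSpeed's current behavior of adjusting gradient accumulation steps rather than the micro batch size:

```python
# Sketch only: calling DeepSpeed's existing set_train_batch_size mid-training.
# Model and config values are illustrative placeholders.
import torch
import deepspeed

model = torch.nn.Linear(512, 512)  # placeholder model

ds_config = {
    "train_batch_size": 64,               # global batch size
    "train_micro_batch_size_per_gpu": 8,  # per-GPU (micro) batch size
    # gradient_accumulation_steps is derived: 64 / (8 * dp_world_size)
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Later, e.g. when a curriculum phase changes:
# set_train_batch_size changes the *global* batch size by recomputing
# gradient_accumulation_steps; the micro batch size per GPU stays fixed.
# The new value must remain divisible by
# train_micro_batch_size_per_gpu * data-parallel world size,
# otherwise DeepSpeed raises a ValueError.
engine.set_train_batch_size(32)
```

To get the compute-efficiency benefit described above, the micro batch size itself would need to change, which is the part that is not currently exposed and would require the custom DeepSpeed function mentioned.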
I am thinking of trying something like curriculum training, where during the first few epochs I train on short sequences and in later epochs on longer sequences. If I halve the max sequence length without doubling the batch size, it wastes computation. How do I change the batch size during training, please? Thanks.
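To illustrate the idea, here is a hypothetical curriculum schedule (names and values are made up) that scales the batch size inversely with the max sequence length so the tokens processed per step stay roughly constant:

```python
# Illustrative curriculum schedule: shorter sequences -> larger batch size,
# keeping seq_len * batch_size roughly constant per step.
BASE_SEQ_LEN = 2048
BASE_BATCH_SIZE = 16

# epoch range -> max sequence length for that phase
CURRICULUM = {
    range(0, 3): 512,
    range(3, 6): 1024,
    range(6, 10): 2048,
}

def schedule_for_epoch(epoch):
    """Return (max_seq_len, batch_size) for the given epoch."""
    for epochs, seq_len in CURRICULUM.items():
        if epoch in epochs:
            return seq_len, BASE_BATCH_SIZE * BASE_SEQ_LEN // seq_len
    return BASE_SEQ_LEN, BASE_BATCH_SIZE

# e.g. epoch 0 -> (512, 64), epoch 4 -> (1024, 32), epoch 8 -> (2048, 16)
print(schedule_for_epoch(0), schedule_for_epoch(4), schedule_for_epoch(8))
```

The open question is how to apply the batch size from such a schedule to a running DeepSpeed engine.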