Replies: 1 comment
Hi @tnq177, DeepSpeed has an API that is close to what you want but would require some tweaks: there is a function called set_train_batch_size at https://github.com/microsoft/DeepSpeed/blob/7ce371b139521b1ebbf052f0496b1a16397c1d19/deepspeed/runtime/engine.py#L422, which changes the global batch size during training (see how it is called in https://github.com/microsoft/Megatron-DeepSpeed/blob/c685fb5e4973864ab0d0ad30e55edc014e151ca5/megatron/training.py#L939), BUT it does so by changing the gradient accumulation steps (gas), not the micro batch size (the actual batch size on each GPU). Since you want to improve computational efficiency, you would want the opposite: keep gas the same and change the micro batch size. We currently don't have a use case at hand, so it would be great if you could first try manually adding a DeepSpeed function and see if it works; if so, we'd really appreciate a PR contribution from you :)
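For reference, a minimal sketch of how the existing set_train_batch_size API could be called mid-training. The model, config values, and new batch size below are placeholders, and this keeps DeepSpeed's current behavior of adjusting gradient accumulation steps rather than the micro batch size:

```python
# Sketch only: calling DeepSpeed's existing set_train_batch_size mid-training.
# Model and config values are illustrative placeholders.
import torch
import deepspeed

model = torch.nn.Linear(512, 512)  # placeholder model

ds_config = {
    "train_batch_size": 64,               # global batch size
    "train_micro_batch_size_per_gpu": 8,  # per-GPU (micro) batch size
    # gradient_accumulation_steps is derived: 64 / (8 * dp_world_size)
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Later, e.g. when a curriculum phase changes:
# set_train_batch_size changes the *global* batch size by recomputing
# gradient_accumulation_steps; the micro batch size per GPU stays fixed.
# The new value must remain divisible by
# train_micro_batch_size_per_gpu * data-parallel world size,
# otherwise DeepSpeed raises a ValueError.
engine.set_train_batch_size(32)
```

To get the compute-efficiency benefit described above, the micro batch size itself would need to change, which is the part that is not currently exposed and would require the custom DeepSpeed function mentioned.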
I am thinking of trying something like curriculum training, where during the first few epochs I train on short sequences and in later epochs on longer sequences. If I halve the max sequence length without doubling the batch size, it wastes computation. How do I change the batch size during training, please? Thanks.
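To illustrate the idea, here is a hypothetical curriculum schedule (names and values are made up) that scales the batch size inversely with the max sequence length so the tokens processed per step stay roughly constant:

```python
# Illustrative curriculum schedule: shorter sequences -> larger batch size,
# keeping seq_len * batch_size roughly constant per step.
BASE_SEQ_LEN = 2048
BASE_BATCH_SIZE = 16

# epoch range -> max sequence length for that phase
CURRICULUM = {
    range(0, 3): 512,
    range(3, 6): 1024,
    range(6, 10): 2048,
}

def schedule_for_epoch(epoch):
    """Return (max_seq_len, batch_size) for the given epoch."""
    for epochs, seq_len in CURRICULUM.items():
        if epoch in epochs:
            return seq_len, BASE_BATCH_SIZE * BASE_SEQ_LEN // seq_len
    return BASE_SEQ_LEN, BASE_BATCH_SIZE

# e.g. epoch 0 -> (512, 64), epoch 4 -> (1024, 32), epoch 8 -> (2048, 16)
print(schedule_for_epoch(0), schedule_for_epoch(4), schedule_for_epoch(8))
```

The open question is how to apply the batch size from such a schedule to a running DeepSpeed engine.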