-`batch_size`: batch size per GPU for fine-tuning.
-`accum_iter`: gradient accumulation steps. The effective batch size is batch_size*accum_iter*num_GPU. We recommend at least 256 for stable and reliable training (see the sketch after this list). <br>
If you have memory constraints, you can increase --accum_iter and reduce --batch_size to trade memory for computation.
-`epochs`: number of epochs for fine-tuning. Default: 50. <br>
Performance will increase with more epochs, but 50 should be enough to achieve very good results.
Please make sure you include at least **batch_size*num_gpu** examples in the training set.