Hi, I noticed that details such as gradient accumulation seem to differ between the base accelerator and the DeepSpeed accelerator for preference model training, which causes errors. What are the actual gradient accumulation steps and batch size used during training?
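For context, here is a minimal sketch of the convention I'd expect (the function name and parameters are my own, not from this repo): the effective batch size per optimizer step is usually the per-device micro-batch size times the gradient-accumulation steps times the data-parallel world size, so a mismatch in any of these between the two accelerators would change the training behavior.

```python
# Hypothetical helper (not from this repo) showing the usual relationship
# between micro-batch size, gradient-accumulation steps, and world size.
def effective_batch_size(micro_batch_size: int,
                         grad_accum_steps: int,
                         world_size: int) -> int:
    """Total number of samples contributing to one optimizer step."""
    return micro_batch_size * grad_accum_steps * world_size

# e.g. 4 samples/GPU, 8 accumulation steps, 2 GPUs
print(effective_batch_size(4, 8, 2))
```

If the base accelerator and the DeepSpeed config disagree on any of these three values, the effective batch size (and thus the results) would differ.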
Thanks in advance