Heads up for other users who want to resume training from a checkpoint. You will want to:
- de-indent DDP_main.py:80 so that all devices can load the checkpoint
- load the optimizer and scheduler states at DDP_main.py:146
- set the index of the dataloader to the correct example before actually training
I'm not totally sure this covers everything (logging, for example), but it might work OK.
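For anyone who wants something concrete, here is a minimal sketch of the steps listed above. It is not the repo's code: the checkpoint keys (`model`, `optimizer`, `scheduler`, `epoch`, `step_in_epoch`) and both function names are hypothetical, and it assumes the checkpoint was saved as a dictionary holding each object's `state_dict()`.

```python
import itertools


def restore_training_state(checkpoint, model, optimizer, scheduler):
    """Restore every stateful object, not only the model weights.

    `checkpoint` is assumed to be a dict that was loaded on *every* rank,
    e.g. with torch.load(path, map_location="cpu"). The key names used
    here are hypothetical and must match whatever the save code wrote.
    """
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    scheduler.load_state_dict(checkpoint["scheduler"])
    # Return where training stopped so the loop can pick up from there.
    return checkpoint["epoch"], checkpoint["step_in_epoch"]


def skip_consumed_batches(dataloader, step_in_epoch):
    """Yield only the batches that were not yet seen in the interrupted epoch.

    The skipped batches are still produced by the dataloader and then
    discarded, so this is simple rather than fast.
    """
    return itertools.islice(dataloader, step_in_epoch, None)
```

In a DDP script the load has to run on every rank, which is what the de-indent is for. If you use a `DistributedSampler`, call `sampler.set_epoch(epoch)` before skipping, otherwise the shuffle order will not match the interrupted run and the skip will drop the wrong examples.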
Note: There's also a separate issue that your checkpoints might get overwritten between epochs, so be sure you're loading the right thing and saving where you want.
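One way to sidestep the overwriting is to put the epoch and step in the file name and resume from the newest file. This is only a sketch: the naming scheme and both helper names are my own, not something the repo defines.

```python
from pathlib import Path


def checkpoint_path(output_dir, epoch, step):
    """Build a unique file name per (epoch, step) so nothing is overwritten.

    Zero-padding keeps alphabetical order equal to chronological order.
    """
    return Path(output_dir) / f"checkpoint_epoch{epoch:03d}_step{step:07d}.pt"


def latest_checkpoint(output_dir):
    """Return the newest checkpoint in output_dir, or None if there is none."""
    candidates = sorted(Path(output_dir).glob("checkpoint_epoch*_step*.pt"))
    return candidates[-1] if candidates else None
```

Printing the path returned by `latest_checkpoint` before resuming is a cheap way to confirm you are loading the file you think you are.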
Thanks for the code release!