First, I let the model train for 10 epochs and the loss dropped from 463 to 104. Then, using the experiment manager, I resumed the training for another 10 epochs, but the loss barely changed and even got slightly worse (from 106 to 109).

I.e. you set your model to train for 10 epochs, let that run finish, and then used exp manager to resume it for a further 10 epochs?

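This will not work. Take a look at the LR of the second run - it should be close to 0 (or whatever min_lr you set). The LR schedule was sized for the first 10 epochs, so by the time that run finished it had already decayed to its floor, and a resumed run picks up from there.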
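To make the LR point concrete, here is a minimal sketch in plain Python. It assumes a cosine-annealing schedule that holds at min_lr once the step count reaches max_steps; the function `cosine_lr` and all the numbers (`steps_per_epoch`, `base_lr`, `min_lr`) are hypothetical and chosen only for illustration, not taken from your config or from any library's implementation.

```python
import math


def cosine_lr(step, max_steps, base_lr, min_lr):
    """Cosine-annealed LR, held at min_lr once step reaches max_steps (assumed behavior)."""
    if step >= max_steps:
        return min_lr
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / max_steps))
    return min_lr + (base_lr - min_lr) * cosine


# Hypothetical values, for illustration only.
steps_per_epoch = 100
max_steps = 10 * steps_per_epoch  # schedule sized for the first 10 epochs
base_lr, min_lr = 1e-3, 1e-5

lr_start = cosine_lr(0, max_steps, base_lr, min_lr)
lr_end_first_run = cosine_lr(max_steps, max_steps, base_lr, min_lr)

# A resumed run restores the step counter, so it starts at step 1000, not 0.
lr_second_run = [
    cosine_lr(max_steps + epoch * steps_per_epoch, max_steps, base_lr, min_lr)
    for epoch in range(10)
]

print("LR at start of first run:", lr_start)
print("LR at end of first run:  ", lr_end_first_run)
print("LR during resumed epochs:", sorted(set(lr_second_run)))
```

Under these assumptions, every epoch of the resumed run trains at min_lr, which would be consistent with a loss that stays essentially flat.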

The resume functionality is meant to be set up from the beginning of the training run, and it is only useful if that run stops partway through. It cannot be used to continue an already finished run unless you were super…

Answer selected by Th3Moody