
You can print out the optimizer config with print(model.cfg.optim) and then change warmup_steps to 0 or a small number like 5000. The warmup is probably too long. Also check that the scheduler is actually CosineAnnealing: in that case the 0.001 is used directly as the learning rate.
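As a minimal sketch of that check-and-edit (the key names and values here are illustrative, not taken from your actual config, which you should inspect with print(model.cfg.optim)):

```python
# Hypothetical layout of what print(model.cfg.optim) might show for a
# NeMo-style model; the exact keys depend on your own config file.
optim_cfg = {
    "name": "adamw",
    "lr": 0.001,
    "sched": {
        "name": "CosineAnnealing",  # verify this is the scheduler you expect
        "warmup_steps": 50000,      # assumed value - likely too long
        "min_lr": 1e-6,
    },
}

# Shorten the warmup so the LR ramps up sooner (0 disables warmup entirely)
optim_cfg["sched"]["warmup_steps"] = 5000
print(optim_cfg["sched"])
```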

If it's Noam, then optim.lr acts as a multiplier on the schedule, so your actual learning rate is being multiplied by 0.001.
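To see why that matters, here is a sketch of the Noam schedule (formula assumed from the original Transformer recipe; NeMo's exact implementation may differ in details): the configured lr scales the entire curve, so lr=0.001 shrinks every step's effective rate a thousandfold.

```python
def noam_lr(step, d_model=512, warmup_steps=4000, lr=1.0):
    """Noam schedule: linear warmup then inverse-sqrt decay.

    Note that lr is a global multiplier on the whole curve, not the
    peak learning rate itself. step must be >= 1.
    """
    return lr * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# Same point on the curve, scaled down by 0.001:
print(noam_lr(1000, lr=1.0))
print(noam_lr(1000, lr=0.001))
```

So with a Noam scheduler, a small optim.lr does not set a small-but-reasonable learning rate; it suppresses the entire schedule.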

Replies: 1 comment

Answer selected by Khimer
Category
Q&A