Hello!
You can print out the optimizer config with `print(model.cfg.optim)` and then change `warmup_steps` to 0 or a small number like 5000 — the warmup is probably too long. Also check which scheduler is actually in use: with CosineAnnealing, 0.001 is used as the learning rate directly. If it's Noam, then `optim.lr` acts as a multiplier, so your effective LR is being scaled by 0.001.
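To see why the multiplier matters, here is a minimal sketch of the Noam schedule (from "Attention Is All You Need") — the `d_model` and `warmup_steps` values are illustrative assumptions, not read from your config:

```python
def noam_lr(step: int, d_model: int = 512, warmup_steps: int = 10000,
            multiplier: float = 1.0) -> float:
    """Noam learning-rate schedule.

    lr = multiplier * d_model**-0.5 * min(step**-0.5, step * warmup_steps**-1.5)

    The LR ramps up linearly for `warmup_steps` steps, then decays as the
    inverse square root of the step count. `multiplier` plays the role of
    `optim.lr` here: it scales the whole curve.
    """
    step = max(step, 1)  # avoid division by zero at step 0
    return multiplier * d_model ** -0.5 * min(step ** -0.5,
                                              step * warmup_steps ** -1.5)

# With multiplier=0.001 the entire curve — including the peak reached at
# step == warmup_steps — is 1000x smaller, which is why training can look
# like it is barely learning anything.
```

So if your config sets `optim.lr: 0.001` under a Noam scheduler, you are not training at 0.001; you are training at 0.001 times an already-small schedule value.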