How to set up the learning rate to start decaying at the beginning of a training process #4875
nghiahuynh-ai started this conversation in General
Replies: 1 comment
I have an issue: I want to train a model (Conformer-Transducer) in two stages. In the first stage, the learning rate only increases (warm-up only). In the second stage, I want the learning rate to decay. I used the load_from_checkpoint method to load the checkpoint from the first stage and set up everything needed, but I found that the learning rate is warmed up again. So, how can I set up the learning rate so that it starts decaying at the beginning of the second stage?
Thanks and regards.
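For context, here is a minimal sketch of the stage-2 setup described above; the model class, checkpoint path, and trainer settings are illustrative assumptions rather than details from the post. It shows why the warm-up repeats: load_from_checkpoint restores only the model weights and hyperparameters, so a freshly created optimizer and LR scheduler start again from step 0.

```python
# Sketch only: paths, model class, and trainer settings are placeholders.
import pytorch_lightning as pl
from nemo.collections.asr.models import EncDecRNNTBPEModel

# Stage 2: load the stage-1 weights. This restores parameters and hparams,
# but NOT the optimizer / LR-scheduler / global-step state of the previous run.
model = EncDecRNNTBPEModel.load_from_checkpoint("stage1_checkpoints/last.ckpt")

trainer = pl.Trainer(devices=1, accelerator="gpu", max_steps=200_000)

# Because the trainer starts at global step 0, the warm-up portion of the
# LR schedule (e.g. NoamAnnealing) runs again instead of continuing to decay.
# (Data and optimizer setup omitted for brevity.)
trainer.fit(model)
```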
Hi @nghiahuynh-ai, just to clear up: you want to continue the LR schedule when you restart training? It seems you're using load_from_checkpoint, which restores the weights but not the training state; you should resume from the checkpoint's full training state instead. This will ensure the step etc. are reloaded. Any chance you can use the example scripts provided? They make it even easier to enable checkpoint resume from the cmdline, just by passing the resume flags. With these flags, NeMo will automatically resume if a checkpoint is found in the directory, handling all the state management for you!
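As an illustration of the kind of flags the reply refers to, assuming NeMo's exp_manager resume options resume_if_exists and resume_ignore_no_checkpoint, and with a placeholder script name, config, and experiment directory, the command line might look like this:

```bash
# Hypothetical invocation: script, config, and exp_dir are placeholders;
# the two resume flags are NeMo exp_manager options.
python speech_to_text_rnnt_bpe.py \
    --config-path=conf/conformer --config-name=conformer_transducer_bpe \
    exp_manager.exp_dir=exp/conformer_transducer \
    exp_manager.resume_if_exists=true \
    exp_manager.resume_ignore_no_checkpoint=true
```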
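For a run driven from your own script rather than the example scripts, here is a minimal sketch of the same idea in PyTorch Lightning terms (model class, checkpoint path, and trainer settings are hypothetical): passing ckpt_path to Trainer.fit restores the optimizer, LR scheduler, and global step along with the weights, so the schedule continues to decay instead of warming up again.

```python
# Sketch only: model class, checkpoint path, and trainer settings are placeholders.
import pytorch_lightning as pl
from nemo.collections.asr.models import EncDecRNNTBPEModel

# The model object still has to be constructed first (here from the same
# checkpoint); ckpt_path below then restores the weights plus the trainer state.
model = EncDecRNNTBPEModel.load_from_checkpoint("stage1_checkpoints/last.ckpt")
trainer = pl.Trainer(devices=1, accelerator="gpu", max_steps=200_000)

# ckpt_path restores the optimizer, LR scheduler, and global step, so the
# warm-up is not repeated and the LR continues to decay in stage 2.
# (Data setup, e.g. model.setup_training_data(...), omitted for brevity.)
trainer.fit(model, ckpt_path="stage1_checkpoints/last.ckpt")
```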