Is it Conformer-CTC or Conformer-Transducer?
We have not fully tested Conformer with mixed-precision training. The loss sometimes explodes with mixed precision, especially with low weight decay. Does it also happen with fp32?
What weight decay and warmup do you use?
What is your learning rate when that happens?
Have you tried gradient clipping?
By "2-20 iterations" do you mean epochs?
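
For reference, here is a minimal plain-Python sketch of the two knobs asked about above: clipping gradients by their global L2 norm, and a warmup schedule. The function names (`clip_by_global_norm`, `noam_lr`) and the Noam-style schedule are assumptions chosen for illustration; they are not taken from your config or from any particular framework, so treat the values only as a way to sanity-check what your own setup does.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm measured before clipping.
    Hypothetical helper for illustration only.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm == 0.0 or total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


def noam_lr(step, d_model, warmup_steps, scale=1.0):
    """Noam-style schedule (assumed here): linear warmup, then 1/sqrt(step) decay."""
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
    print("norm before clipping:", norm_before)
    print("clipped gradients:", clipped)
    for s in (100, 1000, 4000):
        print("step", s, "lr", noam_lr(s, d_model=256, warmup_steps=1000))
```

Logging the pre-clipping norm at every step is often the quickest way to see whether the loss blows up because of a few very large gradients or because the learning rate is already too high when warmup ends.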

Answer selected by whrichd