-
Hi, I've tried to train a Conformer model from scratch and to fine-tune using the pretrained models from NGC, and I seem to be running into NaN losses. Especially for fine-tuning, the loss suddenly becomes NaN within the first 2-20 iterations. Using the same data, training a medium Conformer has worked for me, but not on the first try; I initially encountered NaN losses there as well. I tried looking at the intermediate outputs, and the input embeddings all look normal, but at some point the encoder output on one GPU starts to become NaN. So from my understanding it doesn't look like faulty input, because in the case of training from scratch it happens after a few epochs. It doesn't look like an exploding gradient either: the loss doesn't gradually diverge, it fails suddenly. I wonder if it has to do with the learning rate or the learning rate warmup, because when it fails at the very beginning the LR is still really small. BTW, I'm training on 4 A100s with a private dataset. Has anyone encountered this before? Can anyone help?
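For context, this is roughly how I checked the intermediate outputs. A minimal sketch in plain PyTorch, not NeMo-specific; `register_nan_hooks` is just an illustrative helper name:

```python
import torch

def register_nan_hooks(model: torch.nn.Module):
    # Attach a forward hook to every submodule so we can see which
    # module is the first to produce NaN/inf in its output.
    def make_hook(name):
        def hook(module, inputs, output):
            outs = output if isinstance(output, (tuple, list)) else (output,)
            for t in outs:
                if torch.is_tensor(t) and not torch.isfinite(t).all():
                    print(f"non-finite output in {name} ({type(module).__name__})")
        return hook

    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))
```

With the hooks attached, a single forward pass on a failing batch is enough to print the first submodule (e.g. a self-attention or conv block inside the encoder) whose output goes non-finite.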
-
Is it Conformer-CTC or Conformer-Transducer?
We have not fully tested Conformer with mixed precision training. Sometimes the loss explodes with mixed precision, especially with low weight decay. Does it also happen with fp32?
What weight decay and warmup are you using?
What is your lr when that happens?
Have you tried gradient clipping?
By "2-20 iterations" do you mean epochs?