I'm training the model on a single A10. Right from the start of training, it logs: "Found NaN, decreased lg_loss_scale to 19.0". Is that normal?