Description
I wonder why "total_grad_norm" is always shown as "N/A" during training? For example:
20231023_095643|core.utils.my_writer@171: eta: 15:06:27 iter: 50699/102400[49.5%] time: 1.0520 lr: 1.0000e-04 max_mem: 3611M total_grad_norm: N/A total_loss: 1.3978e+01 (1.5329e+01) loss_coor_x: 1.5648e-02 (3.0480e-02) loss_coor_y: 1.4901e-02 (3.1228e-02) loss_coor_z: 1.5298e-02 (2.9955e-02) loss_mask: 1.5607e-02 (3.0423e-02) loss_region: 1.3853e+01 (1.5046e+01) loss_PM_R: 2.2228e-02 (5.8838e-02) loss_centroid: 7.3131e-03 (1.6075e-02) loss_z: 2.5838e-02 (8.6372e-02)
20231023_095829|core.utils.my_writer@171: eta: 15:12:08 iter: 50799/102400[49.6%] time: 1.0606 lr: 1.0000e-04 max_mem: 3611M total_grad_norm: N/A total_loss: 1.4628e+01 (1.5328e+01) loss_coor_x: 1.5359e-02 (3.0450e-02) loss_coor_y: 1.5762e-02 (3.1197e-02) loss_coor_z: 1.5910e-02 (2.9926e-02) loss_mask: 1.6061e-02 (3.0395e-02) loss_region: 1.4515e+01 (1.5045e+01) loss_PM_R: 2.3028e-02 (5.8766e-02) loss_centroid: 7.7087e-03 (1.6058e-02) loss_z: 2.9443e-02 (8.6259e-02)
20231023_100017|core.utils.my_writer@171: eta: 15:17:22 iter: 50899/102400[49.7%] time: 1.0688 lr: 1.0000e-04 max_mem: 3611M total_grad_norm: N/A total_loss: 1.4508e+01 (1.5327e+01) loss_coor_x: 1.4479e-02 (3.0420e-02) loss_coor_y: 1.5535e-02 (3.1166e-02) loss_coor_z: 1.5549e-02 (2.9897e-02) loss_mask: 1.5610e-02 (3.0367e-02) loss_region: 1.4382e+01 (1.5044e+01) loss_PM_R: 2.1233e-02 (5.8696e-02) loss_centroid: 6.4163e-03 (1.6041e-02) loss_z: 2.5363e-02 (8.6141e-02)
20231023_100203|core.utils.my_writer@171: eta: 15:02:28 iter: 50999/102400[49.8%] time: 1.0534 lr: 1.0000e-04 max_mem: 3611M total_grad_norm: N/A total_loss: 1.5395e+01 (1.5326e+01) loss_coor_x: 1.5888e-02 (3.0391e-02) loss_coor_y: 1.5534e-02 (3.1136e-02) loss_coor_z: 1.5575e-02 (2.9869e-02) loss_mask: 1.6724e-02 (3.0340e-02) loss_region: 1.5275e+01 (1.5044e+01) loss_PM_R: 2.2431e-02 (5.8626e-02) loss_centroid: 7.5725e-03 (1.6025e-02) loss_z: 2.9323e-02 (8.6028e-02)
20231023_100349|core.utils.my_writer@171: eta: 15:04:02 iter: 51099/102400[49.9%] time: 1.0573 lr: 1.0000e-04 max_mem: 3611M total_grad_norm: N/A total_loss: 1.4805e+01 (1.5325e+01) loss_coor_x: 1.5165e-02 (3.0361e-02) loss_coor_y: 1.5487e-02 (3.1106e-02) loss_coor_z: 1.4915e-02 (2.9841e-02) loss_mask: 1.5903e-02 (3.0312e-02) loss_region: 1.4694e+01 (1.5043e+01) loss_PM_R: 2.3419e-02 (5.8555e-02) loss_centroid: 7.0675e-03 (1.6008e-02) loss_z: 2.3812e-02 (8.5911e-02)
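For context on what the missing metric would measure: in detectron2-style writers, a field typically prints as "N/A" when the key was never written to the event storage, and the total gradient norm is often only recorded when gradient clipping is enabled. Below is a minimal, torch-free sketch of the quantity itself, the global L2 norm over all parameter gradients. The function name and the list-of-lists gradient representation are illustrative assumptions, not code from this repository.

```python
import math

def total_grad_norm(grads):
    """Global L2 norm over all parameter gradients.

    grads: list of per-parameter gradients, each a flat list of floats;
    an entry may be None when a parameter received no gradient.
    """
    sq_sum = 0.0
    for g in grads:
        if g is None:
            continue  # parameters without gradients contribute nothing
        sq_sum += sum(v * v for v in g)
    return math.sqrt(sq_sum)
```

For example, `total_grad_norm([[3.0, 4.0], None])` returns `5.0`, and an all-`None` input returns `0.0` rather than NaN, so an "N/A" in the log usually points to the metric not being logged at all rather than a numerical problem.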