reduced the epsilon for scale calculation to 1e-20 (#1628)
Summary:
Pull Request resolved: #1628
Previously the scale was calculated as max_int8 / (eps + max_val) with eps = 1e-8. This was fine because forward tensor values need to be larger than eps to make a difference.
With fp8, the gradients are much smaller, and eps = 1e-8 prevents the tensor from being scaled up to cover the entire fp8 range, so eps is changed to 1e-20.
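The effect of eps can be illustrated with a minimal sketch. It is not the code changed in this commit: the helper `compute_scale`, the constant `FP8_E4M3_MAX = 448.0` (the largest finite value of the E4M3 fp8 format, assumed here as the target range), and the sample gradient magnitude `1e-10` are all assumptions chosen for illustration.

```python
# Hypothetical illustration of scale = max_repr / (eps + max_val).
# Names and values below are assumptions, not taken from the commit's code.

FP8_E4M3_MAX = 448.0  # assumed target range: largest finite E4M3 value


def compute_scale(max_val: float, max_repr: float, eps: float) -> float:
    """Return the factor that maps max_val onto max_repr, guarded by eps."""
    return max_repr / (eps + max_val)


# Assumed gradient tensor whose largest absolute value is tiny.
grad_max = 1e-10

old_scale = compute_scale(grad_max, FP8_E4M3_MAX, eps=1e-8)
new_scale = compute_scale(grad_max, FP8_E4M3_MAX, eps=1e-20)

# Largest value after scaling: this is how much of the fp8 range is used.
old_peak = grad_max * old_scale
new_peak = grad_max * new_scale

print(f"eps=1e-8 : scaled peak = {old_peak:.4g} of {FP8_E4M3_MAX}")
print(f"eps=1e-20: scaled peak = {new_peak:.4g} of {FP8_E4M3_MAX}")
```

With eps = 1e-8 the denominator is dominated by eps rather than by max_val, so the scaled peak stays at a small fraction of the representable range. With eps = 1e-20 the denominator is effectively max_val and the scaled peak reaches the top of the range.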
Reviewed By: brad-mengchi
Differential Revision: D43665395
fbshipit-source-id: 0214e22af7739ef5cf7561b15e02830c35320402