Replies: 1 comment
If the very first gradient evaluation in forward mode hits a near-zero denominator, you get a NaN right away. Could you add a small offset (ϵ) to the divisor, or clamp it? Extremely small values can cause this issue.
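To make the suggestion concrete, here is a minimal sketch. It assumes JAX as the autodiff library and a norm-style denominator, neither of which is confirmed by the original issue. The function names (`naive_unit`, `offset_unit`, `clamped_unit`, `forward_derivative`) and the value of `EPS` are hypothetical and only for illustration.

```python
import jax
import jax.numpy as jnp

EPS = 1e-12  # hypothetical offset; tune it for your dtype and problem scale


def naive_unit(x):
    # Denominator is exactly zero at x = 0, so both value and tangent are NaN.
    return x / jnp.sqrt(x * x)


def offset_unit(x):
    # Offset variant: add EPS under the square root so the divisor stays positive.
    return x / jnp.sqrt(x * x + EPS)


def clamped_unit(x):
    # Clamped variant: clamp *before* the sqrt, so sqrt never sees exactly zero
    # (its derivative is infinite there).
    return x / jnp.sqrt(jnp.maximum(x * x, EPS))


def forward_derivative(f, x):
    # Forward-mode derivative: push a unit tangent through f with jax.jvp.
    _, tangent_out = jax.jvp(f, (x,), (jnp.ones_like(x),))
    return tangent_out


x0 = jnp.array(0.0)
for f in (naive_unit, offset_unit, clamped_unit):
    print(f.__name__, forward_derivative(f, x0))
```

Both safe variants change the function slightly near zero, so the derivative there is large but finite rather than undefined. Whether that bias is acceptable depends on the model.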
Under what circumstances is it possible to get a NaN gradient in forward mode but not in reverse mode? The context for this issue is here.