-Now, as discussed in the introduction the AD system would on it's own choose either 1 or -1, depending on implementation.
+Now, as discussed in the introduction, the AD system would on it's own choose either 1 or -1, depending on implementation.

 We however have a potentially much nicer answer available to use: 0.

 This has a number of advantages.
 - It follows the rule that derivatives are zero at local minima (and maxima).
-- If you leave a gradient decent optimizer running it will eventually actually converge absolutely to the point -- where as with it being 1 or -1 it would never outright converge it would always flee.
+- If you leave a gradient descent optimizer running it will eventually actually converge absolutely to the point -- where as with it being 1 or -1 it would never outright converge it would always flee.

 Further:
 - It is a perfectly nice member of the [subderivative](https://en.wikipedia.org/wiki/Subderivative).
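
The convergence argument for `abs` can be sketched with a hand-rolled gradient descent loop. This is only an illustration under stated assumptions, not how any AD system or optimizer is implemented: `descend`, `dabs_zero`, and `dabs_one` are hypothetical names, and the starting point and step size are chosen so that an iterate lands exactly on 0.

```julia
# Two hypothetical conventions for the derivative of abs at 0
# (illustrative names, not part of any library API).
dabs_zero(x) = sign(x)            # sign(0.0) == 0.0, so this reports 0 at the kink
dabs_one(x)  = x < 0 ? -1.0 : 1.0 # reports 1 at the kink

# Plain fixed-step gradient descent; returns every iterate, starting with x0.
function descend(df, x0; step=0.25, iters=8)
    xs = [x0]
    for _ in 1:iters
        push!(xs, xs[end] - step * df(xs[end]))
    end
    return xs
end

xs_zero = descend(dabs_zero, 1.0)  # reaches 0.0 and stays there
xs_one  = descend(dabs_one, 1.0)   # reaches 0.0, then keeps stepping away and back
```

With these inputs, `xs_zero` settles on 0 because the reported derivative there is 0, while `xs_one` alternates between 0 and -0.25. Both 0 and 1 lie in the subderivative `[-1, 1]` of `abs` at 0; only 0 makes the minimum a fixed point of the update.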
@@ -61,9 +61,9 @@ Further:
 plot(x-> x < 0 ? x : 5x)
 ```

-Here was have 3 main options, all are good.
+Here we have 3 main options, all are good.

-We could say there derivative at 0 is:
+We could say the derivative at 0 is:
 - 1: which agrees with backwards finite differencing
 - 5: which agrees with forwards finite differencing
 - 3: which is the mean of `[1, 5]`, and agrees with central finite differencing
@@ -82,9 +82,9 @@ plot(ceil)
 Here it is most useful to say the derivative is zero everywhere.
 The limits are zero from both sides.

-The other option for `x->ceil(x)` would be relax the problem into `x->x`, and thus say it is 1 everywhere
-But that it too weird, if the use wanted a relaxation of the problem then they would provide one.
-We can not be imposing that relaxation on to `ceil` for everyone is not reasonable.
+The other option for `x->ceil(x)` would be to relax the problem into `x->x`, and thus say it is 1 everywhere.
+But that it too weird, if the user wanted a relaxation of the problem then they would provide one.
+We can not be imposing that relaxation on to `ceil`, as it is not reasonable for everyone.

 ### Not defined on one-side
 ```@example nondiff
@@ -122,17 +122,17 @@ But this is more or less the same as choosing some large value -- in this case a
 plot(x-> sign(x) * cbrt(x))
 ```

-In this example, the primal is defined and finite, so we would like a derivative to defined.
-We are back in the case of a local minimal like we were for `abs`.
+In this example, the primal is defined and finite, so we would like a derivative to be defined.
+We are back in the case of a local minimum like we were for `abs`.
 We can make most of the same arguments as we made there to justify saying the derivative is zero.

 ## Conclusion

 From the case studies a few general rules can be seen for how to choose a value that is _useful_.
 These rough rules are:
-- Say the derivative is 0 at local optima
-- If the derivative from one side is defined and the other isn't, say it is the derivative taken from defined side.
-- If the derivative from one side is finite and the other isn't, say it is the derivative taken from finite side.
-- When derivative from each side is not equal, strongly consider reporting the average
+- Say the derivative is 0 at local optima.
+- If the derivative from one side is defined and the other isn't, say it is the derivative taken from the defined side.
+- If the derivative from one side is finite and the other isn't, say it is the derivative taken from the finite side.
+- When derivative from each side is not equal, strongly consider reporting the average.

 Our goal as always, is to get a pragmatically useful result for everyone, which must by necessity also avoid a pathological result for anyone.
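
The averaging rule can be checked numerically against the piecewise-linear example from earlier, `x -> x < 0 ? x : 5x`. The sketch below uses plain finite differences; the helper names and the step size `h` are assumptions made for illustration, not part of any library.

```julia
# The kinked function from the case study: slope 1 on the left of 0, slope 5 on the right.
f(x) = x < 0 ? x : 5x

# One-sided and central finite differences with step h (hypothetical helpers).
# h is a power of two so the arithmetic below is exact in floating point.
backward_diff(f, x; h=2.0^-10) = (f(x) - f(x - h)) / h
forward_diff(f, x; h=2.0^-10)  = (f(x + h) - f(x)) / h
central_diff(f, x; h=2.0^-10)  = (f(x + h) - f(x - h)) / (2h)

left  = backward_diff(f, 0.0)  # slope seen from the left
right = forward_diff(f, 0.0)   # slope seen from the right
mid   = central_diff(f, 0.0)   # equals the mean of the two one-sided slopes
```

Here `mid` coincides with `(left + right) / 2`, which is why reporting the average agrees with central finite differencing for this function.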