* Clarify order of @thunk and ProjectTo in docs
* Apply suggestions from code review
Co-authored-by: Miha Zgubic <mzgubic@users.noreply.github.com>
* incorporate @oxinabox's draft
* move paragraph to right (?) place
* add comment on test_[fr]rule of @not_implemented differentials
* typo fix
* improve example
Co-authored-by: Miha Zgubic <mzgubic@users.noreply.github.com>
docs/src/writing_good_rules.md
28 additions & 14 deletions
@@ -28,6 +28,9 @@ julia> rrule(foo, 2)
 ==#
 ```
 
+While this is more verbose, it ensures that if an error is thrown during the `pullback` the [`gensym`](https://docs.julialang.org/en/v1/base/base/#Base.gensym) name of the local function will include the name you gave it.
+This makes it a lot simpler to debug from the stacktrace.
+
 ## Use `ZeroTangent()` as the return value
 
 The `ZeroTangent()` object exists as an alternative to directly returning `0` or `zeros(n)`.
@@ -102,6 +105,18 @@ function rrule(::typeof(*), A::AbstractMatrix, B::AbstractMatrix)
 end
 ```
 
+!!! note "It is often good to `@thunk` your projections"
+The above example is potentially a good place for using a [`@thunk`](@ref).
+This is not required, but can in some cases be more computationally efficient, see [Use `Thunk`s appropriately](@ref).
+When combining thunks and projections, `@thunk()` must be the outermost call.
+
+A more optimized implementation of the matrix-matrix multiplication example would have
 Do not use `@not_implemented` if the differential does not exist mathematically (use `NoTangent()` instead).
 
-While this is more verbose, it ensures that if an error is thrown during the `pullback` the [`gensym`](https://docs.julialang.org/en/v1/base/base/#Base.gensym) name of the local function will include the name you gave it.
-This makes it a lot simpler to debug from the stacktrace.
+Note: [ChainRulesTestUtils.jl](https://github.com/JuliaDiff/ChainRulesTestUtils.jl) marks `@not_implemented` differentials as "test broken".
 
 ## Use rule definition tools
 
@@ -387,7 +401,7 @@ Take a look at the documentation or the existing [ChainRules.jl](https://github.
 
 !!! warning
 Don't use analytical derivations for derivatives in the tests.
-Those are what you use to define the rules, and so can not be confidently used in the test.
+Those are what you use to define the rules, and so cannot be confidently used in the test.
 If you misread/misunderstood them, then your tests/implementation will have the same mistake.
 Use finite differencing methods instead, as they are based on the primal computation.
 
@@ -401,10 +415,10 @@ In principle, a perfect AD system only needs rules for basic operations and can
 In practice, performance needs to be considered as well.
 
 Some functions use `ccall` internally, for example [`^`](https://github.com/JuliaLang/julia/blob/v1.5.3/base/math.jl#L886).
-These functions can not be differentiated through by AD systems, and need custom rules.
+These functions cannot be differentiated through by AD systems, and need custom rules.
 
 Other functions can in principle be differentiated through by an AD system, but there exists a mathematical insight that can dramatically improve the computation of the derivative.
-An example is numerical integration, where writing a rule removes the need to perform AD through numerical integration.
+An example is numerical integration, where writing a rule implementing the [fundamental theorem of calculus](https://en.wikipedia.org/wiki/Fundamental_theorem_of_calculus) removes the need to perform AD through numerical integration.
 
 Furthermore, AD systems make different trade-offs in performance due to their design.
 This means that a certain rule will help one AD system, but not improve (and also not harm) another.
@@ -416,7 +430,7 @@ This may be resolved in the future by [allowing AD systems to opt-in or opt-out
 
 ### Patterns that need rules in [Zygote.jl](https://github.com/FluxML/Zygote.jl)
 
-There are a few classes of functions that Zygote can not differentiate through.
+There are a few classes of functions that Zygote cannot differentiate through.
 Custom rules will need to be written for these to make AD work.
 
 Other patterns can be AD'ed through, but the backward pass performance can be greatly improved by writing a rule.
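
For readers of this diff, the ordering rule from the added `@thunk` note ("`@thunk()` must be the outermost call") can be illustrated with a minimal sketch. This is not the code from the PR: `mymul` and `mymul_pullback` are hypothetical names introduced here so that no rule is redefined for `Base.:*`, and the sketch assumes ChainRulesCore.jl is installed.

```julia
using ChainRulesCore

# Hypothetical stand-in for `*`, so we do not overwrite any existing rule.
mymul(A::AbstractMatrix, B::AbstractMatrix) = A * B

function ChainRulesCore.rrule(::typeof(mymul), A::AbstractMatrix, B::AbstractMatrix)
    # Build the projectors once, in the forward pass.
    project_A = ProjectTo(A)
    project_B = ProjectTo(B)
    function mymul_pullback(ȳ)
        # `@thunk` wraps the projection, so both the matrix product and
        # the projection are deferred until the tangent is actually used.
        dA = @thunk(project_A(ȳ * B'))
        dB = @thunk(project_B(A' * ȳ))
        return NoTangent(), dA, dB
    end
    return A * B, mymul_pullback
end

A, B = rand(2, 3), rand(3, 4)
y, pullback = rrule(mymul, A, B)
_, dA, dB = pullback(ones(2, 4))

# Nothing has been computed yet; `unthunk` forces the evaluation.
dA_value = unthunk(dA)
dB_value = unthunk(dB)
```

Writing `project_A(@thunk(ȳ * B'))` instead would hand the projector a thunk rather than an array, which is why the note asks for `@thunk` on the outside.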