File: docs/src/FAQ.md
+23 -20 (23 additions & 20 deletions)
@@ -4,11 +4,11 @@

 ### `Δx`, `∂x`, `dx`
 ChainRules uses these perhaps atypically.
-As a notation that is the same across propagators, regardless of direction (incontrast see `ẋ` and `x̄` below).
+As a notation that is the same across propagators, regardless of direction (in contrast see `ẋ` and `x̄` below).

-- `Δx` is the input to a propagator, (i.e a _seed_ for a _pullback_; or a _perturbation_ for a _pushforward_)
-- `∂x` is the output of a propagator
-- `dx` could be either `input` or `output`
+- `Δx` is the input to a propagator, (i.e a _seed_ for a _pullback_; or a _perturbation_ for a _pushforward_).
+- `∂x` is the output of a propagator.
+- `dx` could be either `input` or `output`.


 ### dots and bars: ``\dot{y} = \dfrac{∂y}{∂x} = \overline{x}``
@@ -28,16 +28,19 @@ Why not just return the pushforward/pullback, and let the user call `f(x)` to ge
 There are three reasons the rules also calculate the `f(x)`.
 1. For some rules an alternative way of calculating `f(x)` can give the same answer while also generating intermediate values that can be used in the calculations required to propagate the derivative.
 2. For many `rrule`s the output value is used in the definition of the pullback. For example `tan`, `sigmoid` etc.
-3. For some `frule`s there exists a single, non-separable operation that will compute both derivative and primal result. For example many of the methods for [differential equation sensitivity analysis](https://docs.juliadiffeq.org/stable/analysis/sensitivity/#sensitivity-1).
+3. For some `frule`s there exists a single, non-separable operation that will compute both derivative and primal result. For example, this is the case for many of the methods for [differential equation sensitivity analysis](https://docs.juliadiffeq.org/stable/analysis/sensitivity/#sensitivity-1).

 For more information and examples see the [design notes on changing the primal](@ref change_primal).
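
The second reason above can be illustrated with a minimal sketch. It assumes a hypothetical `mysigmoid` function (defined here only for illustration, not part of ChainRules) and shows a pullback that reuses the primal output `y` rather than recomputing it:

```julia
using ChainRulesCore

# Hypothetical function, defined only for this example.
mysigmoid(x::Real) = 1 / (1 + exp(-x))

function ChainRulesCore.rrule(::typeof(mysigmoid), x::Real)
    y = mysigmoid(x)  # primal result, computed once
    # The pullback closes over `y`: d/dx sigmoid(x) = y * (1 - y).
    # The leading NoTangent() is the tangent for the function itself.
    mysigmoid_pullback(ȳ) = (NoTangent(), ȳ * y * (1 - y))
    return y, mysigmoid_pullback
end

y, pullback = rrule(mysigmoid, 0.0)
_, x̄ = pullback(1.0)
# y == 0.5 and x̄ == 0.25
```

If the rule returned only the pullback, a caller who also needs `mysigmoid(x)` would have to evaluate the exponential a second time.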

 ## Where are the derivatives for keyword arguments?
-_pullbacks_ do not return a sensitivity for keyword arguments;
-similarly _pushfowards_ do not accept a perturbation for keyword arguments.
+
+_Pullbacks_ do not return a sensitivity for keyword arguments;
+similarly, _pushforwards_ do not accept a perturbation for keyword arguments.
 This is because in practice functions are very rarely differentiable with respect to keyword arguments.
-As a rule keyword arguments tend to control side-effects, like logging verbosity,
-or to be functionality changing to perform a different operation, e.g. `dims=3`, and thus not differentiable.
+
+As a rule, keyword arguments tend to control side-effects, like logging verbosity,
+or to be functionality-changing to perform a different operation, e.g. `dims=3`, and thus not differentiable.
+
 To the best of our knowledge no Julia AD system, with support for the definition of custom primitives, supports differentiating with respect to keyword arguments.
 At some point in the future ChainRules may support these. Maybe.
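
As a rough sketch of what this looks like in a rule, assume a hypothetical function `sumsq` with a `dims` keyword (both names are made up for this example). The `rrule` accepts the keyword and forwards it to the primal computation, but the pullback returns tangents only for the function itself and the positional argument:

```julia
using ChainRulesCore

# Hypothetical function: `dims` selects which reduction is performed.
sumsq(x::AbstractArray; dims=:) = sum(abs2, x; dims=dims)

function ChainRulesCore.rrule(::typeof(sumsq), x::AbstractArray{<:Real}; dims=:)
    y = sumsq(x; dims=dims)
    # Two entries only: one for `sumsq` itself and one for `x`.
    # There is no entry for the keyword argument `dims`.
    sumsq_pullback(ȳ) = (NoTangent(), 2 .* x .* ȳ)
    return y, sumsq_pullback
end

x = [1.0 2.0; 3.0 4.0]
y, pullback = rrule(sumsq, x; dims=1)
tangents = pullback(ones(1, 2))
```

Here `tangents` has length 2 regardless of how many keyword arguments the primal function accepts.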

@@ -47,13 +50,13 @@ At some point in the future ChainRules may support these. Maybe.
 Odds are if you write a rule that returns the wrong one everything will just work fine.
 We provide both to allow for clearer writing of rules, and easier debugging.

-`ZeroTangent()` represents the fact that if one perturbs (adds a small change to) the matching primal there will be no change in the behaviour of the primal function.
-For example in `fst(x,y) = x`, then the derivative of `fst` with respect to `y` is `ZeroTangent()`.
-`fst(10, 5) == 10` and if we add `0.1` to `5` we still get `fst(10, 5.1)=10`.
+`ZeroTangent()` represents the fact that if one perturbs (adds a small change to) the matching primal, there will be no change in the behaviour of the primal function.
+For example, in `fst(x,y) = x`, the derivative of `fst` with respect to `y` is `ZeroTangent()`.
+`fst(10, 5) == 10` and if we add `0.1` to `5` we still get `fst(10, 5.1) == 10`.

 `NoTangent()` represents the fact that if one perturbs the matching primal, the primal function will now error.
-For example in `access(xs, n) = xs[n]` then the derivative of `access` with respect to `n` is `NoTangent()`.
-`access([10, 20, 30], 2) = 20`, but if we add `0.1` to `2` we get `access([10, 20, 30], 2.1)` which errors as indexing can't be applied at fractional indexes.
+For example, in `access(xs, n) = xs[n]`, the derivative of `access` with respect to `n` is `NoTangent()`.
+`access([10, 20, 30], 2) == 20`, but if we add `0.1` to `2` we get `access([10, 20, 30], 2.1)` which errors as indexing can't be applied at fractional indexes.
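
The two examples from the text, `fst` and `access`, could be given rules along the following lines. This is a sketch only: the type restrictions on `access` and the dense `zeros` tangent are simplifying assumptions made for the example, not how ChainRules itself handles indexing.

```julia
using ChainRulesCore

fst(x, y) = x
access(xs, n) = xs[n]

function ChainRulesCore.rrule(::typeof(fst), x, y)
    # Perturbing `y` never changes the output, so its tangent is ZeroTangent().
    # The first entry is the tangent for the function `fst` itself.
    fst_pullback(Δ) = (NoTangent(), Δ, ZeroTangent())
    return fst(x, y), fst_pullback
end

function ChainRulesCore.rrule(::typeof(access), xs::AbstractVector{<:Real}, n::Integer)
    function access_pullback(Δ)
        x̄s = zeros(length(xs))
        x̄s[n] = Δ
        # Perturbing the index `n` makes the primal error, so it gets NoTangent().
        return (NoTangent(), x̄s, NoTangent())
    end
    return access(xs, n), access_pullback
end

_, fst_pb = rrule(fst, 10.0, 5.0)
fst_tangents = fst_pb(1.0)

_, access_pb = rrule(access, [10.0, 20.0, 30.0], 2)
access_tangents = access_pb(1.0)
```

In `fst_tangents` the last entry is a `ZeroTangent`, while in `access_tangents` the last entry is a `NoTangent`.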


 ## When to use ChainRules vs ChainRulesCore?
@@ -62,24 +65,24 @@ For example in `access(xs, n) = xs[n]` then the derivative of `access` with resp
 It has almost no dependencies of its own.
 If you only want to define rules, not use them, then you probably only want to load ChainRulesCore.jl.

-[ChainRules.jl](https://github.com/JuliaDiff/ChainRules.jl) provides the full functionality for AD systems, in particular it has all the rules for Base Julia and the standard libraries.
+[ChainRules.jl](https://github.com/JuliaDiff/ChainRules.jl) provides the full functionality for AD systems. In particular, it has all the rules for Base Julia and the standard libraries.
 It is thus a much heavier package to load.
 AD systems making use of `frule`s and `rrule`s should load ChainRules.jl.

 ## Where should I put my rules?

 We recommend adding custom rules to your own packages with [ChainRulesCore.jl](https://github.com/JuliaDiff/ChainRulesCore.jl).
-It is good to have them in the same pacakge that defines the original function.
+It is good to have them in the same package that defines the original function.
 This avoids type-piracy, and makes it easy to keep in-sync.
-ChainRulesCore is a very lightweight dependency.
+ChainRulesCore is a very light-weight dependency.

 ## How do I test my rules?

 You can use [ChainRulesTestUtils.jl](https://github.com/JuliaDiff/ChainRulesTestUtils.jl) to test your custom rules.
 ChainRulesTestUtils.jl has some dependencies, so it is a separate package from ChainRulesCore.jl.
 This means your package can depend on the light-weight ChainRulesCore.jl, and make ChainRulesTestUtils.jl a test-only dependency.

-Remember to read the section on [On writing good `rrule` / `frule` methods](@ref).
+Remember to read the section [On writing good `rrule` / `frule` methods](@ref).
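
A test file for a package's rules might look roughly like the sketch below. The function `double` and its rules are hypothetical and are defined inline only to keep the example self-contained; in a real package they would live in the package source, with only the `@testset` in the test suite.

```julia
using ChainRulesCore
using ChainRulesTestUtils
using Test

# Hypothetical function and rules, defined here only for illustration.
double(x::Real) = 2 * x

function ChainRulesCore.frule((_, ẋ), ::typeof(double), x::Real)
    return double(x), 2 * ẋ
end

function ChainRulesCore.rrule(::typeof(double), x::Real)
    double_pullback(ȳ) = (NoTangent(), 2 * ȳ)
    return double(x), double_pullback
end

@testset "double" begin
    # Both helpers compare the rule against finite differencing.
    test_frule(double, 3.0)
    test_rrule(double, 3.0)
end
```

With this layout, ChainRulesTestUtils.jl only needs to be listed among the test dependencies of the package.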

 ## Where can I learn more about AD ?
 There are not so many truly excellent learning resources for autodiff out there in the world, which is a bit sad.
@@ -103,9 +106,9 @@ It also covers forward-mode though (by its own admission) not as well, it never
 ## Is removing a thunk a breaking change?
 Removing thunks is not considered a breaking change.
 This is because (in principle) removing them changes the implementation of the values
-returned by an rrule, not the value that they represent.
+returned by an `rrule`, not the value that they represent.
 This is morally the same as similar issues [discussed in ColPrac](https://github.com/SciML/ColPrac#changes-that-are-not-considered-breaking), such as details of floating point arithmetic changing.

-On a practical level, it's important that this is the case because thunks a bit of a hack,
+On a practical level, it's important that this is the case because thunks are a bit of a hack,
 and over time it is hoped that the need for them will reduce, as they increase
 code-complexity and place additional stress on the compiler.
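
The distinction between implementation and represented value can be sketched as follows. The two helper functions are hypothetical stand-ins for the same pullback before and after a thunk is removed:

```julia
using ChainRulesCore

# Version A: the tangent is wrapped in a thunk and computed only on demand.
pullback_with_thunk(x, ȳ) = (NoTangent(), @thunk(2 * x * ȳ))

# Version B: the thunk has been removed and the tangent is computed eagerly.
pullback_without_thunk(x, ȳ) = (NoTangent(), 2 * x * ȳ)

lazy = pullback_with_thunk(3.0, 1.0)[2]
eager = pullback_without_thunk(3.0, 1.0)[2]

# `unthunk` forces a thunk and leaves ordinary values unchanged,
# so both versions represent the same tangent.
same_value = unthunk(lazy) == unthunk(eager)
```

Code that consumes tangents through `unthunk` behaves the same with either version, which is why the removal is not treated as breaking.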