
Commit 4b60ce4: "tweak text" (1 parent: 4764879)

docs/src/tutorials/gradient_zoo.md

Lines changed: 41 additions & 48 deletions
@@ -5,6 +5,8 @@ also known as reverse-mode automatic differentiation.
Given a model, some data, and a loss function, this answers the question
"what direction, in the space of the model's parameters, reduces the loss fastest?"

### `gradient(f, x)` interface

Julia's ecosystem has many versions of `gradient(f, x)`, which evaluates `y = f(x)` then returns `∂y_∂x`. The details of how they do this vary, but the interface is similar. An incomplete list is (alphabetically):

```julia
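# The full alphabetical list is elided by this hunk. As an illustrative sketch only
# (these two calls are assumptions, not the tutorial's own listing, and assume
# `using ForwardDiff, Zygote`), each package offers a call of roughly this shape:

julia> x = [1 4 16.]
1×3 Matrix{Float64}:
 1.0  4.0  16.0

julia> ForwardDiff.gradient(x -> sum(sqrt, x), x)   # forward mode
1×3 Matrix{Float64}:
 0.5  0.25  0.125

julia> Zygote.gradient(x -> sum(sqrt, x), x)        # reverse mode, source-to-source
([0.5 0.25 0.125],)
```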
@@ -37,9 +39,24 @@ julia> Zygote.withgradient(x -> sum(sqrt, x), [1 4 16.])

```julia
julia> Zygote.withgradient(x -> sum(sqrt, x), [1 4 16.])
(val = 7.0, grad = ([0.5 0.25 0.125],))
```

These all show the same `∂y_∂x` with respect to `x::Vector`. Sometimes, the result is within a tuple or a NamedTuple, containing `y` as well as the gradient.

Note that in all cases, only code executed within the call to `gradient` is differentiated. Calculating the objective function before calling `gradient` will not work, as all information about the steps from `x` to `y` has been lost. For example:

```julia
julia> y = sum(sqrt, x)  # calculate the forward pass alone
7.0

julia> y isa Float64  # has forgotten about sqrt and sum
true

julia> Zygote.gradient(x -> y, x)  # this cannot work, and gives zero
(nothing,)
```

### `gradient(f, model)` for Flux models

However, the parameters of a Flux model are encapsulated inside the various layers. The model is a set of nested structures, and the gradients `∂loss_∂model` which Flux uses are similarly nested objects.
For example, let's set up a simple model & loss:

```julia
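# The actual model & loss definitions are elided by this hunk. A minimal sketch,
# assuming `using Flux, Zygote`: the layers, data, and loss below are assumptions,
# chosen only to match the 2×3 weight matrix and the parameter-free second layer
# used later (the numeric values shown below come from the tutorial's own,
# elided, data and loss):

julia> model = Chain(Dense(Float32[1 3 5; 2 4 6], false), softmax);

julia> loss(m) = sum(abs2, m([1, 0, 0f0]));  # any scalar function of the model

julia> grads_z = Zygote.gradient(loss, model);  # "explicit" gradient, a 1-tuple

# The tutorial computes the same gradient with several other packages; below,
# `grad_e` is such a gradient, obtained without the tuple wrapper.
```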
@@ -83,66 +100,45 @@ Chain(
While the type returned for `∂loss_∂model` varies between packages, each gradient has the same nested structure, matching that of the model. This is all that Flux needs.

```julia
julia> grads_z[1].layers[1].weight  # the gradient for the first layer's weight matrix
2×3 Matrix{Float64}:
 -0.181715  0.0  0.0
  0.181715  0.0  0.0

julia> grad_e.layers[1].weight  # the same entry, accessed without the tuple wrapper
2×3 Matrix{Float64}:
 -0.181715  0.0  0.0
  0.181715  0.0  0.0
```

Here's Flux updating the model using each gradient:

```julia
julia> opt = Flux.setup(Descent(1/3), model)
(layers = ((weight = Leaf(Descent(0.333333), nothing),), ()),)

julia> model_z = deepcopy(model);

julia> Flux.update!(opt, model_z, grads_z[1]);

julia> model_z.layers[1].weight  # updated weight matrix
2×3 Matrix{Float64}:
 1.06057  3.0  5.0
 1.93943  4.0  6.0

julia> model_e = deepcopy(model);

julia> Flux.update!(opt, model_e, grad_e)[2][1].weight  # same update
2×3 Matrix{Float64}:
 1.06057  3.0  5.0
 1.93943  4.0  6.0
```

In this case they are all identical, but there are some caveats, explored below.

<hr/>

## Automatic Differentiation Packages

Both Zygote and Tracker were written for Flux, and at present Flux loads Zygote, exports `Zygote.gradient`, and calls this within `Flux.train!`. Apart from that, there is very little coupling between Flux and the automatic differentiation package.

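For instance, assuming a Flux version that simply re-exports Zygote's function (an assumption; the exact wiring differs between releases), the two spellings below give the same answer. Here `sum(abs2, x)` is just an arbitrary test function, not from the tutorial:

```julia
using Flux, Zygote

# Both names call the same reverse-mode machinery (assumption: Flux re-exports it).
g1 = Flux.gradient(x -> sum(abs2, x), [1.0, 2.0, 3.0])
g2 = Zygote.gradient(x -> sum(abs2, x), [1.0, 2.0, 3.0])
g1 == g2  # true; both give ([2.0, 4.0, 6.0],)
```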
@@ -163,24 +159,21 @@ Source-to-source, within Julia.
* Returns nested NamedTuples and Tuples, and uses `nothing` to mean zero; see the small sketch below.

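For example, a minimal sketch (not from the tutorial): differentiating a function of a NamedTuple returns a NamedTuple of the same shape, with `nothing` for every field or argument the result does not depend on:

```julia
julia> Zygote.gradient((x, y) -> sum(abs2, x.a), (a = [1.0, 2.0], b = "ignored"), [4.0, 5.0])
((a = [2.0, 4.0], b = nothing), nothing)
```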

!!! compat "Deprecated: Zygote's implicit mode"
    Flux's default used to work like this, instead of using deeply nested trees for gradients as above:
    ```julia
    julia> ps = Flux.params(model)  # dictionary-like object, with global `objectid` refs
    Params([Float32[1.0 3.0 5.0; 2.0 4.0 6.0]])

    julia> val, grad = Zygote.withgradient(() -> loss(model), ps)
    (val = 0.6067761f0, grad = Grads(...))

    julia> grad[model.layers[1].weight]  # another dictionary, indexed by parameter arrays
    2×3 Matrix{Float32}:
     0.0  0.0  -0.181715
     0.0  0.0   0.181715
    ```
    The code inside Zygote is much the same -- do not expect large changes in speed, nor any changes in what works and what does not.

### [Tracker.jl](https://github.com/FluxML/Tracker.jl)
