@@ -97,8 +97,8 @@ The simplest kind of optimisation using the gradient is termed *gradient descent
(or sometimes *stochastic gradient descent* when it is applied to individual examples
in a loop, not to the entire dataset at once).

- This needs a *learning rate* which is a small number describing how fast to walk downhill,
- usually written as the Greek letter "eta", `η`.
+ Gradient descent needs a *learning rate* which is a small number describing how fast to walk downhill,
+ usually written as the Greek letter "eta", `η`. This is what it does:

```julia
η = 0.01   # learning rate
@@ -110,16 +110,14 @@ fmap(model, grads[1]) do p, g
end
```

- This is wrapped up as a function [`update!`](@ref Flux.Optimise.update!), which can be used as follows:
-
- ```julia
- Flux.update!(Descent(0.01), model, grads[1])
- ```
+ This update of all parameters is wrapped up as a function [`update!`](@ref Flux.Optimise.update!)`(opt, model, grads[1])`.
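+
+ For example, a minimal sketch of one step, assuming `model`, a `loss` function, and one
+ data point `x, y` are already defined, and using `Flux.setup` (described just below) to
+ create the `opt` state for the simple `Descent` rule:
+
+ ```julia
+ opt = Flux.setup(Descent(0.01), model)            # state for plain gradient descent
+ grads = Flux.gradient(m -> loss(m(x), y), model)
+ Flux.update!(opt, model, grads[1])                # one step downhill, in place
+ ```
+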
There are many other optimisation rules, which adjust the step size and direction.
- Most require some memory of the gradients from earlier steps. The function [`setup`](@ref Flux.Train.setup)
- creates the necessary storage for this, for a particular model. This should be done
- once, before training, and looks like this:
+ Most require some memory of the gradients from earlier steps, rather than always
+ walking straight downhill. The function [`setup`](@ref Flux.Train.setup) creates the
+ necessary storage for this, for a particular model.
+ It should be called once, before training, and returns a tree-like object which is the
+ first argument of `update!`. Like this:

```julia
# Initialise momentum
@@ -128,7 +126,7 @@ opt = Flux.setup(Adam(0.001), model)
for data in train_set
...

- #
+ # Update both model parameters and optimiser state:
Flux.update!(opt, model, grads[1])
end
```
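+
+ Swapping in a different rule only changes the `setup` call; the loop above stays the same.
+ For instance, a sketch using classical momentum (assuming the same `model` as above):
+
+ ```julia
+ opt = Flux.setup(Momentum(0.01, 0.9), model)   # learning rate 0.01, momentum 0.9
+ ```
+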
@@ -138,7 +136,7 @@ These are listed on the [optimisers](@ref man-optimisers) page.
!!! note "Implicit-style optimiser state"
- This `setep` makes another tree-like structure. Old versions of Flux did not do this,
+ This `setup` makes another tree-like structure. Old versions of Flux did not do this,
and instead stored a dictionary-like structure within the optimiser `Adam(0.001)`.
This was initialised on first use of the version of `update!` for "implicit" parameters.
@@ -266,12 +264,14 @@ for epoch in 1:100
end
```

-
## Implicit vs Explicit
Flux used to handle gradients, training, and optimisation rules quite differently.
The new style described above is called "explicit" by Zygote, and the old style "implicit".
Flux 0.13 is the transitional version which supports both.
- For full details on the implicit style, see [Flux 0.13.6 documentation](https://fluxml.ai/Flux.jl/v0.13.6/training/training/).
+ The blue boxes above describe the changes.
+ For more details on training in the implicit style, see [Flux 0.13.6 documentation](https://fluxml.ai/Flux.jl/v0.13.6/training/training/).
+
+ For details about the two gradient modes, see [Zygote's documentation](https://fluxml.ai/Zygote.jl/dev/#Explicit-and-Implicit-Parameters-1).
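+
+ As a rough sketch of the difference, assuming a working `model`, `loss`, and data `x, y`:
+ the explicit style takes gradients with respect to the model object itself, while the
+ implicit style takes them with respect to a `Params` collection.
+
+ ```julia
+ # Explicit style: differentiate with respect to the model
+ grads = Flux.gradient(m -> loss(m(x), y), model)
+
+ # Implicit style (Flux ≤ 0.13): differentiate with respect to Flux.params(model)
+ ps = Flux.params(model)
+ gs = Flux.gradient(() -> loss(model(x), y), ps)
+ ```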