
Commit 3d7eb3f

Commit message: tweaks
1 parent 6728d55 commit 3d7eb3f

File tree: 3 files changed, +21 -19 lines changed


docs/src/training/callbacks.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Callback Helpers
+# [Callback Helpers](@id man-callback-helpers)
 
 ```@docs
 Flux.throttle
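
For context on the helper documented above: `Flux.throttle(f, timeout)` returns a function which calls `f` at most once in any window of `timeout` seconds. A minimal sketch of typical use follows; the printed message and the 10-second window are arbitrary choices for illustration, not part of this commit:

```julia
using Flux

# Wrap a logging function so it fires at most once every 10 seconds,
# however often the surrounding loop calls it.
log_cb = Flux.throttle(() -> println("still training..."), 10)

for step in 1:1_000
    # ... gradient and update steps would go here ...
    log_cb()   # prints at most once per 10-second window
end
```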

docs/src/training/train_api.md

Lines changed: 6 additions & 4 deletions
@@ -21,9 +21,9 @@ Optimisers.update!
 
 Flux used to handle gradients, training, and optimisation rules quite differently.
 The new style described above is called "explicit" by Zygote, and the old style "implicit".
-Flux 0.13 is the transitional version which supports both.
+Flux 0.13 is the transitional version which supports both; Flux 0.14 will remove the old.
 
-For full details on how to use the implicit style, see [Flux 0.13.6 manual](https://fluxml.ai/Flux.jl/v0.13.6/training/training/).
+For full details on the interface for implicit-style optimisers, see the [Flux 0.13.6 manual](https://fluxml.ai/Flux.jl/v0.13.6/training/training/).
 
 ```@docs
 Flux.params
@@ -51,9 +51,9 @@ julia> @epochs 2 Flux.train!(...)
 Flux.@epochs
 ```
 
-## Callbacks
+### Callbacks
 
-`train!` takes an additional argument, `cb`, that's used for callbacks so that you can observe the training process. For example:
+Implicit `train!` takes an additional argument, `cb`, that's used for callbacks so that you can observe the training process. For example:
 
 ```julia
 train!(objective, ps, data, opt, cb = () -> println("training"))
@@ -78,3 +78,5 @@ cb = function ()
 end
 ```
 
+See the page about [callback helpers](@ref man-callback-helpers) for more.
+
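
As a point of reference for the implicit-style `cb` argument described in this file, a rough self-contained sketch follows; the toy model, data, and loss are invented purely for illustration and are not part of the diff:

```julia
using Flux

# Toy model and data, invented for illustration only.
model = Dense(2 => 1)
data  = [(rand(Float32, 2, 8), rand(Float32, 1, 8)) for _ in 1:100]

loss(x, y) = Flux.mse(model(x), y)   # closes over the model (implicit style)
ps  = Flux.params(model)             # implicit parameter collection
opt = Descent(0.01)

# The callback is throttled so it prints at most once every 10 seconds.
Flux.train!(loss, ps, data, opt; cb = Flux.throttle(() -> println("training..."), 10))
```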

docs/src/training/training.md

Lines changed: 14 additions & 14 deletions
@@ -97,8 +97,8 @@ The simplest kind of optimisation using the gradient is termed *gradient descent
 (or sometimes *stochastic gradient descent* when it is applied to individual examples
 in a loop, not to the entire dataset at once).
 
-This needs a *learning rate* which is a small number describing how fast to walk downhill,
-usually written as the Greek letter "eta", `η`.
+Gradient descent needs a *learning rate* which is a small number describing how fast to walk downhill,
+usually written as the Greek letter "eta", `η`. This is what it does:
 
 ```julia
 η = 0.01 # learning rate
@@ -110,16 +110,14 @@ fmap(model, grads[1]) do p, g
 end
 ```
 
-This is wrapped up as a function [`update!`](@ref Flux.Optimise.update!), which can be used as follows:
-
-```julia
-Flux.update!(Descent(0.01), model, grads[1])
-```
+This update of all parameters is wrapped up as a function [`update!`](@ref Flux.Optimise.update!)`(opt, model, grads[1])`.
 
 There are many other optimisation rules, which adjust the step size and direction.
-Most require some memory of the gradients from earlier steps. The function [`setup`](@ref Flux.Train.setup)
-creates the necessary storage for this, for a particular model. This should be done
-once, before training, and looks like this:
+Most require some memory of the gradients from earlier steps, rather than always
+walking straight downhill. The function [`setup`](@ref Flux.Train.setup) creates the
+necessary storage for this, for a particular model.
+It should be called once, before training, and returns a tree-like object which is the
+first argument of `update!`. Like this:
 
 ```julia
 # Initialise momentum
@@ -128,7 +126,7 @@ opt = Flux.setup(Adam(0.001), model)
 for data in train_set
   ...
 
-  #
+  # Update both model parameters and optimiser state:
   Flux.update!(opt, model, grads[1])
 end
 ```
@@ -138,7 +136,7 @@ These are listed on the [optimisers](@ref man-optimisers) page.
 
 
 !!! note "Implicit-style optimiser state"
-    This `setep` makes another tree-like structure. Old versions of Flux did not do this,
+    This `setup` makes another tree-like structure. Old versions of Flux did not do this,
     and instead stored a dictionary-like structure within the optimiser `Adam(0.001)`.
     This was initialised on first use of the version of `update!` for "implicit" parameters.
 
@@ -266,12 +264,14 @@ for epoch in 1:100
 end
 ```
 
-
 ## Implicit vs Explicit
 
 Flux used to handle gradients, training, and optimisation rules quite differently.
 The new style described above is called "explicit" by Zygote, and the old style "implicit".
 Flux 0.13 is the transitional version which supports both.
 
-For full details on the implicit style, see [Flux 0.13.6 documentation](https://fluxml.ai/Flux.jl/v0.13.6/training/training/).
+The blue boxes above describe the changes.
+For more details on training in the implicit style, see [Flux 0.13.6 documentation](https://fluxml.ai/Flux.jl/v0.13.6/training/training/).
+
+For details about the two gradient modes, see [Zygote's documentation](https://fluxml.ai/Zygote.jl/dev/#Explicit-and-Implicit-Parameters-1).
 
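
Putting the pieces documented in this file together, the explicit style that the rewritten text describes looks roughly like the loop below; the model, loss, and data are placeholders invented for this sketch:

```julia
using Flux

# Placeholder model and data, invented for this sketch.
model     = Dense(2 => 1)
train_set = [(rand(Float32, 2, 8), rand(Float32, 1, 8)) for _ in 1:100]

# setup is called once, before training; it returns the tree-like optimiser
# state that becomes the first argument of update!.
opt = Flux.setup(Adam(0.001), model)

for (x, y) in train_set
    # Explicit gradient with respect to the model itself:
    grads = Flux.gradient(m -> Flux.mse(m(x), y), model)
    # Update both model parameters and optimiser state:
    Flux.update!(opt, model, grads[1])
end
```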
