## An optimisation rule

-A new optimiser must overload two functions, [`apply!`](@ref) and [`init`](@ref).
+A new optimiser must overload two functions, [`apply!`](@ref Optimisers.apply!) and [`init`](@ref Optimisers.init).
These act on one array of parameters:

```julia
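# A hedged sketch, since the example block itself falls outside this diff's context:
# a rule is a struct, `init` returns the per-array state, and `apply!` returns the new
# state together with the change which (by Optimisers.jl convention) is subtracted from
# the parameters. `MyMomentum` and its fields are illustrative names, not from the docs.
using Optimisers

struct MyMomentum <: Optimisers.AbstractRule
  eta::Float64
  rho::Float64
end

Optimisers.init(o::MyMomentum, x::AbstractArray) = zero(x)   # fresh state per array

function Optimisers.apply!(o::MyMomentum, state, x, dx)
  eta, rho = o.eta, o.rho
  newstate = @. rho * state + eta * dx   # velocity, kept as this rule's state
  return newstate, newstate              # (new state, change to subtract from x)
end
```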
@@ -33,8 +33,8 @@ It of course also makes it easier to store the state.

## Usage with [Flux.jl](https://github.com/FluxML/Flux.jl)

-To apply such an optimiser to a whole model, [`setup`](@ref) builds a tree containing any initial
-state for every trainable array. Then at each step, [`update`](@ref) uses this and the gradient
+To apply such an optimiser to a whole model, [`setup`](@ref Optimisers.setup) builds a tree containing any initial
+state for every trainable array. Then at each step, [`update`](@ref Optimisers.update) uses this and the gradient
to adjust the model:

```julia
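# A hedged sketch of such usage, assuming Flux's explicit-gradient style (via Zygote);
# the model, rule, and data below are illustrative choices, not the elided example.
using Flux, Optimisers

model = Chain(Dense(2 => 3, tanh), Dense(3 => 1))
state = Optimisers.setup(Optimisers.Adam(0.01), model)     # tree of per-array state

grad = gradient(m -> sum(abs2, m([0.1, 0.2])), model)[1]   # gradient with respect to the model
state, model = Optimisers.update(state, model, grad)       # returns new state and updated model
```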
@@ -142,10 +142,10 @@ end;

Optimisers.jl uses [Functors.jl](https://fluxml.ai/Functors.jl) to walk the `struct`s
making up the model, for which they must be annotated `@functor Type`.
-By default optimisation will alter all [`isnumeric`](@ref) arrays.
+By default optimisation will alter all [`isnumeric`](@ref Optimisers.isnumeric) arrays.

If some arrays of a particular layer should not be treated this way,
-you can define a method for [`trainable`](@ref)
+you can define a method for [`trainable`](@ref Optimisers.trainable)

```julia
struct Layer{T}
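  # Hypothetical fields; the rest of the original definition lies outside this diff.
  alpha::T
  beta::T
  length::Int
end

# Assumed sketch: report only some of the (`@functor`-annotated) fields as trainable,
# so the others are still walked by Functors.jl but left untouched by optimisation.
Optimisers.trainable(x::Layer) = (; alpha = x.alpha, beta = x.beta)
```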
@@ -239,7 +239,7 @@ from StaticArrays.jl.
## Obtaining a flat parameter vector

Instead of a nested tree-like structure, sometimes it is convenient to have all the
-parameters as one simple vector. Optimisers.jl contains a function [`destructure`](@ref)
+parameters as one simple vector. Optimisers.jl contains a function [`destructure`](@ref Optimisers.destructure)
which creates this vector, and also creates a way to re-build the original structure
with new parameters. Both flattening and re-building may be used within `gradient` calls.

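For example (a hedged sketch; the model and sizes here are illustrative, not the elided example from the docs), flattening, differentiating through the re-builder, and updating the flat vector looks like:

```julia
using Flux, Optimisers

model = Chain(Dense(3 => 4, relu), Dense(4 => 1))
flat, re = Optimisers.destructure(model)     # flat vector of parameters, plus a re-builder

# Differentiate straight through re-building the model from the flat vector:
∇flat = gradient(v -> sum(abs2, re(v)([1.0, 2.0, 3.0])), flat)[1]

st = Optimisers.setup(Optimisers.Adam(0.01), flat)
st, flat = Optimisers.update(st, flat, ∇flat)
```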
@@ -270,7 +270,7 @@ st, flat = Optimisers.update(st, flat, ∇flat)

Here `flat` contains only the 283 trainable parameters, while the non-trainable
ones are preserved inside `re`, an object of type `Restructure`.
-When defining new layers, these can be specified if necessary by overloading [`trainable`](@ref).
+When defining new layers, these can be specified if necessary by overloading [`trainable`](@ref Optimisers.trainable).
By default, all numeric arrays visible to [Functors.jl](https://github.com/FluxML/Functors.jl)
are assumed to contain trainable parameters.
Tied parameters (arrays appearing in different layers) are included only once in `flat`.
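As an illustration of that last point (a hedged sketch, not the example from the docs): two fields holding the same array contribute its entries to `flat` only once.

```julia
using Optimisers

w = rand(3, 3)
tied = (layer1 = (; weight = w), layer2 = (; weight = w))   # the same array, tied across "layers"

flat, re = Optimisers.destructure(tied)
length(flat)              # 9, not 18: the shared array is included once

rebuilt = re(2 .* flat)   # re-building keeps the nested structure (and the tie)
```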