
Commit be41a43

Add ParameterSchedulers.jl to docs
1 parent 15a0ebf commit be41a43

2 files changed: +28 -0 lines changed

docs/src/ecosystem.md

Lines changed: 1 addition & 0 deletions
@@ -16,5 +16,6 @@ machine learning and deep learning workflows:
- [Parameters.jl](https://github.com/mauro3/Parameters.jl): types with default field values, keyword constructors and (un-)pack macros
- [ProgressMeter.jl](https://github.com/timholy/ProgressMeter.jl): progress meters for long-running computations
- [TensorBoardLogger.jl](https://github.com/PhilipVinc/TensorBoardLogger.jl): easy peasy logging to [tensorboard](https://www.tensorflow.org/tensorboard) in Julia
- [ParameterSchedulers.jl](https://github.com/darsnack/ParameterSchedulers.jl): standard scheduling policies for machine learning

This tight integration among Julia packages is shown in some of the examples in the [model-zoo](https://github.com/FluxML/model-zoo) repository.

docs/src/training/optimisers.md

Lines changed: 27 additions & 0 deletions
@@ -137,6 +137,33 @@ In this manner it is possible to compose optimisers for some added flexibility.
Flux.Optimise.Optimiser
```

## Scheduling Optimisers

In practice, it is fairly common to schedule the learning rate of an optimiser to obtain faster convergence. There are a variety of popular scheduling policies, and you can find implementations of them in [ParameterSchedulers.jl](https://darsnack.github.io/ParameterSchedulers.jl/dev/README.html). The ParameterSchedulers.jl documentation provides a more detailed overview of the different scheduling policies and how to use them with Flux optimisers. Below, we provide a brief snippet illustrating a [cosine annealing](https://arxiv.org/pdf/1608.03983.pdf) schedule with a momentum optimiser.

First, we import ParameterSchedulers.jl and initialize a cosine annealing schedule to vary the learning rate between `1e-4` and `1e-2` every 10 steps. We also create a new [`Momentum`](@ref) optimiser.
```julia
using ParameterSchedulers

# cosine annealing of the learning rate between 1e-4 and 1e-2 with a period of 10 steps
schedule = Cos(λ0 = 1e-4, λ1 = 1e-2, period = 10)
opt = Momentum()
```
Next, you can use your schedule directly in a `for`-loop like any iterator:

```julia
for (eta, epoch) in zip(schedule, 1:100)
  opt.eta = eta
  # your training code here
end
```
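If you just want to see the values a schedule will produce, you can also collect a few of them directly, since the schedule is an ordinary Julia iterator. The snippet below is purely illustrative (it continues from the `schedule` defined above and uses `Base.Iterators.take`; it is not part of the original docs):

```julia
# Illustration only: materialise the first five learning rates the cosine
# schedule above would yield.
first_rates = collect(Iterators.take(schedule, 5))
```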
Alternatively, use `ScheduledOptim` from ParameterSchedulers.jl to wrap the optimiser and schedule into a single object that behaves like any Flux optimiser.

```julia
using Flux: @epochs
@epochs 100 Flux.train!(loss, ps, data, ScheduledOptim(schedule, opt))
```
ParameterSchedulers.jl allows for many more scheduling policies, including arbitrary functions, looping any function with a given period, and sequences of multiple schedules. See the ParameterSchedulers.jl documentation for more information.
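For example, schedules can be chained one after another. The sketch below is only illustrative: `Sequence` is a ParameterSchedulers.jl policy, but the exact keyword arguments shown here are an assumption patterned after the `Cos` constructor above, so check the package documentation for the precise interface.

```julia
using ParameterSchedulers

# Assumed interface (not taken from this commit): run a gentle cosine schedule
# for the first 50 steps, then a more aggressive one for the remaining steps.
slow = Cos(λ0 = 1e-5, λ1 = 1e-3, period = 10)
fast = Cos(λ0 = 1e-4, λ1 = 1e-2, period = 10)
schedule = Sequence(schedules = [slow, fast], step_sizes = [50, 50])
```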
## Decays

Similar to optimisers, Flux also defines some simple decays that can be used in conjunction with other optimisers, or standalone.
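As a brief sketch of how a decay composes with another optimiser (the specific values below are placeholders and not taken from this page; `ExpDecay` is one of Flux's existing decay policies rather than something added by this commit):

```julia
using Flux

# Illustrative values only: discount the learning rate by a factor of 10
# every 1000 steps, with a floor of 1e-4, composed with a plain ADAM optimiser.
opt = Flux.Optimise.Optimiser(ExpDecay(0.001, 0.1, 1000, 1e-4), ADAM())
```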
