Documentation headings & sections #2056

Merged: 9 commits, Sep 19, 2022
Changes from 2 commits
1 change: 1 addition & 0 deletions docs/Project.toml
@@ -1,5 +1,6 @@
[deps]
BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54"
24 changes: 13 additions & 11 deletions docs/make.jl
@@ -14,28 +14,30 @@ makedocs(
"Overview" => "models/overview.md",
"Basics" => "models/basics.md",
"Recurrence" => "models/recurrence.md",
"Model Reference" => "models/layers.md",
"Layer Reference" => "models/layers.md",
"Loss Functions" => "models/losses.md",
"Regularisation" => "models/regularisation.md",
"Advanced Model Building" => "models/advanced.md",
Member Author:

Maybe "Advanced Model Building" should be more like "Custom Layers"? That's what it describes.

It also has a section "Freezing Layer Parameters" which I think should move to the Training heading below. (And will be scrapped when Params is removed.)

"Neural Network primitives from NNlib.jl" => "models/nnlib.md",
"Recursive transformations from Functors.jl" => "models/functors.md"
"NNlib.jl" => "models/nnlib.md",
"Functors.jl" => "models/functors.md",
],
"Handling Data" => [
"One-Hot Encoding with OneHotArrays.jl" => "data/onehot.md",
"Working with data using MLUtils.jl" => "data/mlutils.md"
"MLUtils.jl" => "data/mlutils.md",
"OneHotArrays.jl" => "data/onehot.md",
Comment on lines 24 to +26
Member Author:

Here I'm trying to (1) shorten these to fit on one line, and (2) put the package Name.jl to make such pages of "foreign API" visually distinct from pages about Flux itself.

I hope that "Working with data" is clear enough from "Handling Data" just above. The actual titles on the pages remain as before.

Member Author:

New on the right:
[Screenshot 2022-08-29 at 19 26 02]

],
"Training Models" => [
"Optimisers" => "training/optimisers.md",
"Training" => "training/training.md"
"Training" => "training/training.md",
"Zygote.jl" => "training/zygote.md",
],
"GPU Support" => "gpu.md",
"Saving & Loading" => "saving.md",
"The Julia Ecosystem" => "ecosystem.md",
"Utility Functions" => "utilities.md",
"Model Tools" => [
"Saving & Loading" => "saving.md",
"Size Propagation" => "outputsize.md",
"Weight Initialisation" => "utilities.md",
],
"Performance Tips" => "performance.md",
"Datasets" => "datasets.md",
"Community" => "community.md"
Comment on lines -37 to -38
Member Author:

Datasets became one line in the "ecosystem" page.

Community I put on the first page, after "Learning Flux".

"Flux's Ecosystem" => "ecosystem.md",
],
format = Documenter.HTML(
analytics = "UA-36890222-9",
5 changes: 0 additions & 5 deletions docs/src/community.md

This file was deleted.

6 changes: 0 additions & 6 deletions docs/src/datasets.md

This file was deleted.

6 changes: 5 additions & 1 deletion docs/src/ecosystem.md
@@ -1,4 +1,4 @@
# The Julia Ecosystem
# The Julia Ecosystem around Flux

One of the main strengths of Julia lies in an ecosystem of packages
globally providing a rich and consistent user experience.
@@ -49,7 +49,10 @@ Utility tools you're unlikely to have met if you never used Flux!

### Datasets

Commonly used machine learning datasets are provided by the following packages in the Julia ecosystem:

- [MLDatasets.jl](https://github.com/JuliaML/MLDatasets.jl) focuses on downloading, unpacking, and accessing benchmark datasets.
- [GraphMLDatasets.jl](https://github.com/yuehhua/GraphMLDatasets.jl): a library for machine learning datasets on graphs.

### Plumbing

@@ -87,6 +90,7 @@ Packages based on differentiable programming but not necessarily related to Mach

- [OnlineStats.jl](https://github.com/joshday/OnlineStats.jl) provides single-pass algorithms for statistics.


## Useful miscellaneous packages

Some useful and random packages!
6 changes: 6 additions & 0 deletions docs/src/index.md
@@ -18,3 +18,9 @@ NOTE: Flux used to have a CuArrays.jl dependency until v0.10.4, replaced by CUDA
## Learning Flux

There are several different ways to learn Flux. If you just want to get started writing models, the [model zoo](https://github.com/FluxML/model-zoo/) gives good starting points for many common ones. This documentation provides a reference to all of Flux's APIs, as well as a from-scratch introduction to Flux's take on models and how they work. Once you understand these docs, congratulations, you also understand [Flux's source code](https://github.com/FluxML/Flux.jl), which is intended to be concise, legible and a good reference for more advanced concepts.

## Community

All Flux users are welcome to join our community on the [Julia forum](https://discourse.julialang.org/), or on [Slack](https://discourse.julialang.org/t/announcing-a-julia-slack/4866) (channel #machine-learning). If you have questions or issues, we'll try to help you out.

If you're interested in hacking on Flux, the [source code](https://github.com/FluxML/Flux.jl) is open and easy to understand -- it's all just the same Julia code you work with normally. You might be interested in our [intro issues](https://github.com/FluxML/Flux.jl/labels/good%20first%20issue) to get started or our [contributing guide](https://github.com/FluxML/Flux.jl/blob/master/CONTRIBUTING.md).
9 changes: 9 additions & 0 deletions docs/src/models/layers.md
@@ -86,3 +86,12 @@ Many normalisation layers behave differently under training and inference (testi
Flux.testmode!
trainmode!
```


## Listing All Layers

The `Flux.modules` function uses Functors.jl to extract a flat list of all layers:

```@docs
Flux.modules
```
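
For instance, a minimal sketch (the model here is illustrative):

```julia
using Flux

m = Chain(Dense(2 => 3, relu), Dense(3 => 1))

# Flux.modules returns the model together with every nested sub-module
for layer in Flux.modules(m)
  println(summary(layer))
end

# e.g. an L2 penalty over just the Dense layers
penalty = sum(sum(abs2, l.weight) for l in Flux.modules(m) if l isa Dense)
```
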
47 changes: 47 additions & 0 deletions docs/src/outputsize.md
@@ -0,0 +1,47 @@
## Model Building
Member Author (@mcabbott, Aug 29, 2022):

I put outputsize on its own page, as I don't know what it lives with. But I'm sure nobody found it under utilities.

This is the heading it had, I think, but clearly not quite right here. Now changed to "Size Propagation" but not sure that's perfect either.

Member Author:

Now called "shape inference"


Flux provides some utility functions to help you generate models in an automated fashion.

[`Flux.outputsize`](@ref) enables you to calculate the output sizes of layers like [`Conv`](@ref)
when applied to input samples of a given size. This is achieved by passing a "dummy" array into
the model that preserves size information without running any computation.
`outputsize(f, inputsize)` works for all layers (including custom layers) out of the box.
By default, `inputsize` is expected to include the batch dimension; to omit it,
call `outputsize(f, inputsize; padbatch=true)`, which pads the input with a batch size of one.
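
For instance, a minimal sketch (the layer sizes here are illustrative):

```julia
using Flux

m = Chain(Conv((3, 3), 3 => 16), Conv((3, 3), 16 => 32))

# the input size includes the batch dimension by default:
Flux.outputsize(m, (28, 28, 3, 1))               # (24, 24, 32, 1)

# or omit it, letting padbatch=true insert a batch size of one:
Flux.outputsize(m, (28, 28, 3); padbatch=true)   # (24, 24, 32, 1)
```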

Using this utility function lets you automate model building for various inputs like so:
```julia
"""
make_model(width, height, inchannels, nclasses;
layer_config = [16, 16, 32, 32, 64, 64])

Create a CNN for a given set of configuration parameters.

# Arguments
- `width`: the input image width
- `height`: the input image height
- `inchannels`: the number of channels in the input image
- `nclasses`: the number of output classes
- `layer_config`: a vector of the number of filters per each conv layer
"""
function make_model(width, height, inchannels, nclasses;
layer_config = [16, 16, 32, 32, 64, 64])
# construct a vector of conv layers programmatically
conv_layers = [Conv((3, 3), inchannels => layer_config[1])]
for (infilters, outfilters) in zip(layer_config, layer_config[2:end])
push!(conv_layers, Conv((3, 3), infilters => outfilters))
end

# compute the output dimensions for the conv layers
# use padbatch=true to set the batch dimension to 1
conv_outsize = Flux.outputsize(conv_layers, (width, height, nchannels); padbatch=true)

# the input dimension to Dense is programatically calculated from
# width, height, and nchannels
return Chain(conv_layers..., Dense(prod(conv_outsize) => nclasses))
end
```
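
A hypothetical use of this constructor, checking that the sizes work out (the numbers here are illustrative):

```julia
# a model for 32×32 RGB images with 10 classes
m = make_model(32, 32, 3, 10)

# sizes propagate as expected, with a batch dimension of 1
Flux.outputsize(m, (32, 32, 3); padbatch=true)   # (10, 1)
```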

```@docs
Flux.outputsize
```
77 changes: 77 additions & 0 deletions docs/src/training/callbacks.md
@@ -0,0 +1,77 @@
## Callback Helpers

```@docs
Flux.throttle
Flux.stop
Flux.skip
```
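
For example, `throttle` is commonly used to limit how often a logging callback runs (a minimal sketch; `loss`, `x` and `y` here are hypothetical):

```julia
using Flux: throttle

loss(x, y) = sum(abs2, x .- y)
x, y = rand(3), rand(3)

# call the closure at most once every 5 seconds, however often it is invoked
evalcb = throttle(() -> @show(loss(x, y)), 5)
```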

## Patience Helpers
Member Author (@mcabbott, Aug 29, 2022):

These were in utilities, now their own page. Some are going away in 0.14.


Flux provides utilities for controlling your training procedure according to some monitored condition and a maximum `patience`. For example, you can use `early_stopping` to stop training when the model is converging or deteriorating, or you can use `plateau` to check if the model is stagnating.

For example, below we create a pseudo-loss function that decreases, bottoms out, and then increases. The early stopping trigger will break the loop before the loss increases too much.
```julia
# create a pseudo-loss that decreases for 4 calls, then starts increasing
# we call this like loss()
loss = let t = 0
  () -> begin
    t += 1
    (t - 4) ^ 2
  end
end

# create an early stopping trigger
# returns true when the loss increases for two consecutive steps
es = early_stopping(loss, 2; init_score = 9)

# this will stop at the 6th (4 decreasing + 2 increasing calls) epoch
@epochs 10 begin
  es() && break
end
```

The keyword argument `distance` of `early_stopping` is a function of the form `distance(best_score, score)`. By default `distance` is `-`, which implies that the monitored metric `f` is expected to be decreasing and minimized. If you use some increasing metric (e.g. accuracy), you can customize the `distance` function: `(best_score, score) -> score - best_score`.
```julia
# create a pseudo-accuracy that increases by 0.01 each time from 0 to 1
# we call this like acc()
acc = let v = 0
  () -> v = min(1, v + 0.01)
end

# create an early stopping trigger for accuracy
es = early_stopping(acc, 3; distance = (best_score, score) -> score - best_score)

# this will iterate until the 10th epoch
@epochs 10 begin
  es() && break
end
```

`early_stopping` and `plateau` are both built on top of `patience`. You can use `patience` to build your own triggers that use a patient counter. For example, if you want to trigger when the loss is below a threshold for several consecutive iterations:
```julia
threshold(f, thresh, delay) = patience(delay) do
  f() < thresh
end
```
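
For example, combined with the pseudo-loss from above (a hypothetical usage, assuming a freshly created `loss`):

```julia
# trigger once the loss has been below 2 for 3 consecutive checks
stop_when_small = threshold(loss, 2, 3)

# loss() is below 2 on calls 3, 4 and 5, so this stops at the 5th epoch
@epochs 10 begin
  stop_when_small() && break
end
```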

Both `predicate` in `patience` and `f` in `early_stopping` / `plateau` can accept extra arguments. You can pass such extra arguments to `predicate` or `f` through the returned function:
```julia
trigger = patience((a; b) -> a > b, 3)

# this will iterate until the 10th epoch
@epochs 10 begin
  trigger(1; b = 2) && break
end

# this will stop at the 3rd epoch
@epochs 10 begin
  trigger(3; b = 2) && break
end
```

```@docs
Flux.patience
Flux.early_stopping
Flux.plateau
```
22 changes: 22 additions & 0 deletions docs/src/training/zygote.md
@@ -0,0 +1,22 @@
# Automatic Differentiation using Zygote.jl

Flux re-exports the `gradient` function from [Zygote](https://github.com/FluxML/Zygote.jl), and uses it within [`train!`](@ref) to differentiate the model. Zygote has its own [documentation](https://fluxml.ai/Zygote.jl/dev/), in particular listing some [limitations](https://fluxml.ai/Zygote.jl/dev/limitations/).
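
For example, a minimal sketch of `gradient` applied to a scalar function (the function here is illustrative):

```julia
using Flux  # re-exports gradient from Zygote

f(x) = 3x^2 + 2x + 1

# gradient returns a tuple with one entry per argument of f
gradient(f, 5)   # (32,)  since f'(x) = 6x + 2
```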

```@docs
Zygote.gradient
Zygote.jacobian
Zygote.withgradient
```

Sometimes it is necessary to exclude some code, or a whole function, from automatic differentiation. This can be done using [ChainRules](https://github.com/JuliaDiff/ChainRules.jl):

```@docs
ChainRulesCore.ignore_derivatives
ChainRulesCore.@non_differentiable
```
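
For instance, a minimal sketch using `ignore_derivatives` to keep a side effect out of the gradient computation (the loss here is illustrative):

```julia
using Flux, ChainRulesCore

function loss(x)
  ChainRulesCore.ignore_derivatives() do
    println("evaluating loss at x = ", x)   # excluded from differentiation
  end
  sum(abs2, x)
end

gradient(loss, [1.0, 2.0])   # ([2.0, 4.0],)
```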

To manually supply the gradient for one function, you should define a method of `rrule`. ChainRules has [detailed documentation](https://juliadiff.org/ChainRulesCore.jl/stable/) on how this works.
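
As a minimal sketch, a hand-written rule for a hypothetical `square` function:

```julia
using ChainRulesCore

square(x::Real) = x^2

function ChainRulesCore.rrule(::typeof(square), x)
  y = square(x)
  # the pullback returns one tangent for the function itself, then one per argument
  square_pullback(ȳ) = (NoTangent(), ȳ * 2x)
  return y, square_pullback
end
```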

```@docs
ChainRulesCore.rrule
```