
Commit 7a32a70

bors[bot] and janEbert authored
Merge #853

853: Improve docs r=CarloLucibello a=janEbert

If you disagree with any of the changes, please tell me what to reverse or fix. I am unsure about the docstrings I added to `src/utils.jl` for `unsqueeze` and the `[un]stack` functions so please give those a more detailed look.

Update Documenter.jl version for new features, fix deprecation warnings in `docs/make.jl` and import Flux for all doctests. Add missing docstrings to `src/utils.jl`, `src/layers/stateless.jl` and `src/data/`; add these and other missing functions to Markdown docs.

Improve docstrings by...

- fixing typos,
- removing trailing or double whitespaces,
- using `jldoctest` blocks where applicable,
- fixing, updating or correctly setting up existing doctests,
- improving consistency (for example, always use "# Examples" instead of other variants),
- removing empty lines between docstrings and functions,
- instead of mentioning keywords, put them into the docstring,
- adding some missing but useful keywords,
- adding references (`@ref`),
- using LaTeX math where applicable, and
- linking papers.

Debatable stuff that is untouched:

- BE/AE s/z irregularities (e.g. "normalise" versus "normalize") since most papers use the AE version while the Flux source code was written with BE spelling.
- Names of normalization functions are capitalized ("Batch Normalization" instead of "batch normalization").
- Default values in argument lists have spaces around the equals sign (`arg = x` instead of `arg=x`).

Co-authored-by: janEbert <janpublicebert@posteo.net>
2 parents a9f8250 + 6845706 commit 7a32a70

26 files changed (+757 -430 lines)

docs/make.jl: 10 additions & 4 deletions

@@ -1,6 +1,8 @@
 using Documenter, Flux, NNlib
 
+DocMeta.setdocmeta!(Flux, :DocTestSetup, :(using Flux); recursive=true)
 makedocs(modules=[Flux, NNlib],
+         doctest = VERSION >= v"1.4",
          sitename = "Flux",
          pages = ["Home" => "index.md",
                   "Building Models" =>
@@ -19,12 +21,16 @@ makedocs(modules=[Flux, NNlib],
                   "GPU Support" => "gpu.md",
                   "Saving & Loading" => "saving.md",
                   "The Julia Ecosystem" => "ecosystem.md",
+                  "Utility Functions" => "utilities.md",
                   "Performance Tips" => "performance.md",
+                  "Datasets" => "datasets.md",
                   "Community" => "community.md"],
-         format = Documenter.HTML(assets = ["assets/flux.css"],
-                                  analytics = "UA-36890222-9",
-                                  prettyurls = haskey(ENV, "CI")))
+         format = Documenter.HTML(
+             analytics = "UA-36890222-9",
+             assets = ["assets/flux.css"],
+             prettyurls = get(ENV, "CI", nothing) == "true"),
+         )
 
-deploydocs(repo = "github.com/FluxML/Flux.jl.git",
+deploydocs(repo = "github.com/FluxML/Flux.jl.git",
            target = "build",
            push_preview = true)
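
For context on the two new lines: `DocMeta.setdocmeta!` injects `using Flux` into every docstring doctest, and `doctest = VERSION >= v"1.4"` presumably restricts doctest checking to the Julia version whose printing the recorded outputs match. The sketch below (illustrative only, not part of the commit; it assumes a Documenter version that provides `doctest(::Module)`) shows how the `jldoctest` blocks mentioned in the commit message can be exercised outside a full docs build:

```julia
using Documenter, Flux

# Same setup as in make.jl: every jldoctest block gets `using Flux` implicitly.
DocMeta.setdocmeta!(Flux, :DocTestSetup, :(using Flux); recursive=true)

# Re-run the jldoctest blocks found in Flux's docstrings and error on any
# mismatch between the recorded output and the actual REPL output.
doctest(Flux)
```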

docs/src/data/dataloader.md: 1 addition & 1 deletion

@@ -3,4 +3,4 @@ Flux provides the `DataLoader` type in the `Flux.Data` module to handle iteratio
 
 ```@docs
 Flux.Data.DataLoader
-```
\ No newline at end of file
+```
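
As background for the `Flux.Data.DataLoader` docstring referenced above, here is a minimal usage sketch. The constructor signature shown (positional data arguments plus `batchsize`/`shuffle` keywords) is the one from the Flux release this commit targets and has changed in later versions, so treat it as illustrative:

```julia
using Flux
using Flux.Data: DataLoader

X = rand(Float32, 10, 100)  # 100 observations with 10 features each
Y = rand(Float32, 1, 100)   # matching targets

# Iterate over shuffled mini-batches of 20 observations.
loader = DataLoader(X, Y, batchsize=20, shuffle=true)
for (x, y) in loader
    @assert size(x) == (10, 20) && size(y) == (1, 20)
end
```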

docs/src/data/onehot.md: 9 additions & 0 deletions

@@ -31,6 +31,11 @@ julia> onecold([0.3, 0.2, 0.5], [:a, :b, :c])
 :c
 ```
 
+```@docs
+Flux.onehot
+Flux.onecold
+```
+
 ## Batches
 
 `onehotbatch` creates a batch (matrix) of one-hot vectors, and `onecold` treats matrices as batches.
@@ -52,3 +57,7 @@ julia> onecold(ans, [:a, :b, :c])
 ```
 
 Note that these operations returned `OneHotVector` and `OneHotMatrix` rather than `Array`s. `OneHotVector`s behave like normal vectors but avoid any unnecessary cost compared to using an integer index directly. For example, multiplying a matrix with a one-hot vector simply slices out the relevant row of the matrix under the hood.
+
+```@docs
+Flux.onehotbatch
+```
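
For readers skimming the diff, the three functions being added to the `@docs` blocks behave roughly as follows (a quick sketch; the doctests on the page itself are authoritative):

```julia
using Flux: onehot, onehotbatch, onecold

v = onehot(:b, [:a, :b, :c])                 # one-hot vector encoding :b
M = onehotbatch([:b, :a, :c], [:a, :b, :c])  # 3×3 one-hot matrix, one column per element

# Multiplying by a one-hot vector just selects the corresponding slice of W.
W = rand(2, 3)
W * v == W[:, 2]            # true

onecold(M, [:a, :b, :c])    # recovers [:b, :a, :c]
```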

docs/src/datasets.md: 20 additions & 0 deletions

@@ -0,0 +1,20 @@
+# Datasets
+
+Flux includes several standard machine learning datasets.
+
+```@docs
+Flux.Data.Iris.features()
+Flux.Data.Iris.labels()
+Flux.Data.MNIST.images()
+Flux.Data.MNIST.labels()
+Flux.Data.FashionMNIST.images()
+Flux.Data.FashionMNIST.labels()
+Flux.Data.CMUDict.phones()
+Flux.Data.CMUDict.symbols()
+Flux.Data.CMUDict.rawdict()
+Flux.Data.CMUDict.cmudict()
+Flux.Data.Sentiment.train()
+Flux.Data.Sentiment.test()
+Flux.Data.Sentiment.dev()
+```
+
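
To give a feel for the accessors listed on the new page, a short sketch (the data is downloaded on first use; the shapes noted here reflect the Flux version current at the time and are stated from memory rather than taken from the diff):

```julia
using Flux

# Iris: feature matrix plus string labels.
features = Flux.Data.Iris.features()  # 4×150 matrix of measurements
labels   = Flux.Data.Iris.labels()    # 150-element vector of species names

# MNIST: vectors of 28×28 images and the matching digit labels.
imgs   = Flux.Data.MNIST.images()     # training images
digits = Flux.Data.MNIST.labels()     # labels in 0:9
```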

docs/src/models/basics.md: 2 additions & 2 deletions

@@ -220,7 +220,7 @@ Flux.@functor Affine
 
 This enables a useful extra set of functionality for our `Affine` layer, such as [collecting its parameters](../training/optimisers.md) or [moving it to the GPU](../gpu.md).
 
-For some more helpful tricks, including parameter freezing, please checkout the [advanced usage guide](advacned.md).
+For some more helpful tricks, including parameter freezing, please checkout the [advanced usage guide](advanced.md).
 
 ## Utility functions
 
@@ -240,5 +240,5 @@ Currently limited to the following layers:
 - `MeanPool`
 
 ```@docs
-outdims
+Flux.outdims
 ```
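
The renamed `Flux.outdims` entry documents the shape-inference helper mentioned in the surrounding page; a rough usage sketch, based on the API of that Flux release:

```julia
using Flux

m = Dense(10, 5)
Flux.outdims(m, (10,))     # (5,): output size for a length-10 input

c = Conv((3, 3), 3 => 16)
Flux.outdims(c, (32, 32))  # (30, 30): spatial size after a 3×3 convolution with no padding
```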

docs/src/models/layers.md: 6 additions & 3 deletions

@@ -32,6 +32,7 @@ RNN
 LSTM
 GRU
 Flux.Recur
+Flux.reset!
 ```
 
 ## Other General Purpose Layers
@@ -49,20 +50,22 @@ SkipConnection
 These layers don't affect the structure of the network but may improve training times or reduce overfitting.
 
 ```@docs
+Flux.normalise
 BatchNorm
-Dropout
 Flux.dropout
+Dropout
 AlphaDropout
 LayerNorm
+InstanceNorm
 GroupNorm
 ```
 
 ### Testmode
 
-Many normalisation layers behave differently under training and inference (testing). By default, Flux will automatically determine when a layer evaluation is part of training or inference. Still, depending on your use case, it may be helpful to manually specify when these layers should be treated as being trained or not. For this, Flux provides `testmode!`. When called on a model (e.g. a layer or chain of layers), this function will place the model into the mode specified.
+Many normalisation layers behave differently under training and inference (testing). By default, Flux will automatically determine when a layer evaluation is part of training or inference. Still, depending on your use case, it may be helpful to manually specify when these layers should be treated as being trained or not. For this, Flux provides `Flux.testmode!`. When called on a model (e.g. a layer or chain of layers), this function will place the model into the mode specified.
 
 ```@docs
-testmode!
+Flux.testmode!
 trainmode!
 ```
 
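
To illustrate the `Flux.testmode!`/`trainmode!` pair referenced above, a brief sketch (the `:auto` mode is the automatic behaviour described in the paragraph):

```julia
using Flux

m = Chain(Dense(10, 5), Dropout(0.5))

Flux.testmode!(m)         # dropout behaves as at inference (no units dropped)
Flux.trainmode!(m)        # force training behaviour
Flux.testmode!(m, :auto)  # back to automatic train/test detection
```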

docs/src/models/regularisation.md: 4 additions & 0 deletions

@@ -64,3 +64,7 @@ julia> activations(c, rand(10))
 julia> sum(norm, ans)
 2.1166067f0
 ```
+
+```@docs
+Flux.activations
+```
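
For context on the new `Flux.activations` entry, a small sketch of what it returns and how the regularisation example above uses it (the chain and penalty here are illustrative, not taken from the diff):

```julia
using Flux
using LinearAlgebra: norm

c = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)

acts = Flux.activations(c, rand(10))  # one output per layer of the chain
penalty = sum(norm, acts)             # activation penalty summed over all layers
```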

docs/src/performance.md: 1 addition & 1 deletion

@@ -52,7 +52,7 @@ e.g.
 ```julia
 function loss_total(xs::AbstractVector{<:Vector}, ys::AbstractVector{<:Vector})
   sum(zip(xs, ys)) do (x, y_target)
-    y_pred = model(x) # evaluate the model
+    y_pred = model(x) # evaluate the model
     return loss(y_pred, y_target)
   end
 end

docs/src/training/optimisers.md: 1 addition & 0 deletions

@@ -52,6 +52,7 @@ Momentum
 Nesterov
 RMSProp
 ADAM
+RADAM
 AdaMax
 ADAGrad
 ADADelta
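
`RADAM` (Rectified ADAM) is the optimiser newly added to this list; it is constructed and used like the others on the page. A sketch, with `loss`, `ps` and `data` standing in for your own objects:

```julia
using Flux

opt = RADAM()                      # default learning rate 0.001
# opt = RADAM(0.001, (0.9, 0.999)) # or with explicit hyperparameters

# `loss`, `ps` and `data` are hypothetical placeholders defined elsewhere.
Flux.train!(loss, ps, data, opt)   # used exactly like ADAM and friends
```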

docs/src/training/training.md: 5 additions & 0 deletions

@@ -32,6 +32,7 @@ Flux.train!(loss, ps, data, opt)
 ```
 
 The objective will almost always be defined in terms of some *cost function* that measures the distance of the prediction `m(x)` from the target `y`. Flux has several of these built in, like `mse` for mean squared error or `crossentropy` for cross entropy loss, but you can calculate it however you want.
+For a list of all built-in loss functions, check out the [layer reference](../models/layers.md).
 
 At first glance it may seem strange that the model that we want to train is not part of the input arguments of `Flux.train!` too. However the target of the optimizer is not the model itself, but the objective function that represents the departure between modelled and observed data. In other words, the model is implicitly defined in the objective function, and there is no need to give it explicitly. Passing the objective function instead of the model and a cost function separately provides more flexibility, and the possibility of optimizing the calculations.
 
@@ -94,6 +95,10 @@ julia> @epochs 2 Flux.train!(...)
 # Train for two epochs
 ```
 
+```@docs
+Flux.@epochs
+```
+
 ## Callbacks
 
 `train!` takes an additional argument, `cb`, that's used for callbacks so that you can observe the training process. For example:
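
Putting the new `@epochs` reference and the callbacks paragraph together, a minimal sketch (here `loss`, `ps`, `data`, `opt` and the held-out `test_x`/`test_y` are hypothetical placeholders for your own setup):

```julia
using Flux
using Flux: @epochs, throttle

# Callback reporting the loss on hypothetical held-out data.
evalcb() = @show loss(test_x, test_y)

# Train for two epochs, calling the callback at most once every 10 seconds.
@epochs 2 Flux.train!(loss, ps, data, opt, cb = throttle(evalcb, 10))
```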
