Commit f9b95c4

Merge pull request #2035 from Saransh-cpp/more-404s
Fix the last remaining 404 errors
2 parents 1914f38 + 8b5c92f commit f9b95c4

File tree: 10 files changed, +36 −30 lines


docs/Project.toml
Lines changed: 2 additions & 1 deletion

@@ -4,7 +4,8 @@ Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
 MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54"
 NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
+OneHotArrays = "0b1bfda6-eb8a-41d2-88d8-f5af5cad476f"
 Optimisers = "3bd65402-5787-11e9-1adc-39752487f4e2"

 [compat]
-Documenter = "0.26"
+Documenter = "0.27"

docs/make.jl
Lines changed: 3 additions & 2 deletions

@@ -1,12 +1,13 @@
-using Documenter, Flux, NNlib, Functors, MLUtils, BSON, Optimisers
+using Documenter, Flux, NNlib, Functors, MLUtils, BSON, Optimisers, OneHotArrays


 DocMeta.setdocmeta!(Flux, :DocTestSetup, :(using Flux); recursive = true)

 makedocs(
-    modules = [Flux, NNlib, Functors, MLUtils, BSON, Optimisers],
+    modules = [Flux, NNlib, Functors, MLUtils, BSON, Optimisers, OneHotArrays],
     doctest = false,
     sitename = "Flux",
+    strict = [:cross_references,],
     pages = [
         "Home" => "index.md",
         "Building Models" => [

docs/src/data/onehot.md
Lines changed: 9 additions & 7 deletions

@@ -1,9 +1,9 @@
-# One-Hot Encoding
+# One-Hot Encoding with OneHotArrays.jl

-It's common to encode categorical variables (like `true`, `false` or `cat`, `dog`) in "one-of-k" or ["one-hot"](https://en.wikipedia.org/wiki/One-hot) form. Flux provides the `onehot` function to make this easy.
+It's common to encode categorical variables (like `true`, `false` or `cat`, `dog`) in "one-of-k" or ["one-hot"](https://en.wikipedia.org/wiki/One-hot) form. [OneHotArrays.jl](https://github.com/FluxML/OneHotArrays.jl) provides the `onehot` function to make this easy.

 ```jldoctest onehot
-julia> using Flux: onehot, onecold
+julia> using OneHotArrays

 julia> onehot(:b, [:a, :b, :c])
 3-element OneHotVector(::UInt32) with eltype Bool:
@@ -34,7 +34,7 @@ julia> onecold([0.3, 0.2, 0.5], [:a, :b, :c])
 For multiple samples at once, `onehotbatch` creates a batch (matrix) of one-hot vectors, and `onecold` treats matrices as batches.

 ```jldoctest onehot
-julia> using Flux: onehotbatch
+julia> using OneHotArrays

 julia> onehotbatch([:b, :a, :b], [:a, :b, :c])
 3×3 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
@@ -52,7 +52,9 @@ julia> onecold(ans, [:a, :b, :c])
 Note that these operations returned `OneHotVector` and `OneHotMatrix` rather than `Array`s. `OneHotVector`s behave like normal vectors but avoid any unnecessary cost compared to using an integer index directly. For example, multiplying a matrix with a one-hot vector simply slices out the relevant row of the matrix under the hood.

 ```@docs
-Flux.onehot
-Flux.onecold
-Flux.onehotbatch
+OneHotArrays.onehot
+OneHotArrays.onecold
+OneHotArrays.onehotbatch
+OneHotArrays.OneHotVector
+OneHotArrays.OneHotMatrix
 ```
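As context for that page's claim that multiplying by a one-hot vector "simply slices out the relevant row of the matrix under the hood", here is a minimal sketch (mine, not part of the diff) using the renamed package:

```julia
using OneHotArrays

v = onehot(:b, [:a, :b, :c])   # Bool-backed OneHotVector, hot at index 2
W = [1 2 3; 4 5 6]

W * v == W[:, 2]               # true: no dense multiply, just an index into W
onecold(v, [:a, :b, :c])       # :b, onecold inverts the encoding
```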

docs/src/models/layers.md
Lines changed: 1 addition & 0 deletions

@@ -71,6 +71,7 @@ These layers don't affect the structure of the network but may improve training
 Flux.normalise
 BatchNorm
 Dropout
+Flux.dropout
 AlphaDropout
 LayerNorm
 InstanceNorm
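`Flux.dropout` is the functional counterpart of the `Dropout` layer whose docstring now joins this `@docs` block. A hedged sketch of the pairing (my own illustration, not from the docs page):

```julia
using Flux

x = ones(Float32, 10)

m = Dropout(0.5)         # layer form; only active while training
Flux.trainmode!(m)       # force training behaviour outside of a training loop
m(x)                     # roughly half the entries zeroed, the rest scaled by 1/(1 - p)

Flux.dropout(x, 0.5)     # functional form covered by the new `Flux.dropout` entry
```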

docs/src/models/overview.md
Lines changed: 9 additions & 9 deletions

@@ -42,7 +42,7 @@ Normally, your training and test data come from real world observations, but thi

 Now, build a model to make predictions with `1` input and `1` output:

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> model = Dense(1 => 1)
 Dense(1 => 1)       # 2 parameters

@@ -66,15 +66,15 @@ Dense(1 => 1)       # 2 parameters

 This model will already make predictions, though not accurate ones yet:

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> predict(x_train)
 1×6 Matrix{Float32}:
 0.0 0.906654 1.81331 2.71996 3.62662 4.53327
 ```

 In order to make better predictions, you'll need to provide a *loss function* to tell Flux how to objectively *evaluate* the quality of a prediction. Loss functions compute the cumulative distance between actual values and predictions.

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> loss(x, y) = Flux.Losses.mse(predict(x), y);

 julia> loss(x_train, y_train)
@@ -100,7 +100,7 @@ julia> data = [(x_train, y_train)]

 Now, we have the optimiser and data we'll pass to `train!`. All that remains are the parameters of the model. Remember, each model is a Julia struct with a function and configurable parameters. Remember, the dense layer has weights and biases that depend on the dimensions of the inputs and outputs:

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> predict.weight
 1×1 Matrix{Float32}:
 0.9066542
@@ -112,7 +112,7 @@ julia> predict.bias

 The dimensions of these model parameters depend on the number of inputs and outputs. Since models can have hundreds of inputs and several layers, it helps to have a function to collect the parameters into the data structure Flux expects:

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> parameters = Flux.params(predict)
 Params([Float32[0.9066542], Float32[0.0]])
 ```
@@ -135,14 +135,14 @@ julia> train!(loss, parameters, data, opt)

 And check the loss:

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> loss(x_train, y_train)
 116.38745f0
 ```

 It went down. Why?

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> parameters
 Params([Float32[7.5777884], Float32[1.9466728]])
 ```
@@ -153,7 +153,7 @@ The parameters have changed. This single step is the essence of machine learning

 In the previous section, we made a single call to `train!` which iterates over the data we passed in just once. An *epoch* refers to one pass over the dataset. Typically, we will run the training for multiple epochs to drive the loss down even further. Let's run it a few more times:

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> for epoch in 1:200
          train!(loss, parameters, data, opt)
        end
@@ -171,7 +171,7 @@ After 200 training steps, the loss went down, and the parameters are getting clo

 Now, let's verify the predictions:

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> predict(x_test)
 1×5 Matrix{Float32}:
 26.1121 30.13 34.1479 38.1657 42.1836
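Every hunk in this file makes the same change to the doctest filter regex. A quick sketch (mine, not from the PR) of what the added `(f[+-]*[0-9])?` group is presumably for, namely letting the filter mask whole `Float32` literals such as `116.38745f0`:

```julia
old = r"[+-]?([0-9]*[.])?[0-9]+"
new = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"

match(old, "116.38745f0").match   # "116.38745"   -- the f0 suffix would survive filtering
match(new, "116.38745f0").match   # "116.38745f0" -- the whole literal is masked
```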

docs/src/models/recurrence.md
Lines changed: 2 additions & 2 deletions

@@ -94,7 +94,7 @@ In this example, each output has only one component.

 Using the previously defined `m` recurrent model, we can now apply it to a single step from our sequence:

-```jldoctest recurrence; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest recurrence; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> x = rand(Float32, 2);

 julia> m(x)
@@ -111,7 +111,7 @@ iterating the model on a sequence of data.

 To do so, we'll need to structure the input data as a `Vector` of observations at each time step. This `Vector` will therefore be of `length = seq_length` and each of its elements will represent the input features for a given step. In our example, this translates into a `Vector` of length 3, where each element is a `Matrix` of size `(features, batch_size)`, or just a `Vector` of length `features` if dealing with a single observation.

-```jldoctest recurrence; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest recurrence; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> x = [rand(Float32, 2) for i = 1:3];

 julia> [m(xi) for xi in x]
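For readers skimming the diff, the pattern those doctests exercise looks roughly like the sketch below; the page defines `m` earlier, so the model here is a hypothetical stand-in:

```julia
using Flux

m = Chain(RNN(2 => 5), Dense(5 => 1))   # stand-in for the page's recurrent model `m`

x1 = rand(Float32, 2)                   # a single step: 2 input features
m(x1)                                   # one output; the hidden state is updated

xs = [rand(Float32, 2) for _ in 1:3]    # a 3-step sequence
[m(xi) for xi in xs]                    # iterate the stateful model over the sequence
```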

docs/src/models/regularisation.md
Lines changed: 3 additions & 3 deletions

@@ -28,7 +28,7 @@ julia> loss(x, y) = logitcrossentropy(m(x), y) + penalty();
 When working with layers, Flux provides the `params` function to grab all
 parameters at once. We can easily penalise everything with `sum`:

-```jldoctest regularisation; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest regularisation; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> Flux.params(m)
 Params([Float32[0.34704182 -0.48532376 … -0.06914271 -0.38398427; 0.5201164 -0.033709668 … -0.36169025 -0.5552353; … ; 0.46534058 0.17114447 … -0.4809643 0.04993277; -0.47049698 -0.6206029 … -0.3092334 -0.47857067], Float32[0.0, 0.0, 0.0, 0.0, 0.0]])

@@ -40,7 +40,7 @@ julia> sum(sqnorm, Flux.params(m))

 Here's a larger example with a multi-layer perceptron.

-```jldoctest regularisation; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest regularisation; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> m = Chain(Dense(28^2 => 128, relu), Dense(128 => 32, relu), Dense(32 => 10))
 Chain(
   Dense(784 => 128, relu),              # 100_480 parameters
@@ -58,7 +58,7 @@ julia> loss(rand(28^2), rand(10))

 One can also easily add per-layer regularisation via the `activations` function:

-```jldoctest regularisation; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest regularisation; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> using Flux: activations

 julia> c = Chain(Dense(10 => 5, σ), Dense(5 => 2), softmax)
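The regularisation page builds up to the penalised loss shown in the first hunk's context. Assuming the page's usual helper `sqnorm(x) = sum(abs2, x)` (defined earlier on that page, not in this diff), the pattern is roughly:

```julia
using Flux

m = Dense(10 => 5)
sqnorm(x) = sum(abs2, x)                   # squared L2 norm of one parameter array
penalty() = sum(sqnorm, Flux.params(m))    # summed over every trainable parameter

loss(x, y) = Flux.Losses.logitcrossentropy(m(x), y) + penalty()
loss(rand(Float32, 10), Flux.onehot(2, 1:5))
```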

docs/src/training/optimisers.md
Lines changed: 1 addition & 0 deletions

@@ -202,4 +202,5 @@ and the complete `Optimisers` package under the `Flux.Optimisers` namespace.
 ```@docs
 Optimisers.destructure
 Optimisers.trainable
+Optimisers.isnumeric
 ```
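`Optimisers.isnumeric` joins the docstrings already rendered on that page. A brief sketch of the neighbouring API (my own illustration, not taken from the docs):

```julia
using Flux, Optimisers

model = Dense(2 => 1)
flat, re = Optimisers.destructure(model)   # flat parameter vector plus a reconstructor
model2 = re(flat)                          # rebuild an equivalent model from the vector

Optimisers.isnumeric(rand(Float32, 3))     # true:  numeric arrays count as trainable leaves
Optimisers.isnumeric("metadata")           # false: such leaves get no optimiser state
```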

src/layers/basic.jl
Lines changed: 1 addition & 1 deletion

@@ -651,7 +651,7 @@ for a vocabulary of size `in`.

 This layer is often used to store word embeddings and retrieve them using indices.
 The input to the layer can be either a vector of indexes
-or the corresponding [`onehot encoding`](@ref Flux.onehotbatch).
+or the corresponding [`onehot encoding`](@ref OneHotArrays.onehotbatch).

 # Examples
 ```jldoctest
src/losses/functions.jl
Lines changed: 5 additions & 5 deletions

@@ -167,7 +167,7 @@ Cross entropy is typically used as a loss in multi-class classification,
 in which case the labels `y` are given in a one-hot format.
 `dims` specifies the dimension (or the dimensions) containing the class probabilities.
 The prediction `ŷ` is supposed to sum to one across `dims`,
-as would be the case with the output of a [`softmax`](@ref) operation.
+as would be the case with the output of a [softmax](@ref Softmax) operation.

 For numerical stability, it is recommended to use [`logitcrossentropy`](@ref)
 rather than `softmax` followed by `crossentropy` .
@@ -225,7 +225,7 @@ Return the cross entropy calculated by

 This is mathematically equivalent to `crossentropy(softmax(ŷ), y)`,
 but is more numerically stable than using functions [`crossentropy`](@ref)
-and [`softmax`](@ref) separately.
+and [softmax](@ref Softmax) separately.

 See also: [`binarycrossentropy`](@ref), [`logitbinarycrossentropy`](@ref), [`label_smoothing`](@ref).

@@ -262,7 +262,7 @@ Return the binary cross-entropy loss, computed as

     agg(@.(-y * log(ŷ + ϵ) - (1 - y) * log(1 - ŷ + ϵ)))

-Where typically, the prediction `ŷ` is given by the output of a [`sigmoid`](@ref) activation.
+Where typically, the prediction `ŷ` is given by the output of a [sigmoid](@ref Activation-Functions) activation.
 The `ϵ` term is included to avoid infinity. Using [`logitbinarycrossentropy`](@ref) is recomended
 over `binarycrossentropy` for numerical stability.

@@ -452,7 +452,7 @@ end
     binary_focal_loss(ŷ, y; agg=mean, γ=2, ϵ=eps(ŷ))

 Return the [binary_focal_loss](https://arxiv.org/pdf/1708.02002.pdf)
-The input, 'ŷ', is expected to be normalized (i.e. [`softmax`](@ref) output).
+The input, 'ŷ', is expected to be normalized (i.e. [softmax](@ref Softmax) output).

 For `γ == 0`, the loss is mathematically equivalent to [`Losses.binarycrossentropy`](@ref).

@@ -493,7 +493,7 @@ end
 Return the [focal_loss](https://arxiv.org/pdf/1708.02002.pdf)
 which can be used in classification tasks with highly imbalanced classes.
 It down-weights well-classified examples and focuses on hard examples.
-The input, 'ŷ', is expected to be normalized (i.e. [`softmax`](@ref) output).
+The input, 'ŷ', is expected to be normalized (i.e. [softmax](@ref Softmax) output).

 The modulating factor, `γ`, controls the down-weighting strength.
 For `γ == 0`, the loss is mathematically equivalent to [`Losses.crossentropy`](@ref).
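The second hunk's context states that `logitcrossentropy(ŷ, y)` is mathematically equivalent to `crossentropy(softmax(ŷ), y)` but more stable. A small sketch of that equivalence (my own check, not from the source file):

```julia
using Flux

ŷ = randn(Float32, 3, 5)                  # raw scores (logits): 3 classes, 5 samples
y = Flux.onehotbatch(rand(1:3, 5), 1:3)   # one-hot targets

Flux.Losses.logitcrossentropy(ŷ, y) ≈
    Flux.Losses.crossentropy(softmax(ŷ), y)   # same value, computed more stably
```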
