Commit f9b95c4

Merge pull request #2035 from Saransh-cpp/more-404s
Fix the last remaining 404 errors
2 parents 1914f38 + 8b5c92f commit f9b95c4

File tree: 10 files changed, +36 −30 lines


docs/Project.toml
Lines changed: 2 additions & 1 deletion

@@ -4,7 +4,8 @@ Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
 MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54"
 NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
+OneHotArrays = "0b1bfda6-eb8a-41d2-88d8-f5af5cad476f"
 Optimisers = "3bd65402-5787-11e9-1adc-39752487f4e2"

 [compat]
-Documenter = "0.26"
+Documenter = "0.27"

docs/make.jl
Lines changed: 3 additions & 2 deletions

@@ -1,12 +1,13 @@
-using Documenter, Flux, NNlib, Functors, MLUtils, BSON, Optimisers
+using Documenter, Flux, NNlib, Functors, MLUtils, BSON, Optimisers, OneHotArrays


 DocMeta.setdocmeta!(Flux, :DocTestSetup, :(using Flux); recursive = true)

 makedocs(
-    modules = [Flux, NNlib, Functors, MLUtils, BSON, Optimisers],
+    modules = [Flux, NNlib, Functors, MLUtils, BSON, Optimisers, OneHotArrays],
     doctest = false,
     sitename = "Flux",
+    strict = [:cross_references,],
     pages = [
         "Home" => "index.md",
         "Building Models" => [

docs/src/data/onehot.md
Lines changed: 9 additions & 7 deletions

@@ -1,9 +1,9 @@
-# One-Hot Encoding
+# One-Hot Encoding with OneHotArrays.jl

-It's common to encode categorical variables (like `true`, `false` or `cat`, `dog`) in "one-of-k" or ["one-hot"](https://en.wikipedia.org/wiki/One-hot) form. Flux provides the `onehot` function to make this easy.
+It's common to encode categorical variables (like `true`, `false` or `cat`, `dog`) in "one-of-k" or ["one-hot"](https://en.wikipedia.org/wiki/One-hot) form. [OneHotArrays.jl](https://github.com/FluxML/OneHotArrays.jl) provides the `onehot` function to make this easy.

 ```jldoctest onehot
-julia> using Flux: onehot, onecold
+julia> using OneHotArrays

 julia> onehot(:b, [:a, :b, :c])
 3-element OneHotVector(::UInt32) with eltype Bool:
@@ -34,7 +34,7 @@ julia> onecold([0.3, 0.2, 0.5], [:a, :b, :c])
 For multiple samples at once, `onehotbatch` creates a batch (matrix) of one-hot vectors, and `onecold` treats matrices as batches.

 ```jldoctest onehot
-julia> using Flux: onehotbatch
+julia> using OneHotArrays

 julia> onehotbatch([:b, :a, :b], [:a, :b, :c])
 3×3 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
@@ -52,7 +52,9 @@ julia> onecold(ans, [:a, :b, :c])
 Note that these operations returned `OneHotVector` and `OneHotMatrix` rather than `Array`s. `OneHotVector`s behave like normal vectors but avoid any unnecessary cost compared to using an integer index directly. For example, multiplying a matrix with a one-hot vector simply slices out the relevant row of the matrix under the hood.

 ```@docs
-Flux.onehot
-Flux.onecold
-Flux.onehotbatch
+OneHotArrays.onehot
+OneHotArrays.onecold
+OneHotArrays.onehotbatch
+OneHotArrays.OneHotVector
+OneHotArrays.OneHotMatrix
 ```
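As context for that page's claim that multiplying by a one-hot vector "simply slices out the relevant row of the matrix under the hood", here is a minimal sketch (mine, not part of the diff) using the renamed package:

```julia
using OneHotArrays

v = onehot(:b, [:a, :b, :c])   # Bool-backed OneHotVector, hot at index 2
W = [1 2 3; 4 5 6]

W * v == W[:, 2]               # true: no dense multiply, just an index into W
onecold(v, [:a, :b, :c])       # :b, onecold inverts the encoding
```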

docs/src/models/layers.md
Lines changed: 1 addition & 0 deletions

@@ -71,6 +71,7 @@ These layers don't affect the structure of the network but may improve training
 Flux.normalise
 BatchNorm
 Dropout
+Flux.dropout
 AlphaDropout
 LayerNorm
 InstanceNorm
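`Flux.dropout` is the functional counterpart of the `Dropout` layer whose docstring now joins this `@docs` block. A hedged sketch of the pairing (my own illustration, not from the docs page):

```julia
using Flux

x = ones(Float32, 10)

m = Dropout(0.5)         # layer form; only active while training
Flux.trainmode!(m)       # force training behaviour outside of a training loop
m(x)                     # roughly half the entries zeroed, the rest scaled by 1/(1 - p)

Flux.dropout(x, 0.5)     # functional form covered by the new `Flux.dropout` entry
```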

docs/src/models/overview.md
Lines changed: 9 additions & 9 deletions

@@ -42,7 +42,7 @@ Normally, your training and test data come from real world observations, but thi

 Now, build a model to make predictions with `1` input and `1` output:

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> model = Dense(1 => 1)
 Dense(1 => 1)       # 2 parameters

@@ -66,15 +66,15 @@ Dense(1 => 1)       # 2 parameters

 This model will already make predictions, though not accurate ones yet:

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> predict(x_train)
 1×6 Matrix{Float32}:
 0.0 0.906654 1.81331 2.71996 3.62662 4.53327
 ```

 In order to make better predictions, you'll need to provide a *loss function* to tell Flux how to objectively *evaluate* the quality of a prediction. Loss functions compute the cumulative distance between actual values and predictions.

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> loss(x, y) = Flux.Losses.mse(predict(x), y);

 julia> loss(x_train, y_train)
@@ -100,7 +100,7 @@ julia> data = [(x_train, y_train)]

 Now, we have the optimiser and data we'll pass to `train!`. All that remains are the parameters of the model. Remember, each model is a Julia struct with a function and configurable parameters. Remember, the dense layer has weights and biases that depend on the dimensions of the inputs and outputs:

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> predict.weight
 1×1 Matrix{Float32}:
 0.9066542
@@ -112,7 +112,7 @@ julia> predict.bias

 The dimensions of these model parameters depend on the number of inputs and outputs. Since models can have hundreds of inputs and several layers, it helps to have a function to collect the parameters into the data structure Flux expects:

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> parameters = Flux.params(predict)
 Params([Float32[0.9066542], Float32[0.0]])
 ```
@@ -135,14 +135,14 @@ julia> train!(loss, parameters, data, opt)

 And check the loss:

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> loss(x_train, y_train)
 116.38745f0
 ```

 It went down. Why?

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> parameters
 Params([Float32[7.5777884], Float32[1.9466728]])
 ```
@@ -153,7 +153,7 @@ The parameters have changed. This single step is the essence of machine learning

 In the previous section, we made a single call to `train!` which iterates over the data we passed in just once. An *epoch* refers to one pass over the dataset. Typically, we will run the training for multiple epochs to drive the loss down even further. Let's run it a few more times:

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> for epoch in 1:200
          train!(loss, parameters, data, opt)
        end
@@ -171,7 +171,7 @@ After 200 training steps, the loss went down, and the parameters are getting clo

 Now, let's verify the predictions:

-```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest overview; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> predict(x_test)
 1×5 Matrix{Float32}:
 26.1121 30.13 34.1479 38.1657 42.1836
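Every hunk in this file makes the same change to the doctest filter regex. A quick sketch (mine, not from the PR) of what the added `(f[+-]*[0-9])?` group is presumably for, namely letting the filter mask whole `Float32` literals such as `116.38745f0`:

```julia
old = r"[+-]?([0-9]*[.])?[0-9]+"
new = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"

match(old, "116.38745f0").match   # "116.38745"   -- the f0 suffix would survive filtering
match(new, "116.38745f0").match   # "116.38745f0" -- the whole literal is masked
```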

docs/src/models/recurrence.md
Lines changed: 2 additions & 2 deletions

@@ -94,7 +94,7 @@ In this example, each output has only one component.

 Using the previously defined `m` recurrent model, we can now apply it to a single step from our sequence:

-```jldoctest recurrence; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest recurrence; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> x = rand(Float32, 2);

 julia> m(x)
@@ -111,7 +111,7 @@ iterating the model on a sequence of data.

 To do so, we'll need to structure the input data as a `Vector` of observations at each time step. This `Vector` will therefore be of `length = seq_length` and each of its elements will represent the input features for a given step. In our example, this translates into a `Vector` of length 3, where each element is a `Matrix` of size `(features, batch_size)`, or just a `Vector` of length `features` if dealing with a single observation.

-```jldoctest recurrence; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest recurrence; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> x = [rand(Float32, 2) for i = 1:3];

 julia> [m(xi) for xi in x]
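For readers skimming the diff, the pattern those doctests exercise looks roughly like the sketch below; the page defines `m` earlier, so the model here is a hypothetical stand-in:

```julia
using Flux

m = Chain(RNN(2 => 5), Dense(5 => 1))   # stand-in for the page's recurrent model `m`

x1 = rand(Float32, 2)                   # a single step: 2 input features
m(x1)                                   # one output; the hidden state is updated

xs = [rand(Float32, 2) for _ in 1:3]    # a 3-step sequence
[m(xi) for xi in xs]                    # iterate the stateful model over the sequence
```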

docs/src/models/regularisation.md
Lines changed: 3 additions & 3 deletions

@@ -28,7 +28,7 @@ julia> loss(x, y) = logitcrossentropy(m(x), y) + penalty();
 When working with layers, Flux provides the `params` function to grab all
 parameters at once. We can easily penalise everything with `sum`:

-```jldoctest regularisation; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest regularisation; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> Flux.params(m)
 Params([Float32[0.34704182 -0.48532376 … -0.06914271 -0.38398427; 0.5201164 -0.033709668 … -0.36169025 -0.5552353; … ; 0.46534058 0.17114447 … -0.4809643 0.04993277; -0.47049698 -0.6206029 … -0.3092334 -0.47857067], Float32[0.0, 0.0, 0.0, 0.0, 0.0]])

@@ -40,7 +40,7 @@ julia> sum(sqnorm, Flux.params(m))

 Here's a larger example with a multi-layer perceptron.

-```jldoctest regularisation; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest regularisation; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> m = Chain(Dense(28^2 => 128, relu), Dense(128 => 32, relu), Dense(32 => 10))
 Chain(
   Dense(784 => 128, relu),              # 100_480 parameters
@@ -58,7 +58,7 @@ julia> loss(rand(28^2), rand(10))

 One can also easily add per-layer regularisation via the `activations` function:

-```jldoctest regularisation; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest regularisation; filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 julia> using Flux: activations

 julia> c = Chain(Dense(10 => 5, σ), Dense(5 => 2), softmax)
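The regularisation page builds up to the penalised loss shown in the first hunk's context. Assuming the page's usual helper `sqnorm(x) = sum(abs2, x)` (defined earlier on that page, not in this diff), the pattern is roughly:

```julia
using Flux

m = Dense(10 => 5)
sqnorm(x) = sum(abs2, x)                   # squared L2 norm of one parameter array
penalty() = sum(sqnorm, Flux.params(m))    # summed over every trainable parameter

loss(x, y) = Flux.Losses.logitcrossentropy(m(x), y) + penalty()
loss(rand(Float32, 10), Flux.onehot(2, 1:5))
```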

docs/src/training/optimisers.md
Lines changed: 1 addition & 0 deletions

@@ -202,4 +202,5 @@ and the complete `Optimisers` package under the `Flux.Optimisers` namespace.
 ```@docs
 Optimisers.destructure
 Optimisers.trainable
+Optimisers.isnumeric
 ```
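`Optimisers.isnumeric` joins the docstrings already rendered on that page. A brief sketch of the neighbouring API (my own illustration, not taken from the docs):

```julia
using Flux, Optimisers

model = Dense(2 => 1)
flat, re = Optimisers.destructure(model)   # flat parameter vector plus a reconstructor
model2 = re(flat)                          # rebuild an equivalent model from the vector

Optimisers.isnumeric(rand(Float32, 3))     # true:  numeric arrays count as trainable leaves
Optimisers.isnumeric("metadata")           # false: such leaves get no optimiser state
```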

src/layers/basic.jl
Lines changed: 1 addition & 1 deletion

@@ -651,7 +651,7 @@ for a vocabulary of size `in`.

 This layer is often used to store word embeddings and retrieve them using indices.
 The input to the layer can be either a vector of indexes
-or the corresponding [`onehot encoding`](@ref Flux.onehotbatch).
+or the corresponding [`onehot encoding`](@ref OneHotArrays.onehotbatch).

 # Examples
 ```jldoctest
src/losses/functions.jl
Lines changed: 5 additions & 5 deletions

@@ -167,7 +167,7 @@ Cross entropy is typically used as a loss in multi-class classification,
 in which case the labels `y` are given in a one-hot format.
 `dims` specifies the dimension (or the dimensions) containing the class probabilities.
 The prediction `ŷ` is supposed to sum to one across `dims`,
-as would be the case with the output of a [`softmax`](@ref) operation.
+as would be the case with the output of a [softmax](@ref Softmax) operation.

 For numerical stability, it is recommended to use [`logitcrossentropy`](@ref)
 rather than `softmax` followed by `crossentropy` .
@@ -225,7 +225,7 @@ Return the cross entropy calculated by

 This is mathematically equivalent to `crossentropy(softmax(ŷ), y)`,
 but is more numerically stable than using functions [`crossentropy`](@ref)
-and [`softmax`](@ref) separately.
+and [softmax](@ref Softmax) separately.

 See also: [`binarycrossentropy`](@ref), [`logitbinarycrossentropy`](@ref), [`label_smoothing`](@ref).

@@ -262,7 +262,7 @@ Return the binary cross-entropy loss, computed as

     agg(@.(-y * log(ŷ + ϵ) - (1 - y) * log(1 - ŷ + ϵ)))

-Where typically, the prediction `ŷ` is given by the output of a [`sigmoid`](@ref) activation.
+Where typically, the prediction `ŷ` is given by the output of a [sigmoid](@ref Activation-Functions) activation.
 The `ϵ` term is included to avoid infinity. Using [`logitbinarycrossentropy`](@ref) is recomended
 over `binarycrossentropy` for numerical stability.

@@ -452,7 +452,7 @@ end
     binary_focal_loss(ŷ, y; agg=mean, γ=2, ϵ=eps(ŷ))

 Return the [binary_focal_loss](https://arxiv.org/pdf/1708.02002.pdf)
-The input, 'ŷ', is expected to be normalized (i.e. [`softmax`](@ref) output).
+The input, 'ŷ', is expected to be normalized (i.e. [softmax](@ref Softmax) output).

 For `γ == 0`, the loss is mathematically equivalent to [`Losses.binarycrossentropy`](@ref).

@@ -493,7 +493,7 @@ end
 Return the [focal_loss](https://arxiv.org/pdf/1708.02002.pdf)
 which can be used in classification tasks with highly imbalanced classes.
 It down-weights well-classified examples and focuses on hard examples.
-The input, 'ŷ', is expected to be normalized (i.e. [`softmax`](@ref) output).
+The input, 'ŷ', is expected to be normalized (i.e. [softmax](@ref Softmax) output).

 The modulating factor, `γ`, controls the down-weighting strength.
 For `γ == 0`, the loss is mathematically equivalent to [`Losses.crossentropy`](@ref).
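The second hunk's context states that `logitcrossentropy(ŷ, y)` is mathematically equivalent to `crossentropy(softmax(ŷ), y)` but more stable. A small sketch of that equivalence (my own check, not from the source file):

```julia
using Flux

ŷ = randn(Float32, 3, 5)                  # raw scores (logits): 3 classes, 5 samples
y = Flux.onehotbatch(rand(1:3, 5), 1:3)   # one-hot targets

Flux.Losses.logitcrossentropy(ŷ, y) ≈
    Flux.Losses.crossentropy(softmax(ŷ), y)   # same value, computed more stably
```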
