
Commit 4e28377

bors[bot] and ToucheSir authored
Merge #1781
1781: Fix AlphaDropout implementation and add tests r=CarloLucibello a=ToucheSir

AFAICT, the original implementation never behaved as expected, even pre-Zygote. This was likely not caught because the original PR didn't come with tests, so this PR should remedy that. Behaviour and outputs are adapted from the PyTorch and TF implementations. Some points of note:

1. We have to special-case `p = 1` to avoid propagating NaNs when calculating `A` and `B`. TF just returns the input in this case, but I think the PyTorch approach of returning all zeros (+/- depending on the input sign) is more in line with `Dropout`.
2. Likewise, `p = 0` is special-cased and simply returns the input unchanged.
3. `ifelse` is used instead of something like https://github.com/keras-team/keras/blob/v2.7.0/keras/layers/noise.py#L200. I think it better reflects the conditional nature of the operation, and it was also faster in local benchmarking.

### PR Checklist

- [x] Tests are added
- [ ] Entry in NEWS.md

Co-authored-by: Brian Chen <ToucheSir@users.noreply.github.com>
2 parents 0a1ad37 + e249c5c commit 4e28377
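For context, the math described above can be written as a standalone sketch (the function name `alpha_dropout_sketch` is illustrative only; the actual change is the `(a::AlphaDropout)(x)` method in the `src/layers/normalise.jl` diff below):

```julia
using Random: rand!

# Rough sketch of the rescaled dropout from this PR: dropped entries are set to
# α′ = selu(-Inf) = -λα, then everything is rescaled by A and shifted by B so
# that a zero-mean, unit-variance input keeps those statistics in expectation.
function alpha_dropout_sketch(x::AbstractArray{T}, p) where T
  iszero(p) && return x                  # nothing dropped: pass through
  isone(p)  && return sign.(x) .* T(0)   # everything dropped: signed zeros;
                                         # also avoids A = inv(sqrt(0)) = Inf
  α′ = T(-1.7580993408473766)
  A  = T(inv(sqrt((1 - p) * (1 + p * α′^2))))
  B  = T(-A * α′ * p)
  noise = rand!(similar(x))              # uniform noise in [0, 1)
  return A .* ifelse.(noise .> p, x, α′) .+ B
end

# e.g. alpha_dropout_sketch(randn(10_000), 0.5) has mean ≈ 0 and var ≈ 1
```

The broadcasted `ifelse` keeps the kept/dropped choice a single fused elementwise operation, which is what point 3 above refers to.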

File tree: 4 files changed, +47 −20 lines

NEWS.md

Lines changed: 6 additions & 3 deletions
@@ -1,5 +1,8 @@
 # Flux Release Notes
 
+## v0.12.9
+* Fixed incorrect output and added GPU compatibility for [AlphaDropout](https://github.com/FluxML/Flux.jl/pull/1781).
+
 ## v0.12.8
 * Optimized inference and gradient calculation of OneHotMatrix[pr](https://github.com/FluxML/Flux.jl/pull/1756)
 
@@ -12,7 +15,7 @@
 * REPL printing via [`show`](https://github.com/FluxML/Flux.jl/pull/1467) displays parameter counts.
 
 ## v0.12.4
-* Implemented an [`Embedding layer`](https://github.com/FluxML/Flux.jl/pull/1516)
+* Implemented an [`Embedding layer`](https://github.com/FluxML/Flux.jl/pull/1516)
   based on `NNlib.gather` and `NNlib.scatter`.
 
 ## v0.12.1 - v0.12.3
@@ -37,8 +40,8 @@
 * New [`Parallel` layer](https://github.com/FluxML/Flux.jl/pull/1462) adds inception module-like building blocks.
 * Feature additions and bug fixes for BatchNorm, LayerNorm, InstanceNorm, and GroupNorm [normalization layers](https://github.com/FluxML/Flux.jl/pull/1397)
 * Added [Upsample and PixelShuffle layers](https://github.com/FluxML/Flux.jl/pull/1468)
-* End of deprecation cycle: loss functions cannot be accessed directly from `Flux` anymore, they live in the `Flux.Losses` module.
-  All loss functions perform `mean` aggregation by default.
+* End of deprecation cycle: loss functions cannot be accessed directly from `Flux` anymore, they live in the `Flux.Losses` module.
+  All loss functions perform `mean` aggregation by default.
 
 ## v0.11.2

src/layers/normalise.jl

Lines changed: 11 additions & 10 deletions
@@ -101,17 +101,18 @@ mutable struct AlphaDropout{F}
   end
 end
 
-function (a::AlphaDropout)(x)
+function (a::AlphaDropout)(x::AbstractArray{T}) where T
   _isactive(a) || return x
-  λ = eltype(x)(1.0507009873554804934193349852946)
-  α = eltype(x)(1.6732632423543772848170429916717)
-  α1 = eltype(x)(-λ*α)
-  noise = randn(eltype(x), size(x))
-  x = @. x*(noise > (1 - a.p)) + α1 * (noise < (1 - a.p))
-  A = sqrt(a.p + a.p * (1 - a.p) * α1^2)
-  B = -A * α1 * (1 - a.p)
-  x = @. A * x + B
-  return x
+  p = a.p
+  iszero(p) && return x
+  isone(p) && return sign.(x) .* T(0)
+
+  α′ = T(-1.7580993408473766) # selu(-Inf) == -λα
+  A = T(inv(sqrt((1 - p) * (1 + p * α′^2))))
+  B = T(-A * α′ * p)
+
+  noise = rand!(similar(x))
+  return A .* ifelse.(noise .> p, x, α′) .+ B
 end
 
 testmode!(m::AlphaDropout, mode=true) =
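A quick usage sketch of the updated layer (hypothetical model and sizes, assuming a Flux version with this fix, i.e. v0.12.9+; `AlphaDropout` is normally paired with `selu` activations):

```julia
using Flux

# Hypothetical SELU network; AlphaDropout preserves the self-normalizing
# activation statistics that plain Dropout would break.
m = Chain(Dense(784, 128, selu), AlphaDropout(0.2), Dense(128, 10))

x = randn(Float32, 784, 64)
Flux.trainmode!(m)        # activate dropout outside of a gradient call
y_train = m(x)

Flux.testmode!(m)         # dropout becomes a no-op
@assert m(x) == m(x)      # deterministic once dropout is off
```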

test/cuda/layers.jl

Lines changed: 2 additions & 7 deletions
@@ -10,13 +10,8 @@
   @test gradient(x -> sum(cpu(x)), gpu(rand(3,3))) isa Tuple
 end
 
-# TODO: These layers get into scalar indexing
-# `AlphaDropout` throws a compilation error on GPUs,
-# whereas, the rest are scalar indexing issues.
-# The norm layers behave differently on the CPU and
-# the GPU too.
-const BROKEN_LAYERS = Union{DepthwiseConv,
-                            AlphaDropout}
+# TODO: These layers get into scalar indexing issues.
+const BROKEN_LAYERS = Union{DepthwiseConv}
 
 const ACTIVATIONS = [identity, relu, tanh,
                      sigmoid, exp, softplus,
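With `AlphaDropout` removed from `BROKEN_LAYERS`, a GPU smoke test along these lines should now run without scalar indexing (a sketch only, assuming a working CUDA setup; not part of this diff):

```julia
using Flux, CUDA, Statistics

m = AlphaDropout(0.5)
Flux.trainmode!(m)                 # apply dropout outside of a gradient call

x = CUDA.randn(Float32, 10_000)
y = m(x)                           # rand!(similar(x)) and the broadcast stay on the GPU
@show mean(y) var(y)               # expect roughly 0 and 1, matching the CPU tests
```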

test/layers/normalisation.jl

Lines changed: 28 additions & 0 deletions
@@ -57,6 +57,34 @@ evalwgrad(f, x...) = pullback(f, x...)[1]
   @test count(a->a == 0, y) == 0
 end
 
+@testset "AlphaDropout" begin
+  x = [1., 2., 3.]
+  @test x == AlphaDropout(0.1)(x)
+  @test x == evalwgrad(AlphaDropout(0), x)
+  @test zero(x) == evalwgrad(AlphaDropout(1), x)
+
+  x = randn(1000) # large enough to prevent flaky test
+  m = AlphaDropout(0.5)
+
+  y = evalwgrad(m, x)
+  # Should preserve unit mean and variance
+  @test mean(y) ≈ 0 atol=0.1
+  @test var(y) ≈ 1 atol=0.1
+
+  testmode!(m, true) # should override istraining
+  @test evalwgrad(m, x) == x
+
+  testmode!(m, false)
+  y = evalwgrad(m, x)
+  @test mean(y) ≈ 0 atol=0.1
+  @test var(y) ≈ 1 atol=0.1
+
+  # Known good value ranges
+  # Values taken from https://github.com/pytorch/pytorch/blob/v1.10.0/test/cpp/api/modules.cpp#L1337-L1338
+  x = ones(100)
+  @test 40 < sum(evalwgrad(m, x)) < 130
+end
+
 @testset "BatchNorm" begin
   let m = BatchNorm(2), x = [1.0 3.0 5.0;
                              2.0 4.0 6.0]
