
Commit 952c4a5

Merge pull request #1996 from Karthik-d-k/namecase
replace ADAM with Adam and its variants thereof
2 parents: 0b01b77 + 7640149

8 files changed: +92 additions, -83 deletions

docs/src/models/recurrence.md

Lines changed: 1 addition & 1 deletion
@@ -173,7 +173,7 @@ Flux.reset!(m)
 [m(x) for x in seq_init]

 ps = Flux.params(m)
-opt= ADAM(1e-3)
+opt= Adam(1e-3)
 Flux.train!(loss, ps, data, opt)
 ```
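
For context, this hunk is the tail of the recurrence guide's training example. A minimal self-contained sketch of the same flow, where the model, loss, and toy sequences are illustrative assumptions rather than part of the diff:

```julia
using Flux

# Illustrative recurrent model and toy sequence data (shapes chosen arbitrarily).
m        = Chain(RNN(2 => 5), Dense(5 => 1))
seq_init = [rand(Float32, 2) for _ in 1:3]       # warm-up inputs
seq      = [rand(Float32, 2) for _ in 1:5]       # training inputs
targets  = [rand(Float32, 1) for _ in 1:5]
data     = [(seq, targets)]

# Sum the per-timestep losses over a sequence.
loss(xs, ys) = sum(Flux.Losses.mse(m(x), y) for (x, y) in zip(xs, ys))

Flux.reset!(m)                  # reset the hidden state
[m(x) for x in seq_init]        # warm up on the initial sub-sequence

ps  = Flux.params(m)
opt = Adam(1e-3)                # renamed from ADAM by this commit
Flux.train!(loss, ps, data, opt)
```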

docs/src/saving.md

Lines changed: 1 addition & 1 deletion
@@ -135,6 +135,6 @@ You can store the optimiser state alongside the model, to resume training
 exactly where you left off. BSON is smart enough to [cache values](https://github.com/JuliaIO/BSON.jl/blob/v0.3.4/src/write.jl#L71) and insert links when saving, but only if it knows everything to be saved up front. Thus models and optimizers must be saved together to have the latter work after restoring.

 ```julia
-opt = ADAM()
+opt = Adam()
 @save "model-$(now()).bson" model opt
 ```
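
The point of saving the two together is that both can be restored in one `@load` call and training resumed with the optimiser state intact. A minimal sketch of the round trip; the filename, loss, and toy batch are illustrative assumptions:

```julia
using Flux
using BSON: @load

# Restore the model and optimiser saved together, as in the snippet above
# (the filename here is a stand-in for the timestamped one).
@load "model-checkpoint.bson" model opt

# Resume training exactly where it left off.
x, y = rand(Float32, 10, 16), rand(Float32, 1, 16)   # illustrative batch
loss(x, y) = Flux.Losses.mse(model(x), y)
Flux.train!(loss, Flux.params(model), [(x, y)], opt)
```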

docs/src/training/optimisers.md

Lines changed: 9 additions & 9 deletions
@@ -39,7 +39,7 @@ for p in (W, b)
 end
 ```

-An optimiser `update!` accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass `opt` to our [training loop](training.md), which will update all parameters of the model in a loop. However, we can now easily replace `Descent` with a more advanced optimiser such as `ADAM`.
+An optimiser `update!` accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass `opt` to our [training loop](training.md), which will update all parameters of the model in a loop. However, we can now easily replace `Descent` with a more advanced optimiser such as `Adam`.

 ## Optimiser Reference
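
For reference, the `update!` loop that paragraph describes looks roughly like this. The parameters, data, and loss below are illustrative; only the `update!` call and the `Descent` → `Adam` swap are the point:

```julia
using Flux
using Flux.Optimise: update!

# Illustrative parameters and a toy least-squares loss.
W, b = rand(Float32, 2, 5), rand(Float32, 2)
x, y = rand(Float32, 5), rand(Float32, 2)
loss(x, y) = sum(abs2, (W * x .+ b) .- y)

opt = Adam(1e-3)                    # drop-in replacement for Descent(0.1)
gs  = gradient(() -> loss(x, y), Flux.params(W, b))
for p in (W, b)
  update!(opt, p, gs[p])            # apply the Adam update rule to p in place
end
```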

@@ -51,15 +51,15 @@ Descent
 Momentum
 Nesterov
 RMSProp
-ADAM
-RADAM
+Adam
+RAdam
 AdaMax
-ADAGrad
-ADADelta
+AdaGrad
+AdaDelta
 AMSGrad
-NADAM
-ADAMW
-OADAM
+NAdam
+AdamW
+OAdam
 AdaBelief
 ```
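
The renamed rules construct exactly as before; a quick illustrative mapping (hyperparameter values are arbitrary, not documented defaults):

```julia
using Flux

Adam(1e-3)       # formerly ADAM
RAdam(1e-3)      # formerly RADAM
AdaGrad(0.1)     # formerly ADAGrad
AdaDelta(0.9)    # formerly ADADelta
NAdam(1e-3)      # formerly NADAM
AdamW(1e-3)      # formerly ADAMW
OAdam(1e-3)      # formerly OADAM
```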

@@ -182,7 +182,7 @@ WeightDecay
 Gradient clipping is useful for training recurrent neural networks, which have a tendency to suffer from the exploding gradient problem. An example usage is

 ```julia
-opt = Optimiser(ClipValue(1e-3), ADAM(1e-3))
+opt = Optimiser(ClipValue(1e-3), Adam(1e-3))
 ```

 ```@docs
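
The composed optimiser can be passed to `train!` like any single rule; a brief sketch with an illustrative model and batch:

```julia
using Flux

# Clip each gradient entry to ±1e-3, then take an Adam step.
opt = Optimiser(ClipValue(1e-3), Adam(1e-3))

# Illustrative model and data, just to show the composed rule in use.
m = Dense(3 => 1)
x, y = rand(Float32, 3, 8), rand(Float32, 1, 8)
loss(x, y) = Flux.Losses.mse(m(x), y)
Flux.train!(loss, Flux.params(m), [(x, y)], opt)
```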

src/Flux.jl

Lines changed: 3 additions & 3 deletions
@@ -29,9 +29,9 @@ include("optimise/Optimise.jl")
 using .Optimise
 using .Optimise: @epochs
 using .Optimise: skip
-export Descent, ADAM, Momentum, Nesterov, RMSProp,
-ADAGrad, AdaMax, ADADelta, AMSGrad, NADAM, OADAM,
-ADAMW, RADAM, AdaBelief, InvDecay, ExpDecay,
+export Descent, Adam, Momentum, Nesterov, RMSProp,
+AdaGrad, AdaMax, AdaDelta, AMSGrad, NAdam, OAdam,
+AdamW, RAdam, AdaBelief, InvDecay, ExpDecay,
 WeightDecay, ClipValue, ClipNorm

 using CUDA

src/deprecations.jl

Lines changed: 9 additions & 0 deletions
@@ -71,3 +71,12 @@ LSTMCell(in::Integer, out::Integer; kw...) = LSTMCell(in => out; kw...)

 GRUCell(in::Integer, out::Integer; kw...) = GRUCell(in => out; kw...)
 GRUv3Cell(in::Integer, out::Integer; kw...) = GRUv3Cell(in => out; kw...)
+
+# Optimisers with old naming convention
+Base.@deprecate_binding ADAM Adam
+Base.@deprecate_binding NADAM NAdam
+Base.@deprecate_binding ADAMW AdamW
+Base.@deprecate_binding RADAM RAdam
+Base.@deprecate_binding OADAM OAdam
+Base.@deprecate_binding ADAGrad AdaGrad
+Base.@deprecate_binding ADADelta AdaDelta
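
A short sketch of what these `Base.@deprecate_binding` lines mean for user code (behaviour as documented for Julia's deprecation machinery; the exact warning text is not reproduced here):

```julia
using Flux

opt_new = Adam(1e-3)    # preferred name after this change
opt_old = ADAM(1e-3)    # old name still resolves to the same type, but emits
                        # a deprecation warning when run with --depwarn=yes

opt_old isa Adam        # true: ADAM is now just a deprecated alias for Adam
```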

src/optimise/Optimise.jl

Lines changed: 2 additions & 2 deletions
@@ -4,8 +4,8 @@ using LinearAlgebra
 import ArrayInterface

 export train!, update!,
-Descent, ADAM, Momentum, Nesterov, RMSProp,
-ADAGrad, AdaMax, ADADelta, AMSGrad, NADAM, ADAMW,RADAM, OADAM, AdaBelief,
+Descent, Adam, Momentum, Nesterov, RMSProp,
+AdaGrad, AdaMax, AdaDelta, AMSGrad, NAdam, AdamW,RAdam, OAdam, AdaBelief,
 InvDecay, ExpDecay, WeightDecay, stop, skip, Optimiser,
 ClipValue, ClipNorm
