Implement DeltaBarDelta using refactored GradientDescent.
#440
base: master
@@ -823,8 +823,6 @@ parameters.

If `lambda` and `sigma` are not specified, then 0 is used as the initial value
for all Lagrange multipliers and 10 is used as the initial penalty parameter.

</details>

#### Examples

<details open>

@@ -1261,6 +1259,62 @@ optimizer.Optimize(f, coordinates);

 * [Differential Evolution in Wikipedia](https://en.wikipedia.org/wiki/Differential_Evolution)
 * [Arbitrary functions](#arbitrary-functions)
## DeltaBarDelta

*An optimizer for [differentiable functions](#differentiable-functions).*

A Gradient Descent variant that adapts the learning rate of each parameter individually to improve convergence. If the current gradient and the exponential average of past gradients for a parameter have the same sign, the step size for that parameter is increased by `kappa`; otherwise, it is decreased by a proportion `phi` of its current value (additive increase, multiplicative decrease).

***Note:*** DeltaBarDelta is very sensitive to its parameters (`kappa` and `phi`), so careful hyperparameter selection is necessary; the defaults may not fit every case. Typically, `kappa` should be smaller than the step size.
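A minimal per-parameter sketch of this rule (illustrative only; the names `rates` and `deltaBar` are hypothetical and not part of the library API):

```c++
// For each parameter, compare the current gradient with the exponential
// average of past gradients (the "delta-bar"), adapt that parameter's
// step size, take a step, and then update the average.
for (size_t i = 0; i < gradient.n_elem; ++i)
{
  if (gradient[i] * deltaBar[i] > 0)
    rates[i] += kappa;                 // Same sign: additive increase.
  else if (gradient[i] * deltaBar[i] < 0)
    rates[i] *= (1.0 - phi);           // Sign flip: multiplicative decrease.
  rates[i] = std::max(rates[i], minStepSize);

  iterate[i] -= rates[i] * gradient[i];
  deltaBar[i] = (1.0 - theta) * gradient[i] + theta * deltaBar[i];
}
```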
#### Constructors

 * `DeltaBarDelta()`
 * `DeltaBarDelta(`_`stepSize`_`)`
 * `DeltaBarDelta(`_`stepSize, maxIterations, tolerance`_`)`
 * `DeltaBarDelta(`_`stepSize, maxIterations, tolerance, kappa, phi, theta, minStepSize, resetPolicy`_`)`

Note that `DeltaBarDelta` is based on the templated type
`GradientDescentType<`_`UpdatePolicyType, DecayPolicyType`_`>` with _`UpdatePolicyType`_` =
DeltaBarDeltaUpdate` and _`DecayPolicyType`_` = NoDecay`.
> Then the table of options below will be much simplified too.
#### Attributes

| **type** | **name** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `double` | **`stepSize`** | Initial step size. | `0.01` |
| `size_t` | **`maxIterations`** | Maximum number of iterations allowed (0 means no limit). | `100000` |
| `double` | **`tolerance`** | Maximum absolute tolerance to terminate algorithm. | `1e-5` |
| `double` | **`kappa`** | Additive increase constant for step size when gradient signs persist. | `0.002` |
| `double` | **`phi`** | Multiplicative decrease factor for step size when gradient signs flip. | `0.2` |
| `double` | **`theta`** | Decay rate for computing the exponential average of past gradients. | `0.8` |
| `double` | **`minStepSize`** | Minimum allowed step size for any parameter. | `1e-8` |
| `bool` | **`resetPolicy`** | If true, parameters are reset before every Optimize call. | `true` |

Attributes of the optimizer may be accessed and modified via member functions of the same name.
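For example (the values here are arbitrary):

```c++
DeltaBarDelta optimizer;       // Default hyperparameters.
optimizer.StepSize() = 0.005;  // Adjust via the same-named member functions.
optimizer.Kappa() = 0.001;
optimizer.Phi() = 0.1;
```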
#### Examples:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

DeltaBarDelta optimizer(0.001, 0, 1e-15, 0.0001, 0.2, 0.8);
optimizer.Optimize(f, coordinates);
```
</details>

#### See also:

 * [Increased rates of convergence through learning rate adaptation (pdf)](https://www.academia.edu/download/32005051/Jacobs.NN88.pdf)
 * [Differentiable functions](#differentiable-functions)
 * [Gradient Descent](#gradient-descent)

## DemonAdam

*An optimizer for [differentiable separable functions](#differentiable-separable-functions).*
@@ -1899,6 +1953,11 @@ negative of the gradient of the function at the current point.

 * `GradientDescent()`
 * `GradientDescent(`_`stepSize`_`)`
 * `GradientDescent(`_`stepSize, maxIterations, tolerance`_`)`
 * `GradientDescent(`_`stepSize, maxIterations, tolerance, updatePolicy, decayPolicy, resetPolicy`_`)`

Note that `GradientDescent` is based on the templated type
`GradientDescentType<`_`UpdatePolicyType, DecayPolicyType`_`>` with _`UpdatePolicyType`_` =
VanillaUpdate` and _`DecayPolicyType`_` = NoDecay`.

#### Attributes
@@ -1907,9 +1966,14 @@ negative of the gradient of the function at the current point.

| `double` | **`stepSize`** | Step size for each iteration. | `0.01` |
| `size_t` | **`maxIterations`** | Maximum number of iterations allowed (0 means no limit). | `100000` |
| `double` | **`tolerance`** | Maximum absolute tolerance to terminate algorithm. | `1e-5` |
| `UpdatePolicyType` | **`updatePolicy`** | Instantiated update policy used to adjust the given parameters. | `UpdatePolicyType()` |
| `DecayPolicyType` | **`decayPolicy`** | Instantiated decay policy used to adjust the step size. | `DecayPolicyType()` |
| `bool` | **`resetPolicy`** | Flag that determines whether update policy parameters are reset before every Optimize call. | `true` |

Attributes of the optimizer may also be changed via the member methods
`StepSize()`, `MaxIterations()`, `Tolerance()`, `UpdatePolicy()`,
`DecayPolicy()`, and `ResetPolicy()`.
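For illustration, a sketch passing explicitly instantiated policies to the new constructor (assuming `VanillaUpdate` and `NoDecay` are default-constructible, as the defaults above suggest):

```c++
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

// Pass explicitly instantiated policies (these match the defaults).
GradientDescent optimizer(0.01, 100000, 1e-5,
                          VanillaUpdate(), NoDecay(), true);
optimizer.Optimize(f, coordinates);
```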
#### Examples:
@@ -0,0 +1,184 @@

/**
 * @file delta_bar_delta.hpp
 * @author Ranjodh Singh
 *
 * Class wrapper for the DeltaBarDelta update policy.
 *
 * ensmallen is free software; you may redistribute it and/or modify it under
 * the terms of the 3-clause BSD license.  You should have received a copy of
 * the 3-clause BSD license along with ensmallen.  If not, see
 * http://www.opensource.org/licenses/BSD-3-Clause for more information.
 */
#ifndef ENSMALLEN_DELTA_BAR_DELTA_HPP
#define ENSMALLEN_DELTA_BAR_DELTA_HPP

#include <ensmallen_bits/gradient_descent/gradient_descent.hpp>
#include "./delta_bar_delta_update.hpp"

namespace ens {

/**
 * DeltaBarDelta Optimizer.
 *
 * A heuristic designed to accelerate convergence by adapting the learning
 * rate of each parameter individually.
 *
 * According to the delta-bar-delta update:
 *
 * - If the current gradient and the exponential average of past gradients
 *   corresponding to a parameter have the same sign, then the step size for
 *   that parameter is incremented by \f$\kappa\f$.  Otherwise, it is
 *   decreased by a proportion \f$\phi\f$ of its current value (additive
 *   increase, multiplicative decrease).
 *
 * @note This implementation uses a minStepSize parameter to set a lower
 *     bound for the learning rate.  This prevents the learning rate from
 *     dropping to zero, which can occur due to floating-point underflow.
 *     For tasks which require extreme fine-tuning, you may need to lower
 *     this parameter below its default value (1e-8) in order to allow for
 *     smaller learning rates.
 *
 * @code
 * @article{jacobs1988increased,
 *   title     = {Increased Rates of Convergence Through Learning Rate
 *                Adaptation},
 *   author    = {Jacobs, Robert A.},
 *   journal   = {Neural Networks},
 *   volume    = {1},
 *   number    = {4},
 *   pages     = {295--307},
 *   year      = {1988},
 *   publisher = {Pergamon}
 * }
 * @endcode
 */
class DeltaBarDelta
{
 public:
  /**
   * Construct the DeltaBarDelta optimizer with the given function and
   * parameters.  DeltaBarDelta is very sensitive to its parameters (kappa
   * and phi), so careful hyperparameter selection is necessary, as the
   * defaults may not fit every case.
   *
   * @param stepSize Initial step size.
   * @param maxIterations Maximum number of iterations allowed (0 means no
   *     limit).
   * @param tolerance Maximum absolute tolerance to terminate algorithm.
   * @param kappa Constant increment applied when gradient signs persist.
   * @param phi Proportional decrement factor when gradient signs flip.
   * @param theta Decay rate for the exponential average (delta-bar).
   * @param minStepSize Minimum allowed step size for any parameter
   *     (default: 1e-8).
   * @param resetPolicy If true, parameters are reset before every Optimize
   *     call; otherwise, their values are retained.
   */
  DeltaBarDelta(const double stepSize = 0.01,
                const size_t maxIterations = 100000,
                const double tolerance = 1e-5,
                const double kappa = 0.002,
                const double phi = 0.2,
                const double theta = 0.8,
                const double minStepSize = 1e-8,
                const bool resetPolicy = true);

  /**
   * Optimize the given function using DeltaBarDelta.  The given starting
   * point will be modified to store the finishing point of the algorithm,
   * and the final objective value is returned.
   *
   * @tparam SeparableFunctionType Type of the function to optimize.
   * @tparam MatType Type of matrix to optimize with.
   * @tparam GradType Type of matrix to use to represent function gradients.
   * @tparam CallbackTypes Types of callback functions.
   * @param function Function to optimize.
   * @param iterate Starting point (will be modified).
   * @param callbacks Callback functions.
   * @return Objective value of the final point.
   */
  template<typename SeparableFunctionType,
           typename MatType,
           typename GradType,
           typename... CallbackTypes>
  typename std::enable_if<IsMatrixType<GradType>::value,
      typename MatType::elem_type>::type
  Optimize(SeparableFunctionType& function,
           MatType& iterate,
           CallbackTypes&&... callbacks)
  {
    return optimizer.Optimize<SeparableFunctionType, MatType, GradType,
        CallbackTypes...>(function, iterate,
        std::forward<CallbackTypes>(callbacks)...);
  }

  //! Forward the MatType as GradType.
  template<typename SeparableFunctionType,
           typename MatType,
           typename... CallbackTypes>
  typename MatType::elem_type Optimize(SeparableFunctionType& function,
                                       MatType& iterate,
                                       CallbackTypes&&... callbacks)
  {
    return Optimize<SeparableFunctionType, MatType, MatType,
        CallbackTypes...>(function, iterate,
        std::forward<CallbackTypes>(callbacks)...);
  }

  //! Get the initial step size.
  double StepSize() const { return optimizer.StepSize(); }
  //! Modify the initial step size.
  double& StepSize() { return optimizer.StepSize(); }

  //! Get the maximum number of iterations (0 indicates no limit).
  size_t MaxIterations() const { return optimizer.MaxIterations(); }
  //! Modify the maximum number of iterations (0 indicates no limit).
  size_t& MaxIterations() { return optimizer.MaxIterations(); }

  //! Get the additive increase constant for the step size when gradient
  //! signs persist.
  double Kappa() const { return optimizer.UpdatePolicy().Kappa(); }
  //! Modify the additive increase constant for the step size when gradient
  //! signs persist.
  double& Kappa() { return optimizer.UpdatePolicy().Kappa(); }

  //! Get the multiplicative decrease factor for the step size when gradient
  //! signs flip.
  double Phi() const { return optimizer.UpdatePolicy().Phi(); }
  //! Modify the multiplicative decrease factor for the step size when
  //! gradient signs flip.
  double& Phi() { return optimizer.UpdatePolicy().Phi(); }

  //! Get the decay rate for computing the exponential average of past
  //! gradients (delta-bar).
  double Theta() const { return optimizer.UpdatePolicy().Theta(); }
  //! Modify the decay rate for computing the exponential average of past
  //! gradients (delta-bar).
  double& Theta() { return optimizer.UpdatePolicy().Theta(); }

  //! Get the minimum allowed step size.
  double MinStepSize() const { return optimizer.UpdatePolicy().MinStepSize(); }
  //! Modify the minimum allowed step size.
  double& MinStepSize() { return optimizer.UpdatePolicy().MinStepSize(); }

  //! Get the tolerance for termination.
  double Tolerance() const { return optimizer.Tolerance(); }
  //! Modify the tolerance for termination.
  double& Tolerance() { return optimizer.Tolerance(); }

  //! Get whether or not the update policy parameters are reset before every
  //! Optimize call.
  bool ResetPolicy() const { return optimizer.ResetPolicy(); }
  //! Modify whether or not the update policy parameters are reset before
  //! every Optimize call.
  bool& ResetPolicy() { return optimizer.ResetPolicy(); }

 private:
  //! The GradientDescentType object with the DeltaBarDelta update policy.
  GradientDescentType<DeltaBarDeltaUpdate, NoDecay> optimizer;
};

} // namespace ens

// Include implementation.
#include "delta_bar_delta_impl.hpp"

#endif // ENSMALLEN_DELTA_BAR_DELTA_HPP
@@ -0,0 +1,41 @@

/**
 * @file delta_bar_delta_impl.hpp
 * @author Ranjodh Singh
 *
 * Implementation of the DeltaBarDelta class wrapper.
 *
 * ensmallen is free software; you may redistribute it and/or modify it under
 * the terms of the 3-clause BSD license.  You should have received a copy of
 * the 3-clause BSD license along with ensmallen.  If not, see
 * http://www.opensource.org/licenses/BSD-3-Clause for more information.
 */
#ifndef ENSMALLEN_DELTA_BAR_DELTA_IMPL_HPP
#define ENSMALLEN_DELTA_BAR_DELTA_IMPL_HPP

// In case it hasn't been included yet.
#include "./delta_bar_delta.hpp"

namespace ens {

inline DeltaBarDelta::DeltaBarDelta(
    const double stepSize,
    const size_t maxIterations,
    const double tolerance,
    const double kappa,
    const double phi,
    const double theta,
    const double minStepSize,
    const bool resetPolicy) :
    optimizer(stepSize,
              maxIterations,
              tolerance,
              DeltaBarDeltaUpdate(stepSize, kappa, phi, theta, minStepSize),
              NoDecay(),
              resetPolicy)
{
  /* Nothing to do. */
}

} // namespace ens

#endif // ENSMALLEN_DELTA_BAR_DELTA_IMPL_HPP
> Thanks, nice catch!