7 changes: 7 additions & 0 deletions HISTORY.md
@@ -1,5 +1,11 @@
### ensmallen ?.??.?: "???"
###### ????-??-??
* Refactor `GradientDescent` into
`GradientDescentType<UpdatePolicyType, DecayPolicyType>`.
Add the `DeltaBarDelta` optimizer, which implements Jacobs' delta-bar-delta
update through `GradientDescentType` with `DeltaBarDeltaUpdate` and `NoDecay`
policies. ([#440](https://github.com/mlpack/ensmallen/pull/440))
See the documentation for more details.

### ensmallen 3.10.0: "Unexpected Rain"
###### 2025-09-25
@@ -44,6 +50,7 @@
ActiveCMAES<FullSelection, BoundaryBoxConstraint> opt(lambda,
BoundaryBoxConstraint(lowerBound, upperBound), ...);
```

* Add proximal gradient optimizers for L1-constrained and other related
problems: `FBS`, `FISTA`, and `FASTA`
([#427](https://github.com/mlpack/ensmallen/pull/427)). See the
1 change: 1 addition & 0 deletions doc/function_types.md
@@ -135,6 +135,7 @@ The following optimizers can be used with differentiable functions:
* [Fast Adaptive Shrinkage/Thresholding Algorithm (FASTA)](#fast-adaptive-shrinkage-thresholding-algorithm-fasta) (`ens::FASTA`)
* [FrankWolfe](#frank-wolfe) (`ens::FrankWolfe`)
* [GradientDescent](#gradient-descent) (`ens::GradientDescent`)
* [DeltaBarDelta](#delta-bar-delta) (`ens::DeltaBarDelta`)
- Any optimizer for [arbitrary functions](#arbitrary-functions)

Each of these optimizers has an `Optimize()` function that is called as
70 changes: 67 additions & 3 deletions doc/optimizers.md
@@ -823,8 +823,6 @@ parameters.
If `lambda` and `sigma` are not specified, then 0 is used as the initial value
for all Lagrange multipliers and 10 is used as the initial penalty parameter.

</details>

Thanks, nice catch!

#### Examples

<details open>
@@ -1261,6 +1259,62 @@ optimizer.Optimize(f, coordinates);
* [Differential Evolution in Wikipedia](https://en.wikipedia.org/wiki/Differential_Evolution)
* [Arbitrary functions](#arbitrary-functions)

## DeltaBarDelta

*An optimizer for [differentiable functions](#differentiable-functions).*

A gradient descent variant that adapts the learning rate of each parameter individually to improve convergence. If the current gradient and the exponential average of past gradients for a parameter have the same sign, the step size for that parameter is incremented by `kappa`; otherwise, it is decreased by a proportion `phi` of its current value (additive increase, multiplicative decrease).

***Note:*** DeltaBarDelta is very sensitive to its parameters (`kappa` and `phi`), so careful hyperparameter selection is necessary; the defaults may not fit every problem. Typically, `kappa` should be smaller than the step size.
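
To make the rule concrete, here is a rough elementwise sketch of a single delta-bar-delta step (illustrative only: the names `stepSizes` and `deltaBar` are hypothetical, and the actual logic lives in `DeltaBarDeltaUpdate`):

```c++
#include <armadillo>
#include <algorithm>

// One delta-bar-delta step over all parameters (a sketch, not the ensmallen
// implementation).
void DeltaBarDeltaStep(arma::mat& coordinates,
                       const arma::mat& gradient,
                       arma::mat& stepSizes,   // per-parameter step sizes
                       arma::mat& deltaBar,    // exponential average of gradients
                       const double kappa,
                       const double phi,
                       const double theta,
                       const double minStepSize)
{
  for (arma::uword i = 0; i < coordinates.n_elem; ++i)
  {
    if (gradient(i) * deltaBar(i) > 0)
      stepSizes(i) += kappa;                // same sign: additive increase
    else if (gradient(i) * deltaBar(i) < 0)
      stepSizes(i) *= (1.0 - phi);          // sign flip: multiplicative decrease
    stepSizes(i) = std::max(stepSizes(i), minStepSize);

    // Take the step, then refresh the exponential average (the "delta-bar").
    coordinates(i) -= stepSizes(i) * gradient(i);
    deltaBar(i) = theta * deltaBar(i) + (1.0 - theta) * gradient(i);
  }
}
```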

#### Constructors

* `DeltaBarDelta()`
* `DeltaBarDelta(`_`stepSize`_`)`
* `DeltaBarDelta(`_`stepSize, maxIterations, tolerance`_`)`
* `DeltaBarDelta(`_`stepSize, maxIterations, tolerance, kappa, phi, theta, minStepSize, resetPolicy`_`)`

Note that `DeltaBarDelta` is based on the templated type
`GradientDescentType<`_`UpdatePolicyType, DecayPolicyType`_`>` with _`UpdatePolicyType`_` =
DeltaBarDeltaUpdate` and _`DecayPolicyType`_` = NoDecay`.
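
Advanced users can also instantiate the underlying templated type directly. A minimal sketch, assuming the same parameter order as the wrapper's implementation (the values shown are just the defaults):

```c++
// Roughly equivalent to DeltaBarDelta(0.01, 100000, 1e-5, 0.002, 0.2, 0.8, 1e-8, true),
// but built through the underlying templated optimizer (illustrative only).
GradientDescentType<DeltaBarDeltaUpdate, NoDecay> opt(
    0.01,                                               // stepSize
    100000,                                             // maxIterations
    1e-5,                                               // tolerance
    DeltaBarDeltaUpdate(0.01, 0.002, 0.2, 0.8, 1e-8),   // update policy
    NoDecay(),                                          // decay policy
    true);                                              // resetPolicy
```
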
Because DeltaBarDelta always uses the DeltaBarDeltaUpdate class, it doesn't really make sense to the user to provide a constructor that takes updatePolicy, decayPolicy, and resetPolicy. Of course an advanced user can do this if they take a look at the internals of the GradientDescent class. But for the typical user, they just want to set the settings of DeltaBarDelta and move on. So I would suggest adding a wrapper class DeltaBarDelta that just calls GradientDescentType<DeltaBarDeltaUpdate, NoDecay> internally (there are many of these for SGD variants), and also provides a constructor that forwards the parameters of DeltaBarDeltaUpdate:

DeltaBarDelta(stepSize, maxIterations, tolerance, kappa, phi, momentum, minGain, resetPolicy)

Then the table of options below will be much simplified too.


#### Attributes

| **type** | **name** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `double` | **`stepSize`** | Initial step size. | `0.01` |
| `size_t` | **`maxIterations`** | Maximum number of iterations allowed (0 means no limit). | `100000` |
| `double` | **`tolerance`** | Maximum absolute tolerance to terminate algorithm. | `1e-5` |
| `double` | **`kappa`** | Additive increase constant for step size when gradient signs persist. | `0.002` |
| `double` | **`phi`** | Multiplicative decrease factor for step size when gradient signs flip. | `0.2` |
| `double` | **`theta`** | Decay rate for computing the exponential average of past gradients. | `0.8` |
| `double` | **`minStepSize`** | Minimum allowed step size for any parameter. | `1e-8` |
| `bool` | **`resetPolicy`** | If true, parameters are reset before every Optimize call. | `true` |

Attributes of the optimizer may be accessed and modified via member functions of the same name.
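
For instance (illustrative values only):

```c++
DeltaBarDelta optimizer;
optimizer.Kappa() = 0.0005;  // smaller additive increase
optimizer.Phi() = 0.1;       // gentler multiplicative decrease
```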

#### Examples:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

DeltaBarDelta optimizer(0.001, 0, 1e-15, 0.0001, 0.2, 0.8);
optimizer.Optimize(f, coordinates);
```

</details>

#### See also:

* [Increased rates of convergence through learning rate adaptation (pdf)](https://www.academia.edu/download/32005051/Jacobs.NN88.pdf)
* [Differentiable functions](#differentiable-functions)
* [Gradient Descent](#gradient-descent)

## DemonAdam

*An optimizer for [differentiable separable functions](#differentiable-separable-functions).*
@@ -1899,6 +1953,11 @@ negative of the gradient of the function at the current point.
* `GradientDescent()`
* `GradientDescent(`_`stepSize`_`)`
* `GradientDescent(`_`stepSize, maxIterations, tolerance`_`)`
* `GradientDescent(`_`stepSize, maxIterations, tolerance, updatePolicy, decayPolicy, resetPolicy`_`)`

Note that `GradientDescent` is based on the templated type
`GradientDescentType<`_`UpdatePolicyType, DecayPolicyType`_`>` with _`UpdatePolicyType`_` =
VanillaUpdate` and _`DecayPolicyType`_` = NoDecay`.
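
A minimal sketch of the new full constructor, with the default policy objects written out explicitly (equivalent to `GradientDescent(0.01, 100000, 1e-5)`, assuming default-constructed policies):

```c++
GradientDescent opt(0.01,             // stepSize
                    100000,           // maxIterations
                    1e-5,             // tolerance
                    VanillaUpdate(),  // update policy
                    NoDecay(),        // decay policy
                    true);            // resetPolicy
```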

#### Attributes

@@ -1907,9 +1966,14 @@ negative of the gradient of the function at the current point.
| `double` | **`stepSize`** | Step size for each iteration. | `0.01` |
| `size_t` | **`maxIterations`** | Maximum number of iterations allowed (0 means no limit). | `100000` |
| `double` | **`tolerance`** | Maximum absolute tolerance to terminate algorithm. | `1e-5` |
| `UpdatePolicyType` | **`updatePolicy`** | Instantiated update policy used to adjust the given parameters. | `UpdatePolicyType()` |
| `DecayPolicyType` | **`decayPolicy`** | Instantiated decay policy used to adjust the step size. | `DecayPolicyType()` |
| `bool` | **`resetPolicy`** | Flag that determines whether update policy parameters are reset before every Optimize call. | `true` |

Attributes of the optimizer may also be changed via the member methods
`StepSize()`, `MaxIterations()`, and `Tolerance()`.
`StepSize()`, `MaxIterations()`, `Tolerance()`, `UpdatePolicy()`,
`DecayPolicy()`, and `ResetPolicy()`.


#### Examples:

1 change: 1 addition & 0 deletions include/ensmallen.hpp
@@ -120,6 +120,7 @@
#include "ensmallen_bits/cd/cd.hpp"
#include "ensmallen_bits/cne/cne.hpp"
#include "ensmallen_bits/de/de.hpp"
#include "ensmallen_bits/delta_bar_delta/delta_bar_delta.hpp"
#include "ensmallen_bits/eve/eve.hpp"
#include "ensmallen_bits/fasta/fasta.hpp"
#include "ensmallen_bits/fbs/fbs.hpp"
184 changes: 184 additions & 0 deletions include/ensmallen_bits/delta_bar_delta/delta_bar_delta.hpp
@@ -0,0 +1,184 @@
/**
* @file delta_bar_delta.hpp
* @author Ranjodh Singh
*
* Definition of the DeltaBarDelta optimizer, a wrapper around
* GradientDescentType with the DeltaBarDeltaUpdate policy.
*
* ensmallen is free software; you may redistribute it and/or modify it under
* the terms of the 3-clause BSD license. You should have received a copy of
* the 3-clause BSD license along with ensmallen. If not, see
* http://www.opensource.org/licenses/BSD-3-Clause for more information.
*/
#ifndef ENSMALLEN_DELTA_BAR_DELTA_HPP
#define ENSMALLEN_DELTA_BAR_DELTA_HPP

#include <ensmallen_bits/gradient_descent/gradient_descent.hpp>
#include "./delta_bar_delta_update.hpp"

namespace ens {

/**
* DeltaBarDelta Optimizer.
*
* A heuristic designed to accelerate convergence by
* adapting the learning rate of each parameter individually.
*
* According to the Delta-Bar-Delta update:
*
* - If the current gradient and the exponential average of
* past gradients corresponding to a parameter have the same
* sign, then the step size for that parameter is incremented by
* \f$\kappa\f$. Otherwise, it is decreased by a proportion \f$\phi\f$
* of its current value (additive increase, multiplicative decrease).
*
* @note This implementation uses a minStepSize parameter to set a lower
* bound for the learning rate. This prevents the learning rate from
* dropping to zero, which can occur due to floating-point underflow.
* For tasks which require extreme fine-tuning, you may need to lower
* this parameter below its default value (1e-8) in order to allow for
* smaller learning rates.
*
* @code
* @article{jacobs1988increased,
* title = {Increased Rates of Convergence Through Learning Rate
* Adaptation},
*   author = {Jacobs, Robert A.},
*   journal = {Neural Networks},
* volume = {1},
* number = {4},
* pages = {295--307},
* year = {1988},
* publisher = {Pergamon}
* }
* @endcode
*/
class DeltaBarDelta
{
public:
/**
* Construct the DeltaBarDelta optimizer with the given parameters.
* DeltaBarDelta is very sensitive to its parameters (kappa and phi), so
* careful hyperparameter selection is necessary; the defaults may not fit
* every case.
*
* @param stepSize Initial step size.
* @param maxIterations Maximum number of iterations allowed (0 means no
* limit).
* @param tolerance Maximum absolute tolerance to terminate algorithm.
* @param kappa Constant increment applied when gradient signs persist.
* @param phi Proportional decrement factor when gradient signs flip.
* @param theta Decay rate for the exponential average (delta-bar).
* @param minStepSize Minimum allowed step size for any parameter
* (default: 1e-8).
* @param resetPolicy If true, parameters are reset before every Optimize
* call; otherwise, their values are retained.
*/
DeltaBarDelta(const double stepSize = 0.01,
const size_t maxIterations = 100000,
const double tolerance = 1e-5,
const double kappa = 0.002,
const double phi = 0.2,
const double theta = 0.8,
const double minStepSize = 1e-8,
const bool resetPolicy = true);

/**
* Optimize the given function using DeltaBarDelta.
* The given starting point will be modified to store the finishing
* point of the algorithm, and the final objective value is returned.
*
* @tparam SeparableFunctionType Type of the function to optimize.
* @tparam MatType Type of matrix to optimize with.
* @tparam GradType Type of matrix to use to represent function gradients.
* @tparam CallbackTypes Types of callback functions.
* @param function Function to optimize.
* @param iterate Starting point (will be modified).
* @param callbacks Callback functions.
* @return Objective value of the final point.
*/
template<typename SeparableFunctionType,
typename MatType,
typename GradType,
typename... CallbackTypes>
typename std::enable_if<IsMatrixType<GradType>::value,
typename MatType::elem_type>::type
Optimize(SeparableFunctionType& function,
MatType& iterate,
CallbackTypes&&... callbacks)
{
return optimizer.Optimize<SeparableFunctionType, MatType, GradType,
CallbackTypes...>(function, iterate,
std::forward<CallbackTypes>(callbacks)...);
}

//! Forward the MatType as GradType.
template<typename SeparableFunctionType,
typename MatType,
typename... CallbackTypes>
typename MatType::elem_type Optimize(SeparableFunctionType& function,
MatType& iterate,
CallbackTypes&&... callbacks)
{
return Optimize<SeparableFunctionType, MatType, MatType,
CallbackTypes...>(function, iterate,
std::forward<CallbackTypes>(callbacks)...);
}

//! Get the initial step size.
double StepSize() const { return optimizer.StepSize(); }
//! Modify the initial step size.
double& StepSize() { return optimizer.StepSize(); }

//! Get the maximum number of iterations (0 indicates no limit).
size_t MaxIterations() const { return optimizer.MaxIterations(); }
//! Modify the maximum number of iterations (0 indicates no limit).
size_t& MaxIterations() { return optimizer.MaxIterations(); }

//! Get the additive increase constant for step size
//! when gradient signs persist.
double Kappa() const { return optimizer.UpdatePolicy().Kappa(); }
//! Modify the additive increase constant for step size
//! when gradient signs persist.
double& Kappa() { return optimizer.UpdatePolicy().Kappa(); }

//! Get the multiplicative decrease factor for step size
//! when gradient signs flip.
double Phi() const { return optimizer.UpdatePolicy().Phi(); }
//! Modify the multiplicative decrease factor for step size
//! when gradient signs flip.
double& Phi() { return optimizer.UpdatePolicy().Phi(); }

//! Get the decay rate for computing the exponential average
//! of past gradients (delta-bar).
double Theta() const { return optimizer.UpdatePolicy().Theta(); }
//! Modify the decay rate for computing the exponential average
//! of past gradients (delta-bar).
double& Theta() { return optimizer.UpdatePolicy().Theta(); }

//! Get the minimum allowed step size.
double MinStepSize() const { return optimizer.UpdatePolicy().MinStepSize(); }
//! Modify the minimum allowed step size.
double& MinStepSize() { return optimizer.UpdatePolicy().MinStepSize(); }

//! Get the tolerance for termination.
double Tolerance() const { return optimizer.Tolerance(); }
//! Modify the tolerance for termination.
double& Tolerance() { return optimizer.Tolerance(); }

//! Get whether or not the update policy parameters are reset before
//! every Optimize() call.
bool ResetPolicy() const { return optimizer.ResetPolicy(); }
//! Modify whether or not the update policy parameters are reset before
//! every Optimize() call.
bool& ResetPolicy() { return optimizer.ResetPolicy(); }

private:
//! The GradientDescentType object with DeltaBarDelta policy.
GradientDescentType<DeltaBarDeltaUpdate, NoDecay> optimizer;
};

} // namespace ens

// Include implementation.
#include "delta_bar_delta_impl.hpp"

#endif // ENSMALLEN_DELTA_BAR_DELTA_HPP
41 changes: 41 additions & 0 deletions include/ensmallen_bits/delta_bar_delta/delta_bar_delta_impl.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
/**
* @file delta_bar_delta_impl.hpp
* @author Ranjodh Singh
*
* Implementation of DeltaBarDelta class wrapper.
*
* ensmallen is free software; you may redistribute it and/or modify it under
* the terms of the 3-clause BSD license. You should have received a copy of
* the 3-clause BSD license along with ensmallen. If not, see
* http://www.opensource.org/licenses/BSD-3-Clause for more information.
*/
#ifndef ENSMALLEN_DELTA_BAR_DELTA_IMPL_HPP
#define ENSMALLEN_DELTA_BAR_DELTA_IMPL_HPP

// In case it hasn't been included yet.
#include "./delta_bar_delta.hpp"

namespace ens {

inline DeltaBarDelta::DeltaBarDelta(
const double stepSize,
const size_t maxIterations,
const double tolerance,
const double kappa,
const double phi,
const double theta,
const double minStepSize,
const bool resetPolicy) :
optimizer(stepSize,
maxIterations,
tolerance,
DeltaBarDeltaUpdate(stepSize, kappa, phi, theta, minStepSize),
NoDecay(),
resetPolicy)
{
/* Nothing to do. */
}

} // namespace ens

#endif // ENSMALLEN_DELTA_BAR_DELTA_IMPL_HPP
Loading