7 changes: 7 additions & 0 deletions HISTORY.md
@@ -1,5 +1,11 @@
### ensmallen ?.??.?: "???"
###### ????-??-??
* Refactor `GradientDescent` into
`GradientDescentType<UpdatePolicyType, DecayPolicyType>`.
Add the `DeltaBarDelta` optimizer, which implements Jacobs' delta-bar-delta
update through `GradientDescentType` with `DeltaBarDeltaUpdate` and `NoDecay`
policies. ([#440](https://github.com/mlpack/ensmallen/pull/440))
See the documentation for more details.

### ensmallen 3.10.0: "Unexpected Rain"
###### 2025-09-25
@@ -44,6 +50,7 @@
ActiveCMAES<FullSelection, BoundaryBoxConstraint> opt(lambda,
BoundaryBoxConstraint(lowerBound, upperBound), ...);
```

* Add proximal gradient optimizers for L1-constrained and other related
problems: `FBS`, `FISTA`, and `FASTA`
([#427](https://github.com/mlpack/ensmallen/pull/427)). See the
1 change: 1 addition & 0 deletions doc/function_types.md
@@ -135,6 +135,7 @@ The following optimizers can be used with differentiable functions:
* [Fast Adaptive Shrinkage/Thresholding Algorithm (FASTA)](#fast-adaptive-shrinkage-thresholding-algorithm-fasta) (`ens::FASTA`)
* [FrankWolfe](#frank-wolfe) (`ens::FrankWolfe`)
* [GradientDescent](#gradient-descent) (`ens::GradientDescent`)
* [DeltaBarDelta](#delta-bar-delta) (`ens::DeltaBarDelta`)
- Any optimizer for [arbitrary functions](#arbitrary-functions)

Each of these optimizers has an `Optimize()` function that is called as
70 changes: 67 additions & 3 deletions doc/optimizers.md
@@ -823,8 +823,6 @@ parameters.
If `lambda` and `sigma` are not specified, then 0 is used as the initial value
for all Lagrange multipliers and 10 is used as the initial penalty parameter.

</details>

Thanks, nice catch!

#### Examples

<details open>
@@ -1261,6 +1259,62 @@ optimizer.Optimize(f, coordinates);
* [Differential Evolution in Wikipedia](https://en.wikipedia.org/wiki/Differential_Evolution)
* [Arbitrary functions](#arbitrary-functions)

## DeltaBarDelta

*An optimizer for [differentiable functions](#differentiable-functions).*

A gradient descent variant that adapts the learning rate of each parameter individually to improve convergence. If the current gradient and the exponential average of past gradients for a parameter have the same sign, the step size for that parameter is incremented by `kappa`; otherwise, it is decreased by a proportion `phi` of its current value (additive increase, multiplicative decrease).

***Note:*** DeltaBarDelta is very sensitive to its parameters (`kappa` and `phi`), so careful hyperparameter selection is necessary; the defaults may not fit every problem. Typically, `kappa` should be smaller than the step size.
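
To make the rule concrete, here is a rough elementwise sketch of a single delta-bar-delta step (illustrative only: the names `stepSizes` and `deltaBar` are hypothetical, and the actual logic lives in `DeltaBarDeltaUpdate`):

```c++
#include <armadillo>
#include <algorithm>

// One delta-bar-delta step over all parameters (a sketch, not the ensmallen
// implementation).
void DeltaBarDeltaStep(arma::mat& coordinates,
                       const arma::mat& gradient,
                       arma::mat& stepSizes,   // per-parameter step sizes
                       arma::mat& deltaBar,    // exponential average of gradients
                       const double kappa,
                       const double phi,
                       const double theta,
                       const double minStepSize)
{
  for (arma::uword i = 0; i < coordinates.n_elem; ++i)
  {
    if (gradient(i) * deltaBar(i) > 0)
      stepSizes(i) += kappa;                // same sign: additive increase
    else if (gradient(i) * deltaBar(i) < 0)
      stepSizes(i) *= (1.0 - phi);          // sign flip: multiplicative decrease
    stepSizes(i) = std::max(stepSizes(i), minStepSize);

    // Take the step, then refresh the exponential average (the "delta-bar").
    coordinates(i) -= stepSizes(i) * gradient(i);
    deltaBar(i) = theta * deltaBar(i) + (1.0 - theta) * gradient(i);
  }
}
```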

#### Constructors

* `DeltaBarDelta()`
* `DeltaBarDelta(`_`stepSize`_`)`
* `DeltaBarDelta(`_`stepSize, maxIterations, tolerance`_`)`
* `DeltaBarDelta(`_`stepSize, maxIterations, tolerance, kappa, phi, theta, minStepSize, resetPolicy`_`)`

Note that `DeltaBarDelta` is based on the templated type
`GradientDescentType<`_`UpdatePolicyType, DecayPolicyType`_`>` with _`UpdatePolicyType`_` =
DeltaBarDeltaUpdate` and _`DecayPolicyType`_` = NoDecay`.
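
Advanced users can also instantiate the underlying templated type directly. A minimal sketch, assuming the same parameter order as the wrapper's implementation (the values shown are just the defaults):

```c++
// Roughly equivalent to DeltaBarDelta(0.01, 100000, 1e-5, 0.002, 0.2, 0.8, 1e-8, true),
// but built through the underlying templated optimizer (illustrative only).
GradientDescentType<DeltaBarDeltaUpdate, NoDecay> opt(
    0.01,                                               // stepSize
    100000,                                             // maxIterations
    1e-5,                                               // tolerance
    DeltaBarDeltaUpdate(0.01, 0.002, 0.2, 0.8, 1e-8),   // update policy
    NoDecay(),                                          // decay policy
    true);                                              // resetPolicy
```
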
Because DeltaBarDelta always uses the DeltaBarDeltaUpdate class, it doesn't really make sense to the user to provide a constructor that takes updatePolicy, decayPolicy, and resetPolicy. Of course an advanced user can do this if they take a look at the internals of the GradientDescent class. But for the typical user, they just want to set the settings of DeltaBarDelta and move on. So I would suggest adding a wrapper class DeltaBarDelta that just calls GradientDescentType<DeltaBarDeltaUpdate, NoDecay> internally (there are many of these for SGD variants), and also provides a constructor that forwards the parameters of DeltaBarDeltaUpdate:

DeltaBarDelta(stepSize, maxIterations, tolerance, kappa, phi, momentum, minGain, resetPolicy)

Then the table of options below will be much simplified too.


#### Attributes

| **type** | **name** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `double` | **`stepSize`** | Initial step size. | `0.01` |
| `size_t` | **`maxIterations`** | Maximum number of iterations allowed (0 means no limit). | `100000` |
| `double` | **`tolerance`** | Maximum absolute tolerance to terminate algorithm. | `1e-5` |
| `double` | **`kappa`** | Additive increase constant for step size when gradient signs persist. | `0.002` |
| `double` | **`phi`** | Multiplicative decrease factor for step size when gradient signs flip. | `0.2` |
| `double` | **`theta`** | Decay rate for computing the exponential average of past gradients. | `0.8` |
| `double` | **`minStepSize`** | Minimum allowed step size for any parameter. | `1e-8` |
| `bool` | **`resetPolicy`** | If true, parameters are reset before every Optimize call. | `true` |

Attributes of the optimizer may be accessed and modified via member functions of the same name.
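
For instance (illustrative values only):

```c++
DeltaBarDelta optimizer;
optimizer.Kappa() = 0.0005;  // smaller additive increase
optimizer.Phi() = 0.1;       // gentler multiplicative decrease
```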

#### Examples:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

DeltaBarDelta optimizer(0.001, 0, 1e-15, 0.0001, 0.2, 0.8);
optimizer.Optimize(f, coordinates);
```

</details>

#### See also:

* [Increased rates of convergence through learning rate adaptation (pdf)](https://www.academia.edu/download/32005051/Jacobs.NN88.pdf)
* [Differentiable functions](#differentiable-functions)
* [Gradient Descent](#gradient-descent)

## DemonAdam

*An optimizer for [differentiable separable functions](#differentiable-separable-functions).*
@@ -1899,6 +1953,11 @@ negative of the gradient of the function at the current point.
* `GradientDescent()`
* `GradientDescent(`_`stepSize`_`)`
* `GradientDescent(`_`stepSize, maxIterations, tolerance`_`)`
* `GradientDescent(`_`stepSize, maxIterations, tolerance, updatePolicy, decayPolicy, resetPolicy`_`)`

Note that `GradientDescent` is based on the templated type
`GradientDescentType<`_`UpdatePolicyType, DecayPolicyType`_`>` with _`UpdatePolicyType`_` =
VanillaUpdate` and _`DecayPolicyType`_` = NoDecay`.
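
A minimal sketch of the new full constructor, with the default policy objects written out explicitly (equivalent to `GradientDescent(0.01, 100000, 1e-5)`, assuming default-constructed policies):

```c++
GradientDescent opt(0.01,             // stepSize
                    100000,           // maxIterations
                    1e-5,             // tolerance
                    VanillaUpdate(),  // update policy
                    NoDecay(),        // decay policy
                    true);            // resetPolicy
```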

#### Attributes

@@ -1907,9 +1966,14 @@ negative of the gradient of the function at the current point.
| `double` | **`stepSize`** | Step size for each iteration. | `0.01` |
| `size_t` | **`maxIterations`** | Maximum number of iterations allowed (0 means no limit). | `100000` |
| `double` | **`tolerance`** | Maximum absolute tolerance to terminate algorithm. | `1e-5` |
| `UpdatePolicyType` | **`updatePolicy`** | Instantiated update policy used to adjust the given parameters. | `UpdatePolicyType()` |
| `DecayPolicyType` | **`decayPolicy`** | Instantiated decay policy used to adjust the step size. | `DecayPolicyType()` |
| `bool` | **`resetPolicy`** | Flag that determines whether update policy parameters are reset before every Optimize call. | `true` |

Attributes of the optimizer may also be changed via the member methods
`StepSize()`, `MaxIterations()`, and `Tolerance()`.
`StepSize()`, `MaxIterations()`, `Tolerance()`, `UpdatePolicy()`,
`DecayPolicy()`, and `ResetPolicy()`.


#### Examples:

1 change: 1 addition & 0 deletions include/ensmallen.hpp
@@ -120,6 +120,7 @@
#include "ensmallen_bits/cd/cd.hpp"
#include "ensmallen_bits/cne/cne.hpp"
#include "ensmallen_bits/de/de.hpp"
#include "ensmallen_bits/delta_bar_delta/delta_bar_delta.hpp"
#include "ensmallen_bits/eve/eve.hpp"
#include "ensmallen_bits/fasta/fasta.hpp"
#include "ensmallen_bits/fbs/fbs.hpp"
184 changes: 184 additions & 0 deletions include/ensmallen_bits/delta_bar_delta/delta_bar_delta.hpp
@@ -0,0 +1,184 @@
/**
* @file delta_bar_delta.hpp
* @author Ranjodh Singh
*
* Definition of the DeltaBarDelta optimizer, a wrapper around
* GradientDescentType with the DeltaBarDeltaUpdate policy.
*
* ensmallen is free software; you may redistribute it and/or modify it under
* the terms of the 3-clause BSD license. You should have received a copy of
* the 3-clause BSD license along with ensmallen. If not, see
* http://www.opensource.org/licenses/BSD-3-Clause for more information.
*/
#ifndef ENSMALLEN_DELTA_BAR_DELTA_HPP
#define ENSMALLEN_DELTA_BAR_DELTA_HPP

#include <ensmallen_bits/gradient_descent/gradient_descent.hpp>
#include "./delta_bar_delta_update.hpp"

namespace ens {

/**
* DeltaBarDelta Optimizer.
*
* A heuristic designed to accelerate convergence by
* adapting the learning rate of each parameter individually.
*
* According to the Delta-Bar-Delta update:
*
* - If the current gradient and the exponential average of
* past gradients corresponding to a parameter have the same
* sign, then the step size for that parameter is incremented by
* \f$\kappa\f$. Otherwise, it is decreased by a proportion \f$\phi\f$
* of its current value (additive increase, multiplicative decrease).
*
* @note This implementation uses a minStepSize parameter to set a lower
* bound for the learning rate. This prevents the learning rate from
* dropping to zero, which can occur due to floating-point underflow.
* For tasks which require extreme fine-tuning, you may need to lower
* this parameter below its default value (1e-8) in order to allow for
* smaller learning rates.
*
* @code
* @article{jacobs1988increased,
* title = {Increased Rates of Convergence Through Learning Rate
* Adaptation},
*   author = {Jacobs, Robert A.},
*   journal = {Neural Networks},
* volume = {1},
* number = {4},
* pages = {295--307},
* year = {1988},
* publisher = {Pergamon}
* }
* @endcode
*/
class DeltaBarDelta
{
public:
/**
* Construct the DeltaBarDelta optimizer with the given parameters.
* DeltaBarDelta is very sensitive to its parameters (kappa and phi), so
* careful hyperparameter selection is necessary; the defaults may not fit
* every case.
*
* @param stepSize Initial step size.
* @param maxIterations Maximum number of iterations allowed (0 means no
* limit).
* @param tolerance Maximum absolute tolerance to terminate algorithm.
* @param kappa Constant increment applied when gradient signs persist.
* @param phi Proportional decrement factor when gradient signs flip.
* @param theta Decay rate for the exponential average (delta-bar).
* @param minStepSize Minimum allowed step size for any parameter
* (default: 1e-8).
* @param resetPolicy If true, parameters are reset before every Optimize
* call; otherwise, their values are retained.
*/
DeltaBarDelta(const double stepSize = 0.01,
const size_t maxIterations = 100000,
const double tolerance = 1e-5,
const double kappa = 0.002,
const double phi = 0.2,
const double theta = 0.8,
const double minStepSize = 1e-8,
const bool resetPolicy = true);

/**
* Optimize the given function using DeltaBarDelta.
* The given starting point will be modified to store the finishing
* point of the algorithm, and the final objective value is returned.
*
* @tparam SeparableFunctionType Type of the function to optimize.
* @tparam MatType Type of matrix to optimize with.
* @tparam GradType Type of matrix to use to represent function gradients.
* @tparam CallbackTypes Types of callback functions.
* @param function Function to optimize.
* @param iterate Starting point (will be modified).
* @param callbacks Callback functions.
* @return Objective value of the final point.
*/
template<typename SeparableFunctionType,
typename MatType,
typename GradType,
typename... CallbackTypes>
typename std::enable_if<IsMatrixType<GradType>::value,
typename MatType::elem_type>::type
Optimize(SeparableFunctionType& function,
MatType& iterate,
CallbackTypes&&... callbacks)
{
return optimizer.Optimize<SeparableFunctionType, MatType, GradType,
CallbackTypes...>(function, iterate,
std::forward<CallbackTypes>(callbacks)...);
}

//! Forward the MatType as GradType.
template<typename SeparableFunctionType,
typename MatType,
typename... CallbackTypes>
typename MatType::elem_type Optimize(SeparableFunctionType& function,
MatType& iterate,
CallbackTypes&&... callbacks)
{
return Optimize<SeparableFunctionType, MatType, MatType,
CallbackTypes...>(function, iterate,
std::forward<CallbackTypes>(callbacks)...);
}

//! Get the initial step size.
double StepSize() const { return optimizer.StepSize(); }
//! Modify the initial step size.
double& StepSize() { return optimizer.StepSize(); }

//! Get the maximum number of iterations (0 indicates no limit).
size_t MaxIterations() const { return optimizer.MaxIterations(); }
//! Modify the maximum number of iterations (0 indicates no limit).
size_t& MaxIterations() { return optimizer.MaxIterations(); }

//! Get the additive increase constant for step size
//! when gradient signs persist.
double Kappa() const { return optimizer.UpdatePolicy().Kappa(); }
//! Modify the additive increase constant for step size
//! when gradient signs persist.
double& Kappa() { return optimizer.UpdatePolicy().Kappa(); }

//! Get the multiplicative decrease factor for step size
//! when gradient signs flip.
double Phi() const { return optimizer.UpdatePolicy().Phi(); }
//! Modify the multiplicative decrease factor for step size
//! when gradient signs flip.
double& Phi() { return optimizer.UpdatePolicy().Phi(); }

//! Get the decay rate for computing the exponential average
//! of past gradients (delta-bar).
double Theta() const { return optimizer.UpdatePolicy().Theta(); }
//! Modify the decay rate for computing the exponential average
//! of past gradients (delta-bar).
double& Theta() { return optimizer.UpdatePolicy().Theta(); }

//! Get the minimum allowed step size.
double MinStepSize() const { return optimizer.UpdatePolicy().MinStepSize(); }
//! Modify the minimum allowed step size.
double& MinStepSize() { return optimizer.UpdatePolicy().MinStepSize(); }

//! Get the tolerance for termination.
double Tolerance() const { return optimizer.Tolerance(); }
//! Modify the tolerance for termination.
double& Tolerance() { return optimizer.Tolerance(); }

//! Get whether or not the update policy parameters are reset before
//! every Optimize() call.
bool ResetPolicy() const { return optimizer.ResetPolicy(); }
//! Modify whether or not the update policy parameters are reset before
//! every Optimize() call.
bool& ResetPolicy() { return optimizer.ResetPolicy(); }

private:
//! The GradientDescentType object with DeltaBarDelta policy.
GradientDescentType<DeltaBarDeltaUpdate, NoDecay> optimizer;
};

} // namespace ens

// Include implementation.
#include "delta_bar_delta_impl.hpp"

#endif // ENSMALLEN_DELTA_BAR_DELTA_HPP
41 changes: 41 additions & 0 deletions include/ensmallen_bits/delta_bar_delta/delta_bar_delta_impl.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
/**
* @file delta_bar_delta_impl.hpp
* @author Ranjodh Singh
*
* Implementation of DeltaBarDelta class wrapper.
*
* ensmallen is free software; you may redistribute it and/or modify it under
* the terms of the 3-clause BSD license. You should have received a copy of
* the 3-clause BSD license along with ensmallen. If not, see
* http://www.opensource.org/licenses/BSD-3-Clause for more information.
*/
#ifndef ENSMALLEN_DELTA_BAR_DELTA_IMPL_HPP
#define ENSMALLEN_DELTA_BAR_DELTA_IMPL_HPP

// In case it hasn't been included yet.
#include "./delta_bar_delta.hpp"

namespace ens {

inline DeltaBarDelta::DeltaBarDelta(
const double stepSize,
const size_t maxIterations,
const double tolerance,
const double kappa,
const double phi,
const double theta,
const double minStepSize,
const bool resetPolicy) :
optimizer(stepSize,
maxIterations,
tolerance,
DeltaBarDeltaUpdate(stepSize, kappa, phi, theta, minStepSize),
NoDecay(),
resetPolicy)
{
/* Nothing to do. */
}

} // namespace ens

#endif // ENSMALLEN_DELTA_BAR_DELTA_IMPL_HPP
Loading