Implement DeltaBarDelta using refactored GradientDescent. #440
Conversation
Force-pushed from 7a0f03c to 808cabb
Hi,

@conradsnicta could you please approve the CI build? I can't test this with bandicoot locally.

Had to restart the GPU runner, restarted the jobs.

Thanks @zoq
Fantastic! Can you also update the HISTORY.md file?
Hi,

Done. What do you think about this? Additionally, I found this small inconsistency. Should I go ahead and fix these here, or separately?
Force-pushed from 540cdcd to 9ef2543
Thanks @zoq for the great advice. I think this is ready for review then.

It is definitely not okay to merge an optimizer that has not been thoroughly reviewed. I will get to it when I have a chance, and there is no schedule on when that will be.
Thanks for the contribution @ranjodhsingh1729. I have some concerns about the correctness of the implementation---can you look into the comments I left? Let me know if I can clarify anything, or if I missed anything. I used primarily section 6 of the paper for my review.
> for all Lagrange multipliers and 10 is used as the initial penalty parameter.
>
> `</details>`
Thanks, nice catch!
doc/optimizers.md (Outdated)
> #### See also:
> * [Increased rates of convergence through learning rate adaptation](https://www.academia.edu/download/32005051/Jacobs.NN88.pdf)
```diff
- * [Increased rates of convergence through learning rate adaptation](https://www.academia.edu/download/32005051/Jacobs.NN88.pdf)
+ * [Increased rates of convergence through learning rate adaptation (pdf)](https://www.academia.edu/download/32005051/Jacobs.NN88.pdf)
```
Most of the other links indicate to people when a PDF is going to be opened/downloaded.
Also it's probably worth linking to GradientDescent here too.
> Note that `DeltaBarDelta` is based on the templated type
> `GradientDescentType<`_`UpdatePolicyType, DecayPolicyType`_`>` with _`UpdatePolicyType`_` =
> `DeltaBarDeltaUpdate` and _`DecayPolicyType`_` = NoDecay`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because DeltaBarDelta always uses the DeltaBarDeltaUpdate class, it doesn't really make sense to the user to provide a constructor that takes updatePolicy, decayPolicy, and resetPolicy. Of course an advanced user can do this if they take a look at the internals of the GradientDescent class. But for the typical user, they just want to set the settings of DeltaBarDelta and move on. So I would suggest adding a wrapper class DeltaBarDelta that just calls GradientDescentType<DeltaBarDeltaUpdate, NoDecay> internally (there are many of these for SGD variants), and also provides a constructor that forwards the parameters of DeltaBarDeltaUpdate:
`DeltaBarDelta(stepSize, maxIterations, tolerance, kappa, phi, momentum, minGain, resetPolicy)`
Then the table of options below will be much simplified too.
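For illustration, such a wrapper might look roughly like this. This is a sketch only: the `DeltaBarDeltaUpdate` constructor signature and the default values are assumptions, not the actual PR code.

```c++
// Rough sketch of the suggested wrapper; the DeltaBarDeltaUpdate constructor
// signature and the default values below are assumptions.
class DeltaBarDelta
{
 public:
  DeltaBarDelta(const double stepSize = 0.01,
                const size_t maxIterations = 100000,
                const double tolerance = 1e-5,
                const double kappa = 0.03,
                const double phi = 0.7,
                const double momentum = 0.9,
                const double minGain = 1e-6,
                const bool resetPolicy = true) :
      optimizer(stepSize, maxIterations, tolerance,
                DeltaBarDeltaUpdate(kappa, phi, momentum, minGain),
                NoDecay(), resetPolicy)
  { }

  //! Forward optimization to the wrapped GradientDescentType instance.
  template<typename FunctionType, typename MatType>
  typename MatType::elem_type Optimize(FunctionType& function, MatType& iterate)
  {
    return optimizer.Optimize(function, iterate);
  }

 private:
  //! The wrapped optimizer: DeltaBarDelta is just gradient descent with the
  //! DeltaBarDeltaUpdate policy and no step size decay.
  GradientDescentType<DeltaBarDeltaUpdate, NoDecay> optimizer;
};
```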
```c++
//! Get the step size decay policy.
const DecayPolicyType& DecayPolicy() const { return decayPolicy; }
//! Modify the step size decay policy.
DecayPolicyType& DecayPolicy() { return decayPolicy; }
```
Shouldn't there be one for ResetPolicy() too?
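Something like the following, mirroring the `DecayPolicy()` pair above (assuming the member is a `bool` named `resetPolicy`, as in the constructor below):

```c++
//! Get whether or not the update policy parameters
//! are reset before every Optimize() call.
bool ResetPolicy() const { return resetPolicy; }
//! Modify whether or not the update policy parameters
//! are reset before every Optimize() call.
bool& ResetPolicy() { return resetPolicy; }
```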
```c++
    const bool resetPolicy)
  : stepSize(stepSize), maxIterations(maxIterations), tolerance(tolerance),
    updatePolicy(updatePolicy), decayPolicy(decayPolicy),
    resetPolicy(resetPolicy), isInitialized(false)
```
```diff
-    const bool resetPolicy)
-  : stepSize(stepSize), maxIterations(maxIterations), tolerance(tolerance),
-    updatePolicy(updatePolicy), decayPolicy(decayPolicy),
-    resetPolicy(resetPolicy), isInitialized(false)
+    const bool resetPolicy) :
+    stepSize(stepSize),
+    maxIterations(maxIterations),
+    tolerance(tolerance),
+    updatePolicy(updatePolicy),
+    decayPolicy(decayPolicy),
+    resetPolicy(resetPolicy),
+    isInitialized(false)
```
Just a style fix.
```c++
  DeltaBarDelta s(0.001, 0, 1e-15);
  FunctionTest<RosenbrockFunction, TestType>(s, 0.01, 0.001);
}
```
Can you add one more test for the logistic regression test function? That is a really simple test but it is a good sanity check for machine learning applications. It should be really easy; here's the Adam logistic regression function test:
```c++
TEMPLATE_TEST_CASE("Adam_LogisticRegressionFunction", "[Adam]",
    ENS_ALL_TEST_TYPES)
{
  Adam adam(0.032);
  LogisticRegressionFunctionTest<TestType>(adam);
}
```
tests/delta_bar_delta_test.cpp (Outdated)
```diff
@@ -0,0 +1,51 @@
+/**
+ * @file delta_bar_detla_test.cpp
```
```diff
- * @file delta_bar_detla_test.cpp
+ * @file delta_bar_delta_test.cpp
```
```c++
const MatType mask = conv_to<MatType>::from((gradient % velocity) < 0.0);

gains += mask * parent.Kappa();
gains %= mask + (1 - mask) * parent.Phi();
```
Based on my read of Eq. (4) in the paper I'm not sure this is correct. The value of gains[i] should not change if (gradient % velocity)[i] is 0; but given how mask is set, I can't quickly see that this is correct.
I would suggest maybe using sign() instead to create a matrix indicating whether the element is positive, negative, or zero, and then use that matrix to apply the update rule in (4).
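For example, something along these lines (a sketch only, assuming Armadillo-style syntax; `deltaBar` is a hypothetical state variable holding the exponential average δ̄ of previous derivatives, and `Phi()` follows this PR's convention of being the multiplicative factor itself):

```c++
// Sign of the elementwise product: +1 where the current derivative agrees
// with the averaged previous derivatives, -1 where they disagree, and 0
// where either term is exactly zero.
const MatType s = arma::sign(gradient % deltaBar);

// Eq. (4): increase the gain additively by kappa where the signs agree,
// shrink it multiplicatively where they disagree, and leave it unchanged
// where the product is zero.
gains += parent.Kappa() * conv_to<MatType>::from(s > 0);
gains %= 1.0 - (1.0 - parent.Phi()) * conv_to<MatType>::from(s < 0);
```

Because neither expression touches an element where `s == 0`, the zero case of Eq. (4) is handled by construction.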
```c++
velocity *= parent.momentum;
velocity -= (stepSize * gains) % gradient;
iterate += velocity;
```
But the paper seems to indicate we update with ϵ(t + 1) in Eq. (3), not δ̄(t)? I would suggest using the same terminology as the paper throughout even if it is not intuitive what δ̄, ϵ, and θ actually mean---you can clarify that part in the comments.
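For reference, a reconstruction of the update being referred to, assembled from the symbols used in this thread (a hedged reading of Eq. (3), not a quotation of the paper):

```latex
% Reconstructed from the symbols used in this discussion (w is the weight,
% E the objective); not quoted from the paper itself.
\[
  w(t+1) = w(t) - \epsilon(t+1)\,\delta(t),
  \qquad
  \delta(t) = \frac{\partial E}{\partial w}(t),
\]
% i.e. the step uses the freshly adapted rate \epsilon(t+1) and the raw
% derivative \delta(t); the averaged \bar{\delta} enters only the gain
% adaptation, not the step itself.
```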
```c++
gains.clamp(parent.MinimumGain(), arma::datum::inf);

velocity *= parent.momentum;
velocity -= (stepSize * gains) % gradient;
```
How does this implement the exponential average δ̄(t) = (1 - θ) δ(t) + θ δ̄(t - 1)? This doesn't appear to be that.
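Concretely, maintaining that average would require its own persistent state, roughly like this (a sketch; `deltaBar` and `theta` are hypothetical names for the running average and the decay constant):

```c++
// deltaBar(t) = (1 - theta) * delta(t) + theta * deltaBar(t - 1), where
// delta(t) is the current gradient.  Eq. (4) compares the sign of
// deltaBar(t - 1) % delta(t), so the gains must be adjusted *before*
// deltaBar is updated for the current iteration.
deltaBar = (1.0 - theta) * gradient + theta * deltaBar;
```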
@rcurtin, Thanks a lot for the thorough review - turns out I've made quite a mistake here. Since both t-SNE papers (the original and the tree-based approximation) state using Jacobs's scheme and cite this paper, I assumed it was used as-is and didn't verify that assumption. Now I know that what is used in t-SNE is a modified version of the DBD rule that incorporates momentum, while the original DBD only includes momentum for 'comparison' and doesn't apply it in updates. Glad this was caught during review - definitely a reminder not to rush things like this in the future. 😅

Since there's already an open issue, PR, and review, I've gone ahead and implemented the actual delta-bar-delta rule and also addressed the review suggestions. So I guess this puts me back at square one - I'll need to implement the t-SNE update rule in mlpack and hook it up with the refactored gradient descent class. Since it's not really used elsewhere, adding it to ensmallen probably doesn't make sense. Does that sound right, or is there a cleaner way to handle it? And sorry for the confusion here.
Closes: #437

This PR:

- Adds `DeltaBarDeltaUpdate`.
- Refactors `GradientDescent` to `GradientDescentType<UpdatePolicyType, DecayPolicyType>`.
- Adds `DeltaBarDelta` as a typedef to `GradientDescentType<DeltaBarDeltaUpdate, NoDecay>`.
- `GradientDescent` is now a typedef to `GradientDescentType<VanillaUpdate, NoDecay>`.
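In outline, the type structure described above looks something like this (a sketch of the pattern, not the PR's exact code):

```c++
// Gradient descent parameterized by an update policy and a decay policy.
template<typename UpdatePolicyType = VanillaUpdate,
         typename DecayPolicyType = NoDecay>
class GradientDescentType
{
  // ... step size, tolerance, policies, Optimize(), etc. ...
};

// The existing optimizer keeps its name and default behavior:
using GradientDescent = GradientDescentType<VanillaUpdate, NoDecay>;

// The new optimizer plugs the delta-bar-delta rule into the same driver:
using DeltaBarDelta = GradientDescentType<DeltaBarDeltaUpdate, NoDecay>;
```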