
Model Training ‐ Comparison ‐ [Optimizer]


Models | Logs | Graphs | Configs


The optimizer is responsible for finding the best values for the parameters of the trainable model. In simple terms, optimizers differ in the algorithms they use to search for these best values.

We won't compare all the possible optimizers, because that would require changing a lot of other parameters and retraining the models. Instead, we will try the Prodigy optimizer, which is considered a successor of the DAdaptAdam optimizer we've been using all this time. Prodigy also requires setting some additional optimizer args; you can find them in the configs.
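For reference, here is a minimal sketch of what those extra args look like when the two optimizers are instantiated directly from their pip packages (dadaptation and prodigyopt). The argument values are illustrative placeholders, not the exact ones from our configs.

```python
# Minimal sketch: both optimizers are drop-in replacements for torch.optim ones.
# The values below are illustrative and do not reflect the configs used here.
import torch
from dadaptation import DAdaptAdam   # pip install dadaptation
from prodigyopt import Prodigy       # pip install prodigyopt

model = torch.nn.Linear(16, 16)      # stand-in for the trainable network

# DAdaptAdam: what we've been using so far.
opt_dadapt = DAdaptAdam(
    model.parameters(),
    lr=1.0,                 # D-Adaptation estimates the real LR; lr stays ~1.0
    weight_decay=0.01,
    decouple=True,          # AdamW-style decoupled weight decay
)

# Prodigy: same idea, with a few extra knobs exposed as optimizer args.
opt_prodigy = Prodigy(
    model.parameters(),
    lr=1.0,
    weight_decay=0.01,
    decouple=True,
    use_bias_correction=True,
    safeguard_warmup=True,
)
```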


Compared values:

  • DAdaptAdam - B,

  • Prodigy.

But we won't just compare them face to face as we did before. We'll also add regularisation to our comparison. So, we'll see how it performs in these cases (see the sketch after the list):

  • GR = 1.02,

  • GR = ∞,

  • Regularisation with Unsplash Photos at Max Resolution using RV2.0 model,

  • Regularisation with Unsplash Photos at Max Resolution using 28D28 model.
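As an assumption about notation: GR here is taken to be the growth_rate argument that both D-Adaptation and Prodigy expose, which caps how fast the internal D estimate (and therefore the effective learning rate) can grow per step. The two GR cases would then look roughly like this:

```python
# Hedged sketch, continuing from the one above: GR is assumed to map to the
# optimizers' growth_rate argument.
import torch
from prodigyopt import Prodigy

model = torch.nn.Linear(16, 16)  # stand-in for the trainable network

opt_gr_capped  = Prodigy(model.parameters(), lr=1.0, growth_rate=1.02)          # GR = 1.02
opt_gr_unbound = Prodigy(model.parameters(), lr=1.0, growth_rate=float("inf"))  # GR = ∞ (default)
```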


DLR(step)

With GR = ∞ the graph has some weird bumps we've never seen before.


Loss(epoch)

But as far as I can see, it doesn't affect the loss.


I guess we're all a little tired of the grids. So, let's compare models with the base settings and models with our most successful settings, but only change the optimizer in both cases.

OK, nothing actually changed. The results are pretty similar.


CONCLUSION

I don't see enough of a difference to change the optimizer. It's easier to stick with the familiar DAdaptAdam optimizer for now.


Next - Model Training ‐ Comparison - Brief
