Model Training ‐ Comparison ‐ [Optimizer]
Models | Logs | Graphs | Configs
The optimizer is responsible for finding the best values for the parameters of the trainable model. In simple terms, optimizers differ in the algorithms they use to search for these best values.
We won't compare all the possible optimizers, because that would require changing a lot of other parameters and retraining the models. Instead, we will try the Prodigy optimizer, which is considered a successor of the DAdaptAdam optimizer we've been using all this time. Prodigy also requires setting some additional optimizer args; you can find them in the configs.
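For orientation, switching to Prodigy in kohya's sd-scripts comes down to the optimizer flags. The sketch below is an assumption based on Prodigy's published defaults, not the exact configs used for these runs — check the linked configs for the real values:

```shell
# Hypothetical sd-scripts invocation switching the optimizer to Prodigy.
# Like DAdaptAdam, Prodigy estimates the step size itself, so the base
# learning rate is conventionally left at 1.0.
accelerate launch train_network.py \
  --optimizer_type="Prodigy" \
  --learning_rate=1.0 \
  --optimizer_args "decouple=True" "weight_decay=0.01" \
    "use_bias_correction=True" "safeguard_warmup=True"
# ...plus the usual dataset/model/network arguments, unchanged.
```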
Compared values:
- `DAdaptAdam`,
- `Prodigy`.
But we won't just compare them face to face as we did before. We'll also add regularisation to our comparison. So, we'll see how the optimizers perform in these cases:
- GR = 1.02,
- GR = ∞,
- Regularisation with Unsplash Photos at Max Resolution using the RV2.0 model,
- Regularisation with Unsplash Photos at Max Resolution using the 28D28 model.
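The growth-rate cases above map to a single optimizer argument. As a sketch — assuming the `growth_rate` arg that both the DAdaptAdam and Prodigy implementations expose, with a default of infinity:

```shell
# GR = 1.02: cap how fast the adaptive step-size estimate may grow per step
--optimizer_args "growth_rate=1.02"

# GR = ∞: no cap; this is the default, so the argument can simply be omitted
--optimizer_args "growth_rate=inf"
```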
DLR(step)
With GR = ∞, the graph has some weird bumps we've never seen before.
Loss(epoch)
But as far as I can see, it doesn't affect the loss.
I guess we're all a little tired of the grids. So, let's compare models with the base settings and models with our most successful settings, changing only the optimizer in both cases.
OK, nothing actually changed. The results are pretty similar.
I don't see enough difference to justify changing the optimizer. It's easier to stick with the familiar DAdaptAdam optimizer for now.