Model Training ‐ Comparison ‐ [Scheduler]
Models | Logs | Graphs | Configs
Scheduler defines the function by which the LR changes during training.

Compared values:
- cosine -BD
- constant
- polynomial
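To make the three shapes concrete, here is a minimal Python sketch of the multiplier each scheduler applies to the base LR. Warmup is omitted and the polynomial power is fixed at 1 (a linear decay); both are simplifying assumptions rather than the exact trainer implementation.

```python
import math

def lr_multiplier(scheduler: str, step: int, total_steps: int, power: float = 1.0) -> float:
    """Factor applied to the base LR at a given step (no warmup)."""
    t = step / total_steps
    if scheduler == "constant":
        return 1.0                                  # LR never decays
    if scheduler == "cosine":
        return 0.5 * (1.0 + math.cos(math.pi * t))  # smooth decay from 1 to 0
    if scheduler == "polynomial":
        return (1.0 - t) ** power                   # decay from 1 to 0; power=1 is linear
    raise ValueError(f"unknown scheduler: {scheduler}")

# Compare the three curves at a few points of a 1000-step run:
for step in (0, 250, 500, 750, 1000):
    row = {name: round(lr_multiplier(name, step, 1000), 3)
           for name in ("cosine", "constant", "polynomial")}
    print(step, row)
```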
In TensorBoard, in addition to the graphs we have previously discussed, there are also graphs showing the changes in LR for the U-Net and the Text Encoder. It's easy to understand which function is which.
DLR(step)
DLR initially increases gradually and then follows the function defined by the Scheduler.
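To see why the curve looks like this, here is an illustrative Python sketch, assuming the effective DLR is the product of the optimizer's step-size estimate d and the scheduler multiplier. The linear ramp of d is a made-up stand-in (in Prodigy, d grows from an adaptive distance estimate limited by the growth rate), so this shows only the general shape:

```python
import math

def effective_dlr(step: int, total_steps: int = 1000, ramp_steps: int = 50,
                  d_final: float = 1e-4, scheduler: str = "cosine") -> float:
    """Illustrative DLR: the step-size estimate d ramps up early in training,
    then the scheduler multiplier shapes the rest of the run.
    NOTE: the linear ramp is hypothetical; real Prodigy grows d adaptively."""
    d = d_final * min(1.0, step / ramp_steps)           # hypothetical ramp-up of d
    t = step / total_steps
    multiplier = {
        "cosine": 0.5 * (1.0 + math.cos(math.pi * t)),  # decays from 1 to 0
        "constant": 1.0,                                # never decays
        "polynomial": 1.0 - t,                          # linear decay (power = 1)
    }[scheduler]
    return d * multiplier

# DLR rises while d ramps up, then each scheduler takes over:
for step in (0, 25, 50, 500, 1000):
    print(step, {name: round(effective_dlr(step, scheduler=name), 6)
                 for name in ("cosine", "constant", "polynomial")})
```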
Loss(step)
In the case of GR = 1.02, all the graphs converge into one, which suggests that the Scheduler may not have a significant impact on the results. With GR = ∞, on the other hand, the graphs differ, but it is strange that constant gives the highest loss, even though intuitively it might seem like it should be the opposite (though a constant LR that never decays also keeps late-training updates noisy, which can hold the loss up).
As for these grids, it's clear that the assumption was correct: the Scheduler doesn't have a significant impact on the result with GR = 1.02. With GR = ∞, however, its influence is hard to ignore. While on the final epochs cosine and polynomial leave the model barely learning, since DLR gets close to zero, with constant training continues at full strength.

The results with cosine and polynomial are similar across all cases, and subjectively they appear better than constant.
Given the similarity between the cosine and polynomial schedulers, it's easier to stick with the standard cosine scheduler. In theory, the constant scheduler should significantly speed up training, since it doesn't slow down over time, but the results may turn out worse. Also, by using constant, we're making our smart adaptive optimizer a silly non-adaptive optimizer.
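As a practical note, here is a minimal sketch of instantiating a scheduler with the diffusers helper (the same scheduler family kohya's sd-scripts exposes via its lr_scheduler option). The one-parameter model and AdamW are placeholders to keep the example self-contained, and lr=1.0 mirrors the usual base LR for adaptive optimizers like Prodigy:

```python
import torch
from diffusers.optimization import get_scheduler

# Placeholder parameters and optimizer; with adaptive optimizers such as
# Prodigy, the base LR is typically set to 1.0 and the scheduler scales it.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=1.0)

scheduler = get_scheduler(
    "cosine",                  # or "constant" / "polynomial"
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=1000,
)

for _ in range(3):             # each training step: optimizer first, then scheduler
    optimizer.step()
    scheduler.step()
print(scheduler.get_last_lr())
```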