Model Training ‐ Comparison ‐ [Scheduler]


Models | Logs | Graphs | Configs


The Scheduler defines the function by which the learning rate (LR) changes over the course of training.


Compared values:

  • cosine - BD,

  • constant,

  • polynomial.
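
For reference, here is a minimal sketch of how the three compared schedulers can be constructed, assuming the `get_scheduler` helper from `diffusers`; the actual training scripts may wire this up differently.

```python
import torch
from diffusers.optimization import get_scheduler

def make_scheduler(name: str, total_steps: int = 1000):
    # Each scheduler gets its own throwaway optimizer; with lr=1.0 the
    # scheduler's multiplier is the learning rate itself.
    params = [torch.nn.Parameter(torch.zeros(1))]
    optimizer = torch.optim.AdamW(params, lr=1.0)
    scheduler = get_scheduler(
        name,
        optimizer=optimizer,
        num_warmup_steps=0,
        num_training_steps=total_steps,
    )
    return optimizer, scheduler

# The three compared values map directly to scheduler names.
for name in ("cosine", "constant", "polynomial"):
    make_scheduler(name)
```

Setting `lr=1.0` in the sketch makes the logged LR equal to the scheduler's multiplier, which is what makes the curves easy to identify in TensorBoard.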


In TensorBoard, in addition to the graphs discussed previously, there are also graphs showing how the LR changes for the U-Net and the Text Encoder. It's easy to tell which function is which.


DLR(step)

The DLR initially ramps up gradually and then follows the function defined by the Scheduler.
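
A toy illustration of that shape, not the optimizer's real math: think of the plotted DLR as an adaptive D estimate multiplied by the scheduler's multiplier. The `toy_d_estimate` function and its constants below are made up for the sketch; only the cosine multiplier formula is the standard one.

```python
import math

def cosine_multiplier(step: int, total_steps: int) -> float:
    # Cosine scheduler multiplier without warmup: decays from 1 to 0.
    progress = step / total_steps
    return 0.5 * (1.0 + math.cos(math.pi * progress))

def toy_d_estimate(step: int, growth_rate: float = 1.02,
                   d0: float = 1e-6, d_cap: float = 1e-4) -> float:
    # Synthetic stand-in for the adaptive D estimate: grows by at most
    # `growth_rate` per step, then plateaus. The real estimate is
    # data-dependent; this only mimics the shape seen in TensorBoard.
    return min(d_cap, d0 * growth_rate ** step)

total = 1000
for step in (0, 100, 250, 500, 750, 999):
    dlr = toy_d_estimate(step) * cosine_multiplier(step, total)
    print(f"step {step:4d}: DLR ~ {dlr:.2e}")
```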


Loss(step)

With GR = 1.02, all the graphs converge into one, which suggests that the Scheduler may not have a significant impact on the results.

With GR = ∞, on the other hand, the graphs do differ, but it's strange that constant gives the highest loss: since a constant LR keeps training at full strength, intuition suggests it should drive the loss lower, not higher.
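
If GR denotes the optimizer's `growth_rate` parameter, as the compared values 1.02 and ∞ suggest, it caps how fast the adaptive D estimate may grow per step. A minimal sketch, assuming the `dadaptation` package's `DAdaptAdam` (check your version's signature):

```python
import torch
import dadaptation  # assumption: a D-Adaptation-style optimizer is in use

params = [torch.nn.Parameter(torch.zeros(4, 4))]

# GR = 1.02: the D estimate may grow by at most 2% per step,
# so the DLR ramps up slowly and smoothly.
capped = dadaptation.DAdaptAdam(params, lr=1.0, growth_rate=1.02)

# GR = ∞ (the library default): D growth is uncapped, and the DLR
# can jump as fast as the adaptation rule allows.
uncapped = dadaptation.DAdaptAdam(params, lr=1.0, growth_rate=float("inf"))
```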


As for these grids, it's clear the assumption was correct: the Scheduler doesn't have a significant impact on the result with GR = 1.02. With GR = ∞, however, its influence is hard to ignore: cosine and polynomial leave the model barely learning on the final epochs, since the DLR gets close to zero, whereas constant keeps training at full strength.


The results with cosine and polynomial are similar in all cases and, subjectively, they look better than those with constant.

CONCLUSION

Given the similarity between cosine and polynomial, it's easier to stick with the standard cosine scheduler. In theory, the constant scheduler should speed up training significantly, since it never slows down over time, but the results may come out worse. Besides, by using constant we turn our smart adaptive optimizer into a silly non-adaptive one.


Next ‐ Model Training ‐ Comparison ‐ [Noise Offset]
