
Model Training ‐ Comparison ‐ [Network Rank]


Models | Logs | Graphs | Configs


Network Rank (NR) determines how much information our model can memorize.


Compared values:

  • 32,

  • 64,

  • 128,

  • 192 - B (baseline),

  • 256,

  • 512.
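
As a point of reference, here is a minimal sketch of where the rank value goes when building a LoRA with the Hugging Face `peft` library (not necessarily the tooling used for this comparison; the target module names below are hypothetical placeholders):

```python
# Minimal sketch, assuming a Hugging Face peft-based setup.
# "r" is the Network Rank compared on this page; the module names are hypothetical.
from peft import LoraConfig

config = LoraConfig(
    r=128,                                     # Network Rank (NR)
    lora_alpha=64,                             # Network Alpha, compared on the next page
    target_modules=["to_q", "to_k", "to_v"],   # hypothetical attention projections
)
```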


DLR(step)

At GR = 1.02, the logic is simple: the higher the NR, the higher the DLR. In other words, the more the model can memorize, the faster it learns.

At GR = ∞, things work differently, and for some reason, the DLR is higher for NR = 128 than for NR = 256. That's strange.


Loss(epoch)

However, the loss(epoch) graphs do not show any significant deviations: in all cases they are nearly identical.

Also, changing NR has only a slight impact on training time, but a much stronger effect on VRAM consumption and the model file size (VRAM | file size):

  • 32 - 7.7 GB | 37 MB,

  • 64 - 8.0 GB | 74 MB,

  • 128 - 8.6 GB | 148 MB,

  • 192 - 9.5 GB | 221 MB,

  • 256 - 10.0 GB | 295 MB,

  • 512 - 12.5 GB | 590 MB.

So, when increasing NR by 32, VRAM consumption increases by approximately 300 MB, and the model file size increases by about 37 MB.
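
The roughly linear growth in file size follows from how LoRA stores its weights: each adapted layer keeps two low-rank matrices of shapes (d_out × r) and (r × d_in), so the parameter count, and therefore the saved size, scales linearly with the rank. A rough back-of-the-envelope sketch (the 768×768 layer shape below is a hypothetical example; real U-Net and text-encoder layers vary):

```python
# Rough illustration of why LoRA file size scales linearly with Network Rank.
# Each adapted layer stores A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out)
# trainable parameters. The 768x768 layer shape is a hypothetical example.
def lora_params(rank: int, d_in: int, d_out: int) -> int:
    return rank * (d_in + d_out)

for rank in (32, 64, 128, 192, 256, 512):
    params = lora_params(rank, d_in=768, d_out=768)
    size_kib = params * 2 / 1024  # fp16 weights: 2 bytes per parameter
    print(f"rank {rank}: {params} params, ~{size_kib:.0f} KiB for this one layer")
```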



Once again, there doesn't seem to be an explosive increase in quality. Models with NR = 32 and NR = 64 appear somewhat less similar to the character, but overall, the other models provide results of similar quality.


CONCLUSION

The range of optimal values seems to be between 128 and 256. Going higher doesn't make sense, because the increase in VRAM usage isn't compensated by a corresponding improvement in quality. Going lower makes sense only if you are constrained by VRAM, but it may reduce similarity to the character.


Next - Model Training ‐ Comparison - [Network Alpha]
