flux lora just produces noise #3356
YakisobaRamen started this conversation in General
-
After some more testing, the issue persists and has nothing to do with --fused_backward_pass. It happens regardless of which blocks are trained, whether gradient clipping is used, or the choice of optimizer. If the LR is above 0.00001, the model collapses into white noise after a few hundred steps; with an LR above 0.00008, the collapse happens immediately. At 0.00001 or lower there is no collapse over 1000 steps, but also no learning whatsoever. This is independent of the dataset and checkpoint. I have updated CUDA to the latest version, updated the driver, and created a new, clean environment.
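One way to confirm whether the collapse is in the LoRA weights themselves rather than in sampling is to dump the per-tensor norms of a saved checkpoint. A minimal sketch, assuming the LoRA was saved as .safetensors (the path and the norm threshold are placeholders):

```python
# Sketch: check a saved FLUX LoRA checkpoint for exploding or NaN weights.
# The file path is a placeholder; adjust to your saved checkpoint.
import torch
from safetensors.torch import load_file

state_dict = load_file("flux_lora_epoch2.safetensors")  # placeholder path

for name, tensor in state_dict.items():
    t = tensor.float()
    norm = t.norm().item()
    has_nan = torch.isnan(t).any().item()
    has_inf = torch.isinf(t).any().item()
    # 1e3 is an arbitrary "suspicious" threshold, not a hard rule
    if has_nan or has_inf or norm > 1e3:
        print(f"{name}: norm={norm:.3e} nan={has_nan} inf={has_inf}")
```

If the per-tensor norms blow up around the step where the samples turn to noise, the collapse is happening in the weights/optimizer updates themselves and not just in sampling.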
-
Recently I have been trying to switch from SDXL to FLUX for training LoRAs with kohya, but for
some reason every FLUX LoRA abruptly disintegrates into white noise after around
300 steps.
The first two epochs usually look fine, and then the sample images only contain noise. This happens with every checkpoint, every checkpoint format, every dataset
(small, large, low-res, high-res) and every subject (landscapes, people, architecture).
The same thing also happens during inference with ComfyUI, so it's not just a problem with the sampler that generates the training samples.
I've tested kohya from the main branch, which runs in my own Docker container, as
well as the latest dev branch, which I run outside of Docker.
I'm using a 5060 Ti (16 GB) and therefore had to turn on --fused_backward_pass so that I can
offload some blocks to the CPU. Since fused_backward_pass currently only supports Adafactor,
I'm constrained to using that. I have tried several schedulers.
The initial LR is 0.00001 for the base and the UNet (but I have also tried higher LRs). I do not train the text encoders.
I usually swap 24 blocks, which seems to be the minimum for all the checkpoints I've tried.
I train double blocks 0,1,2,3,4,5 and single blocks 0,1,2,3,4,5,6,7,8. The avr_loss looks normal,
no NaNs. I use a VAE, T5-XXL and CLIP-L from a Hugging Face repo, with bf16 for training and saving; the fp8 options
are on for fp8 models. I'm on Linux.
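To double-check that only the intended blocks end up in the LoRA, the tensors in a saved checkpoint can be counted per block. A rough sketch, assuming kohya-style key names containing double_blocks_<i> / single_blocks_<i> (both the path and the key pattern are assumptions; verify against your file):

```python
# Sketch: list which FLUX double/single blocks a saved LoRA actually covers.
# Key-name pattern and file path are assumptions; adjust to your checkpoint.
import re
from collections import Counter
from safetensors.torch import load_file

state_dict = load_file("flux_lora_epoch2.safetensors")  # placeholder path

counts = Counter()
for key in state_dict:
    match = re.search(r"(double|single)_blocks_(\d+)", key)
    if match:
        counts[(match.group(1), int(match.group(2)))] += 1

for (kind, idx), n in sorted(counts.items()):
    print(f"{kind} block {idx}: {n} LoRA tensors")
```

If blocks outside double 0-5 / single 0-8 show up, or far fewer tensors than expected, the block selection is not being applied the way it looks.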