Randomized and Incremental Regularization Image Loading - Overhaul of Regularization Image Loading Logic #2096
In the original implementation of regularization image loading, the code loads the first N regularization images, where N is the number of training images multiplied by the number of repeats.
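For context, the original selection reduces to something like the following minimal sketch; the function and variable names here are illustrative, not the actual sd-scripts identifiers:

```python
# Minimal sketch of the original behavior: always walk the list from
# the front. If n exceeds the pool size, the leading images are
# repeated before later ones get a second pass; if n is smaller, the
# trailing images are never used at all.
def select_reg_images(reg_image_paths, num_train_images, num_repeats):
    n = num_train_images * num_repeats
    return [reg_image_paths[i % len(reg_image_paths)] for i in range(n)]
```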
This leads to a couple of edge cases that can produce suboptimal training results whenever the number of available regularization images is not equal to N, which matters especially for use cases on free resources such as Google Colab that limit the number of hours training can run per day. When N is greater than the number of available images, the first few images consistently have additional repeats added to the dataset; over extended training across multiple epochs and/or resumed training, this can give them a stronger influence on the trained model. For example, with 15 training images, 2 repeats, and 20 regularization images, N = 30, so the first 10 regularization images are loaded twice while the rest are loaded once.

When N is less than the number of regularization images available, training strategies that use the regularization images to simultaneously improve overall quality by adding additional ground-truth images cannot fully utilize all the prepared regularization images and captions.

Additionally, using multiple subsets to organize categories of regularization images may unintentionally weight training toward specific concepts, depending on the order in which the subsets are loaded.
This pull request intends to mitigate this by implementing two loading strategies: incremental regularization image loading and randomized regularization image selection.

Both strategies can be activated separately or together, or turned off completely (returning to the original loading strategy by default), using the arguments `--incremental_reg_load` and `--randomized_regularization_image`.
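As a rough illustration of how the two flags could interact, here is a hypothetical sketch. The flag names come from this PR, but `select_reg_images_for_epoch`, the fixed shuffle seed, and the rotating-window logic are assumptions for illustration, not the PR's actual code:

```python
import argparse
import random

parser = argparse.ArgumentParser()
# Flag names as introduced by this PR; both default to off, which
# preserves the original loading strategy.
parser.add_argument("--incremental_reg_load", action="store_true")
parser.add_argument("--randomized_regularization_image", action="store_true")

def select_reg_images_for_epoch(paths, n, epoch, args):
    """Hypothetical per-epoch selection of n regularization images."""
    pool = list(paths)
    if args.randomized_regularization_image:
        # Shuffle with a fixed seed so the order is randomized but
        # stable, letting the incremental walk below still cover the
        # whole pool over successive epochs.
        random.Random(42).shuffle(pool)
    start = 0
    if args.incremental_reg_load:
        # Advance the window each epoch so every prepared image is
        # eventually used, instead of only the leading slice.
        start = (epoch * n) % len(pool)
    return [pool[(start + i) % len(pool)] for i in range(n)]
```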
Points to consider:
- With `--incremental_reg_load`, the length of the dataloader will vary between epochs, especially when using buckets, due to a possibly different number of batches being available each epoch.
- With `--incremental_reg_load`, persistent dataloader workers will not work, as the dataloaders have to be recreated at the start of each epoch in order to correctly update the number of batches available (see the sketch after this list).
- The datasets' updated `__len__()` values have to be propagated to the enclosing DatasetGroup class.
- The `cache_text_encoder_outputs_if_needed()` function had to be moved to just before training starts for each epoch.
- A safeguard around the repeated per-epoch calls to `cache_text_encoder_outputs_if_needed()` was put in place.

As this is a proof of concept only, it is implemented only in the LoRA training scripts for SD1.5 and SDXL.
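To make the dataloader-related points above concrete, the per-epoch flow has to take roughly the following shape. This is a hypothetical sketch: `set_current_epoch()`, the loader arguments, and the loop structure are assumptions for illustration rather than the PR's exact code (only `cache_text_encoder_outputs_if_needed()` is named in this PR):

```python
from torch.utils.data import DataLoader

for epoch in range(num_train_epochs):
    # Refresh the regularization images for this epoch; the dataset's
    # __len__() changes, and the DatasetGroup must pick up the new value.
    train_dataset_group.set_current_epoch(epoch + 1)

    # Moved to just before the epoch starts, since newly loaded
    # regularization images may need their text encoder outputs cached.
    cache_text_encoder_outputs_if_needed(...)  # arguments elided

    # Rebuilt every epoch so the batch count is correct; persistent
    # workers are therefore unusable, as they would outlive this loader.
    train_dataloader = DataLoader(
        train_dataset_group,
        batch_size=1,  # sd-scripts datasets yield pre-bucketed batches
        shuffle=True,
        num_workers=n_workers,
        persistent_workers=False,
    )
    for batch in train_dataloader:
        ...  # training step
```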
Please let me know if you have any questions.