
Error after starting the trainer: def replace_unet_modules(unet: diffusers.models.unet_2d_condition.UNet2DConditionModel, mem_eff_attn, xformers): #1624


Open
khakachang opened this issue May 6, 2025 · 0 comments

@khakachang

System Info

```
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1746521310.623558 1968 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1746521310.705754 1968 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Traceback (most recent call last):
  File "/content/kohya-trainer/train_network.py", line 17, in <module>
    import library.train_util as train_util
  File "/content/kohya-trainer/library/train_util.py", line 1767, in <module>
    def replace_unet_modules(unet: diffusers.models.unet_2d_condition.UNet2DConditionModel, mem_eff_attn, xformers):
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/diffusers/utils/import_utils.py", line 813, in __getattr__
    raise AttributeError(f"module {self.__name__} has no attribute {name}")
AttributeError: module diffusers.models has no attribute unet_2d_condition. Did you mean: 'unets.unet_2d_condition'?
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/accelerate_cli.py", line 50, in main
    args.func(args)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 1213, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 795, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_network.py',
```
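The first traceback is the real failure: `library/train_util.py` at the pinned commit annotates a function with `diffusers.models.unet_2d_condition.UNet2DConditionModel`, but the diffusers package that actually got imported no longer exposes that path and suggests `unets.unet_2d_condition` instead. The second traceback is only `accelerate` reporting that the child process exited with an error. Note the `python3.11` paths here versus the `python3.10` path in the bitsandbytes line of the expected log, which suggests (but does not prove) that the Colab runtime changed and a newer preinstalled diffusers is being picked up. A minimal sketch of an import fallback that tries the old module path first and then the one the error message suggests; `import_first` is a hypothetical helper, not part of kohya's scripts or of diffusers:

```python
from __future__ import annotations

import importlib
from types import ModuleType


def import_first(candidates: list[str]) -> ModuleType:
    """Return the first module in `candidates` that imports successfully.

    Hypothetical helper: tries each dotted path in order and raises
    ImportError (chained to the last failure) if none can be imported.
    """
    last_error: ImportError | None = None
    for path in candidates:
        try:
            return importlib.import_module(path)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            last_error = exc
    raise ImportError(f"none of {candidates} could be imported") from last_error


if __name__ == "__main__":
    # Old location (older diffusers) first, then the location the error suggests.
    try:
        unet_module = import_first([
            "diffusers.models.unet_2d_condition",
            "diffusers.models.unets.unet_2d_condition",
        ])
        print(unet_module.UNet2DConditionModel)
    except ImportError as exc:
        print(f"diffusers UNet module not found: {exc}")
```

Patching `train_util.py` this way may only get past this first error; pinning diffusers to the version the pinned commit was written against is the more complete fix, and that version isn't stated in this issue.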

Reproduction

```python
import os
import re
import toml
import shutil
import zipfile
from time import time
from IPython.display import Markdown, display

# These carry information from past executions
if "model_url" in globals():
  old_model_url = model_url
else:
  old_model_url = None
if "dependencies_installed" not in globals():
  dependencies_installed = False
if "model_file" not in globals():
  model_file = None

# These may be set by other cells, some are legacy
if "custom_dataset" not in globals():
  custom_dataset = None
if "override_dataset_config_file" not in globals():
  override_dataset_config_file = None
if "override_config_file" not in globals():
  override_config_file = None
if "optimizer" not in globals():
  optimizer = "AdamW8bit"
if "optimizer_args" not in globals():
  optimizer_args = None
if "continue_from_lora" not in globals():
  continue_from_lora = ""
if "weighted_captions" not in globals():
  weighted_captions = False
if "adjust_tags" not in globals():
  adjust_tags = False
if "keep_tokens_weight" not in globals():
  keep_tokens_weight = 1.0

COLAB = True # low ram
COMMIT = "e6ad3cbc66130fdc3bf9ecd1e0272969b1d613f7"
BETTER_EPOCH_NAMES = True
LOAD_TRUNCATED_IMAGES = True
```

```python
#@title ## 🚩 Start Here

#@markdown ### ▶️ Setup
#@markdown Your project name will be the same as the folder containing your images. Spaces aren't allowed.
project_name = "koling" #@param {type:"string"}
#@markdown The folder structure doesn't matter and is purely for comfort. Make sure to always pick the same one. I like organizing by project.
folder_structure = "Organize by project (MyDrive/Loras/project_name/dataset)" #@param ["Organize by category (MyDrive/lora_training/datasets/project_name)", "Organize by project (MyDrive/Loras/project_name/dataset)"]
#@markdown Decide the model that will be downloaded and used for training. These options should produce clean and consistent results. You can also choose your own by pasting its download link.
training_model = "AnyLora (AnyLoRA_noVae_fp16-pruned.ckpt)" #@param ["Anime (animefull-final-pruned-fp16.safetensors)", "AnyLora (AnyLoRA_noVae_fp16-pruned.ckpt)", "Stable Diffusion (sd-v1-5-pruned-noema-fp16.safetensors)"]
optional_custom_training_model_url = "" #@param {type:"string"}
custom_model_is_based_on_sd2 = False #@param {type:"boolean"}

if optional_custom_training_model_url:
  model_url = optional_custom_training_model_url
elif "AnyLora" in training_model:
  model_url = "https://huggingface.co/Lykon/AnyLoRA/resolve/main/AnyLoRA_noVae_fp16-pruned.ckpt"
elif "Anime" in training_model:
  model_url = "https://huggingface.co/hollowstrawberry/stable-diffusion-guide/resolve/main/models/animefull-final-pruned-fp16.safetensors"
else:
  model_url = "https://huggingface.co/hollowstrawberry/stable-diffusion-guide/resolve/main/models/sd-v1-5-pruned-noema-fp16.safetensors"

#@markdown ### ▶️ Processing
#@markdown Resolution of 512 is standard for Stable Diffusion 1.5. Higher resolution training is much slower but can lead to better details.
#@markdown Images will be automatically scaled while training to produce the best results, so you don't need to crop or resize anything yourself.
resolution = 512 #@param {type:"slider", min:512, max:1024, step:128}
#@markdown This option will train your images both normally and flipped, for no extra cost, to learn more from them. Turn it on specially if you have less than 20 images.
#@markdown Turn it off if you care about asymmetrical elements in your Lora.
flip_aug = False #@param {type:"boolean"}
#markdown Leave empty for no captions.
caption_extension = ".txt" #param {type:"string"}
#@markdown Shuffling anime tags in place improves learning and prompting. An activation tag goes at the start of every text file and will not be shuffled.
shuffle_tags = True #@param {type:"boolean"}
shuffle_caption = shuffle_tags
activation_tags = "1" #@param [0,1,2,3]
keep_tokens = int(activation_tags)
```

```python
#@markdown ### ▶️ Steps
#@markdown Your images will repeat this number of times during training. I recommend that your images multiplied by their repeats is between 200 and 400.
num_repeats = 10 #@param {type:"number"}
#@markdown Choose how long you want to train for. A good starting point is around 10 epochs or around 2000 steps.
#@markdown One epoch is a number of steps equal to: your number of images multiplied by their repeats, divided by batch size.
preferred_unit = "Epochs" #@param ["Epochs", "Steps"]
how_many = 10 #@param {type:"number"}
max_train_epochs = how_many if preferred_unit == "Epochs" else None
max_train_steps = how_many if preferred_unit == "Steps" else None
#@markdown Saving more epochs will let you compare your Lora's progress better.
save_every_n_epochs = 1 #@param {type:"number"}
keep_only_last_n_epochs = 10 #@param {type:"number"}
if not save_every_n_epochs:
  save_every_n_epochs = max_train_epochs
if not keep_only_last_n_epochs:
  keep_only_last_n_epochs = max_train_epochs
#@markdown Increasing the batch size makes training faster, but may make learning worse. Recommended 2 or 3.
train_batch_size = 2 #@param {type:"slider", min:1, max:8, step:1}
```
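The step count the notebook prints follows directly from the form comments: images multiplied by repeats gives the steps per epoch before batching, dividing by the batch size gives steps per epoch, and the warmup is `lr_warmup_ratio` (set in the Learning section) of the total. A small sketch of the same arithmetic as `validate_dataset()`, using the numbers from the expected log (41 images, 10 repeats, batch size 2, 10 epochs); `plan_steps` is a hypothetical name:

```python
def plan_steps(num_images: int, num_repeats: int, batch_size: int,
               epochs: int, warmup_ratio: float = 0.05) -> dict:
    """Mirror the step arithmetic in validate_dataset() (hypothetical helper)."""
    pre_steps_per_epoch = num_images * num_repeats       # images x repeats
    steps_per_epoch = pre_steps_per_epoch / batch_size   # divided by batch size
    total_steps = int(epochs * steps_per_epoch)
    warmup_steps = int(total_steps * warmup_ratio)       # becomes lr_warmup_steps
    return {
        "pre_steps_per_epoch": pre_steps_per_epoch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


plan = plan_steps(num_images=41, num_repeats=10, batch_size=2, epochs=10)
print(plan)
```

This reproduces the 205 batches per epoch and 2050 total steps in the expected log. With several aspect-ratio buckets or an image count that doesn't divide evenly, the trainer's own batch count can come out slightly higher, since each bucket is batched separately.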

```python
#@markdown ### ▶️ Learning
#@markdown The learning rate is the most important for your results. If you want to train slower with lots of images, or if your dim and alpha are high, move the unet to 2e-4 or lower.
#@markdown The text encoder helps your Lora learn concepts slightly better. It is recommended to make it half or a fifth of the unet. If you're training a style you can even set it to 0.
unet_lr = 5e-4 #@param {type:"number"}
text_encoder_lr = 1e-4 #@param {type:"number"}
#@markdown The scheduler is the algorithm that guides the learning rate. If you're not sure, pick constant and ignore the number. I personally recommend cosine_with_restarts with 3 restarts.
lr_scheduler = "cosine_with_restarts" #@param ["constant", "cosine", "cosine_with_restarts", "constant_with_warmup", "linear", "polynomial"]
lr_scheduler_number = 3 #@param {type:"number"}
lr_scheduler_num_cycles = lr_scheduler_number if lr_scheduler == "cosine_with_restarts" else 0
lr_scheduler_power = lr_scheduler_number if lr_scheduler == "polynomial" else 0
#@markdown Steps spent "warming up" the learning rate during training for efficiency. I recommend leaving it at 5%.
lr_warmup_ratio = 0.05 #@param {type:"slider", min:0.0, max:0.5, step:0.01}
lr_warmup_steps = 0
#@markdown New feature that adjusts loss over time, makes learning much more efficient, and training can be done with about half as many epochs. Uses a value of 5.0 as recommended by the paper.
min_snr_gamma = True #@param {type:"boolean"}
min_snr_gamma_value = 5.0 if min_snr_gamma else None
```
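`min_snr_gamma` refers to Min-SNR loss weighting: each timestep's loss is multiplied by min(SNR, γ) / SNR, where SNR = ᾱ_t / (1 - ᾱ_t) for the noise schedule's cumulative alpha product ᾱ_t. A pure-Python sketch of that weight for epsilon-prediction models, assuming γ = 5.0 as in the form; it only illustrates the formula and is not kohya's tensor implementation:

```python
def snr_from_alpha_bar(alpha_bar: float) -> float:
    """SNR of x_t = sqrt(a) * x0 + sqrt(1 - a) * noise, where a = alpha_bar_t."""
    return alpha_bar / (1.0 - alpha_bar)


def min_snr_weight(snr: float, gamma: float = 5.0) -> float:
    """Min-SNR loss weight for epsilon-prediction: min(SNR, gamma) / SNR."""
    return min(snr, gamma) / snr


# Low-noise timesteps (high SNR) are down-weighted; noisier ones keep weight 1.
for alpha_bar in (0.99, 0.9, 0.5, 0.1):
    snr = snr_from_alpha_bar(alpha_bar)
    print(f"alpha_bar={alpha_bar:.2f}  snr={snr:8.3f}  weight={min_snr_weight(snr):.3f}")
```

The effect is that easy, nearly noise-free timesteps stop dominating the loss, which is the mechanism behind the form's claim that fewer epochs are needed.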

```python
#@markdown ### ▶️ Structure
#@markdown LoRA is the classic type, while LoCon is good with styles. Lycoris require this extension for webui to work like normal loras. More info here.
lora_type = "LoRA" #@param ["LoRA", "LoCon Lycoris", "LoHa Lycoris"]

#@markdown Below are some recommended values for the following settings:

#@markdown | type | network_dim | network_alpha | conv_dim | conv_alpha |
#@markdown | :---: | :---: | :---: | :---: | :---: |
#@markdown | LoRA | 32 | 16 | | |
#@markdown | LoCon | 16 | 8 | 8 | 1 |
#@markdown | LoHa | 8 | 4 | 4 | 1 |

#@markdown More dim means larger Lora, it can hold more information but more isn't always better. A dim between 8-32 is recommended, and alpha equal to half the dim.
network_dim = 16 #@param {type:"slider", min:1, max:128, step:1}
network_alpha = 8 #@param {type:"slider", min:1, max:128, step:1}
#@markdown The following values don't affect LoRA. They work like dim/alpha but only for the additional learning layers of Lycoris.
conv_dim = 8 #@param {type:"slider", min:1, max:64, step:1}
conv_alpha = 1 #@param {type:"slider", min:1, max:64, step:1}
conv_compression = False #@param {type:"boolean"}

network_module = "lycoris.kohya" if "Lycoris" in lora_type else "networks.lora"
network_args = None if lora_type == "LoRA" else [
  f"conv_dim={conv_dim}",
  f"conv_alpha={conv_alpha}",
]
if "Lycoris" in lora_type:
  network_args.append(f"algo={'loha' if 'LoHa' in lora_type else 'lora'}")
  network_args.append(f"disable_conv_cp={str(not conv_compression)}")
```
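The dim/alpha pair controls more than file size: in kohya's `networks.lora` module the low-rank update is multiplied by `alpha / dim`, so the form's default dim 16 / alpha 8 scales it by 0.5, and the DAdaptation branch (which sets `network_alpha = network_dim`) makes it 1.0. A pure-Python sketch of a LoRA-wrapped linear layer showing where that scale enters; the helper names and the toy matrices are hypothetical:

```python
from __future__ import annotations

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_scale(network_dim: int, network_alpha: float) -> float:
    """Scale applied to the low-rank update: alpha / dim."""
    return network_alpha / network_dim


def lora_linear(w: Matrix, down: Matrix, up: Matrix, x: Vector,
                network_dim: int, network_alpha: float,
                multiplier: float = 1.0) -> Vector:
    """y = W x + multiplier * (alpha / dim) * up(down(x))."""
    scale = lora_scale(network_dim, network_alpha)
    base = matvec(w, x)                  # frozen original weight
    delta = matvec(up, matvec(down, x))  # rank-`network_dim` update
    return [b + multiplier * scale * d for b, d in zip(base, delta)]


print(lora_scale(16, 8))   # form defaults
print(lora_scale(16, 16))  # DAdaptation branch sets alpha = dim
# Toy layer: identity W, rank-2 update, dim=2, alpha=1 -> scale 0.5
y = lora_linear([[1, 0], [0, 1]], [[1, 0], [0, 1]], [[1, 1], [0, 2]], [1, 2],
                network_dim=2, network_alpha=1)
print(y)
```

Because the scale is a ratio, raising dim while keeping alpha fixed shrinks each update, which is why the learning-rate comment advises lowering the unet rate when dim and alpha are high.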

```python
#markdown ### ▶️ Experimental
#markdown Save additional data equaling ~1 GB allowing you to resume training later.
save_state = False #param {type:"boolean"}
#markdown Resume training if a save state is found.
resume = False #param {type:"boolean"}

#@markdown ### ▶️ Ready
#@markdown You can now run this cell to cook your Lora. Good luck!

# 👩‍💻 Cool code goes here

if optimizer == "DAdaptation":
  optimizer_args = ["decouple=True","weight_decay=0.02","betas=[0.9,0.99]"]
  unet_lr = 0.5
  text_encoder_lr = 0.5
  lr_scheduler = "constant_with_warmup"
  network_alpha = network_dim

root_dir = "/content" if COLAB else "~/Loras"
deps_dir = os.path.join(root_dir, "deps")
repo_dir = os.path.join(root_dir, "kohya-trainer")

if "/Loras" in folder_structure:
  main_dir = os.path.join(root_dir, "drive/MyDrive/Loras") if COLAB else root_dir
  log_folder = os.path.join(main_dir, "_logs")
  config_folder = os.path.join(main_dir, project_name)
  images_folder = os.path.join(main_dir, project_name, "dataset")
  output_folder = os.path.join(main_dir, project_name, "output")
else:
  main_dir = os.path.join(root_dir, "drive/MyDrive/lora_training") if COLAB else root_dir
  images_folder = os.path.join(main_dir, "datasets", project_name)
  output_folder = os.path.join(main_dir, "output", project_name)
  config_folder = os.path.join(main_dir, "config", project_name)
  log_folder = os.path.join(main_dir, "log")

config_file = os.path.join(config_folder, "training_config.toml")
dataset_config_file = os.path.join(config_folder, "dataset_config.toml")
accelerate_config_file = os.path.join(repo_dir, "accelerate_config/config.yaml")

def clone_repo():
  os.chdir(root_dir)
  !git clone https://github.com/kohya-ss/sd-scripts {repo_dir}
  os.chdir(repo_dir)
  if COMMIT:
    !git reset --hard {COMMIT}
  !wget https://raw.githubusercontent.com/hollowstrawberry/kohya-colab/main/requirements.txt -q -O requirements.txt

def install_dependencies():
  clone_repo()
  !apt -y update -qq
  !apt -y install aria2 -qq
  !pip -q install --upgrade -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118

  # patch kohya for minor stuff
  if COLAB:
    !sed -i "s@cpu@cuda@" library/model_util.py # low ram
  if LOAD_TRUNCATED_IMAGES:
    !sed -i 's/from PIL import Image/from PIL import Image, ImageFile\nImageFile.LOAD_TRUNCATED_IMAGES=True/g' library/train_util.py # fix truncated jpegs error
  if BETTER_EPOCH_NAMES:
    !sed -i 's/{:06d}/{:02d}/g' library/train_util.py # make epoch names shorter
    !sed -i 's/"." + args.save_model_as)/"-{:02d}.".format(num_train_epochs) + args.save_model_as)/g' train_network.py # name of the last epoch will match the rest

  from accelerate.utils import write_basic_config
  if not os.path.exists(accelerate_config_file):
    write_basic_config(save_location=accelerate_config_file)

  os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
  os.environ["BITSANDBYTES_NOWELCOME"] = "1"
  os.environ["SAFETENSORS_FAST_GPU"] = "1"
```
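The `BETTER_EPOCH_NAMES` patch only narrows the zero-padding in the format string kohya uses for per-epoch checkpoint names, which is why the expected log shows `koling-01.safetensors`. A quick illustration of the two format specs, assuming the name template is `{}-{:06d}` (inferred from the `sed` target and the log, not copied from kohya):

```python
# Assumed epoch-name templates before and after the sed patch.
BEFORE = "{}-{:06d}"  # what the sed command searches for ({:06d})
AFTER = "{}-{:02d}"   # what it replaces it with ({:02d})

for epoch in (1, 10):
    print(BEFORE.format("koling", epoch) + ".safetensors", "->",
          AFTER.format("koling", epoch) + ".safetensors")
```

The width in `{:02d}` is a minimum, so a run past 99 epochs still gets unique names, just without uniform width.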

```python
def validate_dataset():
  global lr_warmup_steps, lr_warmup_ratio, caption_extension, keep_tokens, keep_tokens_weight, weighted_captions, adjust_tags
  supported_types = (".png", ".jpg", ".jpeg", ".webp", ".bmp")

  print("\n💿 Checking dataset...")
  if not project_name.strip() or any(c in project_name for c in " .()\"'\\/"):
    print("💥 Error: Please choose a valid project name.")
    return

  if custom_dataset:
    try:
      datconf = toml.loads(custom_dataset)
      datasets = [d for d in datconf["datasets"][0]["subsets"]]
    except:
      print(f"💥 Error: Your custom dataset is invalid or contains an error! Please check the original template.")
      return
    reg = [d for d in datasets if d.get("is_reg", False)]
    for r in reg:
      print("📁"+r["image_dir"].replace("/content/drive/", "") + " (Regularization)")
    datasets = [d for d in datasets if d not in reg]
    datasets_dict = {d["image_dir"]: d["num_repeats"] for d in datasets}
    folders = datasets_dict.keys()
    files = [f for folder in folders for f in os.listdir(folder)]
    images_repeats = {folder: (len([f for f in os.listdir(folder) if f.lower().endswith(supported_types)]), datasets_dict[folder]) for folder in folders}
  else:
    folders = [images_folder]
    files = os.listdir(images_folder)
    images_repeats = {images_folder: (len([f for f in files if f.lower().endswith(supported_types)]), num_repeats)}

  for folder in folders:
    if not os.path.exists(folder):
      print(f"💥 Error: The folder {folder.replace('/content/drive/', '')} doesn't exist.")
      return
  for folder, (img, rep) in images_repeats.items():
    if not img:
      print(f"💥 Error: Your {folder.replace('/content/drive/', '')} folder is empty.")
      return
  for f in files:
    if not f.lower().endswith(".txt") and not f.lower().endswith(supported_types):
      print(f"💥 Error: Invalid file in dataset: \"{f}\". Aborting.")
      return

  if not [txt for txt in files if txt.lower().endswith(".txt")]:
    caption_extension = ""
  if continue_from_lora and not (continue_from_lora.endswith(".safetensors") and os.path.exists(continue_from_lora)):
    print(f"💥 Error: Invalid path to existing Lora. Example: /content/drive/MyDrive/Loras/example.safetensors")
    return

  pre_steps_per_epoch = sum(img*rep for (img, rep) in images_repeats.values())
  steps_per_epoch = pre_steps_per_epoch/train_batch_size
  total_steps = max_train_steps or int(max_train_epochs*steps_per_epoch)
  estimated_epochs = int(total_steps/steps_per_epoch)
  lr_warmup_steps = int(total_steps*lr_warmup_ratio)

  for folder, (img, rep) in images_repeats.items():
    print("📁"+folder.replace("/content/drive/", ""))
    print(f"📈 Found {img} images with {rep} repeats, equaling {img*rep} steps.")
  print(f"📉 Divide {pre_steps_per_epoch} steps by {train_batch_size} batch size to get {steps_per_epoch} steps per epoch.")
  if max_train_epochs:
    print(f"🔮 There will be {max_train_epochs} epochs, for around {total_steps} total training steps.")
  else:
    print(f"🔮 There will be {total_steps} steps, divided into {estimated_epochs} epochs and then some.")

  if total_steps > 10000:
    print("💥 Error: Your total steps are too high. You probably made a mistake. Aborting...")
    return

  if adjust_tags:
    print(f"\n📎 Weighted tags: {'ON' if weighted_captions else 'OFF'}")
    if weighted_captions:
      print(f"📎 Will use {keep_tokens_weight} weight on {keep_tokens} activation tag(s)")
    print("📎 Adjusting tags...")
    adjust_weighted_tags(folders, keep_tokens, keep_tokens_weight, weighted_captions)

  return True
```

```python
def adjust_weighted_tags(folders, keep_tokens: int, keep_tokens_weight: float, weighted_captions: bool):
  weighted_tag = re.compile(r"\((.+?):[.\d]+\)(,|$)")
  for folder in folders:
    for txt in [f for f in os.listdir(folder) if f.lower().endswith(".txt")]:
      with open(os.path.join(folder, txt), 'r') as f:
        content = f.read()
      # reset previous changes
      content = content.replace('\\', '')
      content = weighted_tag.sub(r'\1\2', content)
      if weighted_captions:
        # re-apply changes
        content = content.replace(r'(', r'\(').replace(r')', r'\)').replace(r':', r'\:')
        if keep_tokens_weight > 1:
          tags = [s.strip() for s in content.split(",")]
          for i in range(min(keep_tokens, len(tags))):
            tags[i] = f'({tags[i]}:{keep_tokens_weight})'
          content = ", ".join(tags)
      with open(os.path.join(folder, txt), 'w') as f:
        f.write(content)
```
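`adjust_weighted_tags()` is written to be idempotent: it first undoes any earlier run (strips backslashes and unwraps `(tag:weight)`), then escapes literal parentheses and colons and wraps the first `keep_tokens` tags in `(tag:weight)`. The same logic as a pure string function, so it can be checked without touching caption files; `adjust_caption` and the sample caption are hypothetical:

```python
import re

# Matches "(tag:1.2)" followed by a comma or end of string.
WEIGHTED_TAG = re.compile(r"\((.+?):[.\d]+\)(,|$)")


def adjust_caption(content: str, keep_tokens: int, keep_tokens_weight: float,
                   weighted_captions: bool) -> str:
    """String-only version of the body of adjust_weighted_tags()."""
    # Reset previous changes: drop escape backslashes, unwrap "(tag:1.2)" -> "tag".
    content = content.replace("\\", "")
    content = WEIGHTED_TAG.sub(r"\1\2", content)
    if weighted_captions:
        # Escape literal parentheses and colons so they are not read as weights.
        content = content.replace("(", r"\(").replace(")", r"\)").replace(":", r"\:")
        if keep_tokens_weight > 1:
            tags = [s.strip() for s in content.split(",")]
            for i in range(min(keep_tokens, len(tags))):
                tags[i] = f"({tags[i]}:{keep_tokens_weight})"
            content = ", ".join(tags)
    return content


once = adjust_caption("koling, hat (blue), smile", 1, 1.2, True)
twice = adjust_caption(once, 1, 1.2, True)
print(once)
print(twice == once)  # running it again must not stack weights
```

Turning `weighted_captions` off and running it again restores the plain caption, which is what the "reset previous changes" step is for.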

```python
def create_config():
  global dataset_config_file, config_file, model_file

  if resume:
    resume_points = [f.path for f in os.scandir(output_folder) if f.is_dir()]
    resume_points.sort()
    last_resume_point = resume_points[-1] if resume_points else None
  else:
    last_resume_point = None

  if override_config_file:
    config_file = override_config_file
    print(f"\n⭕ Using custom config file {config_file}")
  else:
    config_dict = {
      "additional_network_arguments": {
        "unet_lr": unet_lr,
        "text_encoder_lr": text_encoder_lr,
        "network_dim": network_dim,
        "network_alpha": network_alpha,
        "network_module": network_module,
        "network_args": network_args,
        "network_train_unet_only": True if text_encoder_lr == 0 else None,
        "network_weights": continue_from_lora if continue_from_lora else None
      },
      "optimizer_arguments": {
        "learning_rate": unet_lr,
        "lr_scheduler": lr_scheduler,
        "lr_scheduler_num_cycles": lr_scheduler_num_cycles if lr_scheduler == "cosine_with_restarts" else None,
        "lr_scheduler_power": lr_scheduler_power if lr_scheduler == "polynomial" else None,
        "lr_warmup_steps": lr_warmup_steps if lr_scheduler != "constant" else None,
        "optimizer_type": optimizer,
        "optimizer_args": optimizer_args if optimizer_args else None,
      },
      "training_arguments": {
        "max_train_steps": max_train_steps,
        "max_train_epochs": max_train_epochs,
        "save_every_n_epochs": save_every_n_epochs,
        "save_last_n_epochs": keep_only_last_n_epochs,
        "train_batch_size": train_batch_size,
        "noise_offset": None,
        "clip_skip": 2,
        "min_snr_gamma": min_snr_gamma_value,
        "weighted_captions": weighted_captions,
        "seed": 42,
        "max_token_length": 225,
        "xformers": True,
        "lowram": COLAB,
        "max_data_loader_n_workers": 8,
        "persistent_data_loader_workers": True,
        "save_precision": "fp16",
        "mixed_precision": "fp16",
        "output_dir": output_folder,
        "logging_dir": log_folder,
        "output_name": project_name,
        "log_prefix": project_name,
        "save_state": save_state,
        "save_last_n_epochs_state": 1 if save_state else None,
        "resume": last_resume_point
      },
      "model_arguments": {
        "pretrained_model_name_or_path": model_file,
        "v2": custom_model_is_based_on_sd2,
        "v_parameterization": True if custom_model_is_based_on_sd2 else None,
      },
      "saving_arguments": {
        "save_model_as": "safetensors",
      },
      "dreambooth_arguments": {
        "prior_loss_weight": 1.0,
      },
      "dataset_arguments": {
        "cache_latents": True,
      },
    }

    for key in config_dict:
      if isinstance(config_dict[key], dict):
        config_dict[key] = {k: v for k, v in config_dict[key].items() if v is not None}

    with open(config_file, "w") as f:
      f.write(toml.dumps(config_dict))
    print(f"\n📄 Config saved to {config_file}")

  if override_dataset_config_file:
    dataset_config_file = override_dataset_config_file
    print(f"⭕ Using custom dataset config file {dataset_config_file}")
  else:
    dataset_config_dict = {
      "general": {
        "resolution": resolution,
        "shuffle_caption": shuffle_caption,
        "keep_tokens": keep_tokens,
        "flip_aug": flip_aug,
        "caption_extension": caption_extension,
        "enable_bucket": True,
        "bucket_reso_steps": 64,
        "bucket_no_upscale": False,
        "min_bucket_reso": 320 if resolution > 640 else 256,
        "max_bucket_reso": 1280 if resolution > 640 else 1024,
      },
      "datasets": toml.loads(custom_dataset)["datasets"] if custom_dataset else [
        {
          "subsets": [
            {
              "num_repeats": num_repeats,
              "image_dir": images_folder,
              "class_tokens": None if caption_extension else project_name
            }
          ]
        }
      ]
    }

    for key in dataset_config_dict:
      if isinstance(dataset_config_dict[key], dict):
        dataset_config_dict[key] = {k: v for k, v in dataset_config_dict[key].items() if v is not None}

    with open(dataset_config_file, "w") as f:
      f.write(toml.dumps(dataset_config_dict))
    print(f"📄 Dataset config saved to {dataset_config_file}")
```
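Both config dicts use `None` as a placeholder for "not set", and `create_config()` drops those keys before calling `toml.dumps()`, since TOML has no null value. The same pruning step in isolation; `prune_none` and the sample config are hypothetical, and like the notebook it only looks one level deep, so list values such as `datasets` pass through unchanged:

```python
def prune_none(config: dict) -> dict:
    """Drop None-valued keys from each top-level table, as create_config() does."""
    return {
        section: ({k: v for k, v in values.items() if v is not None}
                  if isinstance(values, dict) else values)
        for section, values in config.items()
    }


config = {
    "optimizer_arguments": {
        "lr_scheduler": "cosine_with_restarts",
        "lr_scheduler_num_cycles": 3,
        "lr_scheduler_power": None,  # only meaningful for "polynomial"
    },
    "training_arguments": {"noise_offset": None, "train_batch_size": 2},
    "datasets": [{"subsets": [{"num_repeats": 10}]}],  # lists are left alone
}
print(prune_none(config))
```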

```python
def download_model():
  global old_model_url, model_url, model_file
  real_model_url = model_url.strip()

  if real_model_url.lower().endswith((".ckpt", ".safetensors")):
    model_file = f"/content{real_model_url[real_model_url.rfind('/'):]}"
  else:
    model_file = "/content/downloaded_model.safetensors"
    if os.path.exists(model_file):
      !rm "{model_file}"

  if m := re.search(r"(?:https?://)?(?:www\.)?huggingface\.co/[^/]+/[^/]+/blob", model_url):
    real_model_url = real_model_url.replace("blob", "resolve")
  elif m := re.search(r"(?:https?://)?(?:www\.)?civitai\.com/models/([0-9]+)", model_url):
    real_model_url = f"https://civitai.com/api/download/models/{m.group(1)}"

  !aria2c "{real_model_url}" --console-log-level=warn -c -s 16 -x 16 -k 10M -d / -o "{model_file}"

  if model_file.lower().endswith(".safetensors"):
    from safetensors.torch import load_file as load_safetensors
    try:
      test = load_safetensors(model_file)
      del test
    except Exception as e:
      #if "HeaderTooLarge" in str(e):
      new_model_file = os.path.splitext(model_file)[0]+".ckpt"
      !mv "{model_file}" "{new_model_file}"
      model_file = new_model_file
      print(f"Renamed model to {os.path.splitext(model_file)[0]}.ckpt")

  if model_file.lower().endswith(".ckpt"):
    from torch import load as load_ckpt
    try:
      test = load_ckpt(model_file)
      del test
    except Exception as e:
      return False

  return True
```
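`download_model()` rewrites two kinds of page URLs into direct-download URLs before handing them to aria2c: a Hugging Face `/blob/` link becomes `/resolve/`, and a Civitai `/models/<id>` page becomes `/api/download/models/<id>`. It also derives the local file name from the URL. The URL and file-name rules as a pure function with the same regexes; `normalize_model_url` and the Civitai model ID below are hypothetical:

```python
import re


def normalize_model_url(model_url: str) -> tuple[str, str]:
    """Return (download_url, local_file) following download_model()'s rules."""
    url = model_url.strip()
    if url.lower().endswith((".ckpt", ".safetensors")):
        model_file = f"/content{url[url.rfind('/'):]}"  # keep the original file name
    else:
        model_file = "/content/downloaded_model.safetensors"

    if re.search(r"(?:https?://)?(?:www\.)?huggingface\.co/[^/]+/[^/]+/blob", url):
        url = url.replace("blob", "resolve")  # page view -> raw file
    elif m := re.search(r"(?:https?://)?(?:www\.)?civitai\.com/models/([0-9]+)", url):
        url = f"https://civitai.com/api/download/models/{m.group(1)}"
    return url, model_file


hf = normalize_model_url(
    "https://huggingface.co/Lykon/AnyLoRA/blob/main/AnyLoRA_noVae_fp16-pruned.ckpt")
civitai = normalize_model_url("https://civitai.com/models/12345/some-model")
print(hf)
print(civitai)
```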

```python
def main():
  global dependencies_installed

  if COLAB and not os.path.exists('/content/drive'):
    from google.colab import drive
    print("📂 Connecting to Google Drive...")
    drive.mount('/content/drive')

  for dir in (main_dir, deps_dir, repo_dir, log_folder, images_folder, output_folder, config_folder):
    os.makedirs(dir, exist_ok=True)

  if not validate_dataset():
    return

  if not dependencies_installed:
    print("\n🏭 Installing dependencies...\n")
    t0 = time()
    install_dependencies()
    t1 = time()
    dependencies_installed = True
    print(f"\n✅ Installation finished in {int(t1-t0)} seconds.")
  else:
    print("\n✅ Dependencies already installed.")

  if old_model_url != model_url or not model_file or not os.path.exists(model_file):
    print("\n🔄 Downloading model...")
    if not download_model():
      print("\n💥 Error: The model you selected is invalid or corrupted, or couldn't be downloaded. You can use a civitai or huggingface link, or any direct download link.")
      return
    print()
  else:
    print("\n🔄 Model already downloaded.\n")

  create_config()

  print("\n⭐ Starting trainer...\n")
  os.chdir(repo_dir)

  !accelerate launch --config_file={accelerate_config_file} --num_cpu_threads_per_process=1 train_network.py --dataset_config={dataset_config_file} --config_file={config_file}

  if not get_ipython().__dict__['user_ns']['_exit_code']:
    display(Markdown("### ✅ Done! Go download your Lora(s) from Google Drive"))

main()
```
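The cell decides success by reading IPython's `_exit_code` after the `!accelerate launch` line; the second traceback in System Info is accelerate raising `CalledProcessError` because the `train_network.py` child process failed. Outside IPython the same check can be done with `subprocess.run`. A sketch, assuming you build the command list yourself; `run_trainer` is a hypothetical name, and the demo runs trivial Python commands instead of the real trainer:

```python
import subprocess
import sys


def run_trainer(cmd: list[str]) -> int:
    """Run a command (output goes to this process's stdout) and return its exit code."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print(f"Command failed with exit code {result.returncode}: {cmd[0]} ...")
    return result.returncode


# Stand-ins for ["accelerate", "launch", ..., "train_network.py", ...]
ok = run_trainer([sys.executable, "-c", "print('training would run here')"])
failed = run_trainer([sys.executable, "-c", "import sys; sys.exit(3)"])
print(ok, failed)
```

Returning the code instead of passing `check=True` keeps the caller in control of how a failure is reported, which matches how the notebook prints its own success message.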

Expected behavior

```
⭐ Starting trainer...

Loading settings from /content/drive/MyDrive/Loras/koling/training_config.toml...
/content/drive/MyDrive/Loras/koling/training_config
prepare tokenizer
Downloading (…)olve/main/vocab.json: 100% 961k/961k [00:00<00:00, 9.35MB/s]
Downloading (…)olve/main/merges.txt: 100% 525k/525k [00:00<00:00, 20.7MB/s]
Downloading (…)cial_tokens_map.json: 100% 389/389 [00:00<00:00, 1.92MB/s]
Downloading (…)okenizer_config.json: 100% 905/905 [00:00<00:00, 4.32MB/s]
update token length: 225
Load dataset config from /content/drive/MyDrive/Loras/koling/dataset_config.toml
prepare images.
found directory /content/drive/MyDrive/Loras/koling/dataset contains 41 image files
410 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1024
  bucket_reso_steps: 64
  bucket_no_upscale: False

  [Subset 0 of Dataset 0]
    image_dir: "/content/drive/MyDrive/Loras/koling/dataset"
    image_count: 41
    num_repeats: 10
    shuffle_caption: True
    keep_tokens: 1
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: None
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100% 41/41 [00:09<00:00, 4.16it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 410
mean ar error (without repeats): 0.0
prepare accelerator
Using accelerator 0.15.0 or above.
loading model for process 0/1
load StableDiffusion checkpoint
loading u-net:
loading vae:
Downloading (…)lve/main/config.json: 100% 4.52k/4.52k [00:00<00:00, 16.0MB/s]
Downloading pytorch_model.bin: 100% 1.71G/1.71G [00:17<00:00, 97.1MB/s]
loading text encoder:
Replace CrossAttention.forward to use xformers
[Dataset 0]
caching latents.
100% 41/41 [00:12<00:00, 3.38it/s]
import network module: networks.lora
create LoRA network. base dim (rank): 16, alpha: 8
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link

CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...
use 8-bit AdamW optimizer | {}
override steps. steps for 10 epochs is / 指定エポックまでのステップ数: 2050
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 410
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 205
  num epochs / epoch数: 10
  batch size per device / バッチサイズ: 2
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 2050
steps:   0% 0/2050 [00:00<?, ?it/s]epoch 1/10
steps:  10% 205/2050 [01:58<17:46, 1.73it/s, loss=0.0947]saving checkpoint: /content/drive/MyDrive/Loras/koling/output/koling-01.safetensors
epoch 2/10
steps:  20% 410/2050 [03:56<15:46, 1.73it/s, loss=0.0831]saving checkpoint: /content/drive/MyDrive/Loras/koling/output/koling-02.safetensors
epoch 3/10
steps:  30% 615/2050 [05:54<13:47, 1.73it/s, loss=0.074] saving checkpoint: /content/drive/MyDrive/Loras/koling/output/koling-03.safetensors
epoch 4/10
steps:  40% 820/2050 [07:53<11:49, 1.73it/s, loss=0.0727]saving checkpoint: /content/drive/MyDrive/Loras/koling/output/koling-04.safetensors
epoch 5/10
steps:  50% 1025/2050 [09:51<09:51, 1.73it/s, loss=0.0798]saving checkpoint: /content/drive/MyDrive/Loras/koling/output/koling-05.safetensors
epoch 6/10
steps:  60% 1230/2050 [11:49<07:53, 1.73it/s, loss=0.0752]saving checkpoint: /content/drive/MyDrive/Loras/koling/output/koling-06.safetensors
epoch 7/10
steps:  70% 1435/2050 [13:48<05:54, 1.73it/s, loss=0.0762]saving checkpoint: /content/drive/MyDrive/Loras/koling/output/koling-07.safetensors
epoch 8/10
steps:  80% 1640/2050 [15:47<03:56, 1.73it/s, loss=0.0774]saving checkpoint: /content/drive/MyDrive/Loras/koling/output/koling-08.safetensors
epoch 9/10
steps:  90% 1845/2050 [17:45<01:58, 1.73it/s, loss=0.0679]saving checkpoint: /content/drive/MyDrive/Loras/koling/output/koling-09.safetensors
epoch 10/10
steps: 100% 2050/2050 [19:44<00:00, 1.73it/s, loss=0.0709]saving checkpoint: /content/drive/MyDrive/Loras/koling/output/koling-10.safetensors
model saved.
steps: 100% 2050/2050 [19:44<00:00, 1.73it/s, loss=0.0709]
```

✅ Done! Go download your Lora(s) from Google Drive
