Description
Describe the bug
As soon as training finishes and the script moves on to its final cleanup steps, I get this error:
Steps: 99%|█████████▉| 397/400 [07:45<00:03, 1.16s/it, loss=0.397, lr=1]
Steps: 100%|█████████▉| 398/400 [07:47<00:02, 1.20s/it, loss=0.397, lr=1]
Steps: 100%|█████████▉| 398/400 [07:47<00:02, 1.20s/it, loss=0.539, lr=1]
Steps: 100%|█████████▉| 399/400 [07:48<00:01, 1.19s/it, loss=0.539, lr=1]
Steps: 100%|█████████▉| 399/400 [07:48<00:01, 1.19s/it, loss=0.58, lr=1]
Steps: 100%|██████████| 400/400 [07:49<00:00, 1.18s/it, loss=0.58, lr=1]
Steps: 100%|██████████| 400/400 [07:49<00:00, 1.18s/it, loss=0.288, lr=1] Model weights saved in /workspace/output_model/dd304483-afdc-4398-9c46-c660d0725e70-e1/pytorch_lora_weights.safetensors
2025-02-19T21:38:48.866518894Z Traceback (most recent call last):
2025-02-19T21:38:48.866557263Z File "/workspace/./train_dreambooth_lora_flux.py", line 1935, in <module>
2025-02-19T21:38:48.867054758Z main(args)
2025-02-19T21:38:48.867072609Z File "/workspace/./train_dreambooth_lora_flux.py", line 1887, in main
2025-02-19T21:38:48.867457265Z pipeline = FluxPipeline.from_pretrained(
2025-02-19T21:38:48.867479814Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-19T21:38:48.867487574Z File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
2025-02-19T21:38:48.867554504Z return fn(*args, **kwargs)
2025-02-19T21:38:48.867601603Z ^^^^^^^^^^^^^^^^^^^
2025-02-19T21:38:48.867606703Z File "/usr/local/lib/python3.11/dist-packages/diffusers/pipelines/pipeline_utils.py", line 793, in from_pretrained
2025-02-19T21:38:48.867905410Z config_dict = cls.load_config(cached_folder, dduf_entries=dduf_entries)
2025-02-19T21:38:48.867973610Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-19T21:38:48.867992349Z File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
2025-02-19T21:38:48.868030829Z return fn(*args, **kwargs)
2025-02-19T21:38:48.868039849Z ^^^^^^^^^^^^^^^^^^^
2025-02-19T21:38:48.868053699Z File "/usr/local/lib/python3.11/dist-packages/diffusers/configuration_utils.py", line 381, in load_config
2025-02-19T21:38:48.868183318Z raise EnvironmentError(
2025-02-19T21:38:48.868199778Z OSError: Error no file named model_index.json found in directory /workspace/model/realflux1.
2025-02-19T21:38:49.009733209Z
Steps: 100%|██████████| 400/400 [07:49<00:00, 1.17s/it, loss=0.288, lr=1]
2025-02-19T21:38:50.343330576Z Traceback (most recent call last):
2025-02-19T21:38:50.343379125Z File "/usr/local/bin/accelerate", line 8, in <module>
2025-02-19T21:38:50.343568443Z sys.exit(main())
2025-02-19T21:38:50.343608783Z ^^^^^^
2025-02-19T21:38:50.343691572Z File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/accelerate_cli.py", line 48, in main
2025-02-19T21:38:50.343770471Z args.func(args)
2025-02-19T21:38:50.343896080Z File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 1106, in launch_command
2025-02-19T21:38:50.344200407Z simple_launcher(args)
2025-02-19T21:38:50.344262447Z File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 704, in simple_launcher
2025-02-19T21:38:50.344583144Z raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
2025-02-19T21:38:50.819696870Z ✅ FLUX LoRA training finished!
So basically the error says OSError: Error no file named model_index.json found in directory /workspace/model/realflux1. It is raised at train_dreambooth_lora_flux.py, line 1887, in main, at the call pipeline = FluxPipeline.from_pretrained(...). But the path exists and the model is in it. It is the same path I started the training with, and at the start of training the files were found and everything worked.
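For reference, FluxPipeline.from_pretrained on a local directory only accepts a diffusers-format checkpoint, i.e. a folder containing model_index.json plus per-component subfolders. A minimal sanity check (the expected layout below is an assumption based on the standard diffusers FLUX format):

import os

model_dir = "/workspace/model/realflux1"  # path from the training command
print(os.path.isdir(model_dir))           # should be True
print(sorted(os.listdir(model_dir)))
# A diffusers-format FLUX checkpoint is expected to look roughly like:
# ['model_index.json', 'scheduler', 'text_encoder', 'text_encoder_2',
#  'tokenizer', 'tokenizer_2', 'transformer', 'vae']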
It still looks like the training finished, because it saved a ~98 MB .safetensors file and a log, but I have the feeling the LoRA is broken. Inference without loading the LoRA looks fine, but as soon as I load the LoRA the output is corrupted (I even tried different prompts that have nothing to do with the LoRA).
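To at least rule out a truncated or empty file, the saved weights can be inspected directly; a minimal sketch using the safetensors library (the file path is the one from the training log above):

from safetensors import safe_open

lora_file = "/workspace/output_model/dd304483-afdc-4398-9c46-c660d0725e70-e1/pytorch_lora_weights.safetensors"
with safe_open(lora_file, framework="pt") as f:
    keys = list(f.keys())
print(len(keys))   # number of saved LoRA tensors
print(keys[:5])    # key names should point at transformer/text encoder LoRA layers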
Reproduction
Dreambooth Training:
Using the newest train_dreambooth_lora_flux.py, launched via accelerate with these parameters:

accelerate launch ./train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path /workspace/model/realflux1 \
  --instance_data_dir /workspace/job_files/dd304483-afdc-4398-9c46-e1/clean_data \
  --output_dir /workspace/output_model/dd304483-afdc-4398-9c46-e1 \
  --instance_prompt "photo of WIXBSAHA black car" \
  --resolution 768 \
  --learning_rate 1.0 \
  --mixed_precision bf16 \
  --lr_warmup_steps 0 \
  --gradient_accumulation_steps 1 \
  --lr_scheduler constant \
  --train_batch_size 1 \
  --max_train_steps 400 \
  --checkpointing_steps 500 \
  --num_train_epochs 10 \
  --checkpoints_total_limit 1 \
  --train_text_encoder \
  --rank 16 \
  --optimizer prodigy \
  --repeats 3 \
  --guidance_scale 1
Inference:

import torch
from diffusers import FluxPipeline

print("Loading FLUX model")
pipe = FluxPipeline.from_pretrained(model_path, torch_dtype=torch.bfloat16).to("cuda")
pipe.enable_model_cpu_offload()

generator = None
if lora_path:
    print(f"🔄 Loading FLUX LoRA model: {lora_path}")
    pipe.load_lora_weights(lora_path)
    print("✅ LoRA loaded.")
if seed is not None:
    # fixed seed for reproducible outputs
    generator = torch.Generator(device="cpu").manual_seed(seed)

image = pipe(
    prompt=prompt,
    guidance_scale=guidance_scale,  # 0.
    negative_prompt=negative_prompt,
    height=height,
    true_cfg_scale=true_cfg_scale,
    width=width,
    num_inference_steps=num_inference_steps,
    max_sequence_length=max_sequence_length,  # 256
    generator=generator,
).images[0]
image.save("test.png")
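To confirm the corruption really comes from the LoRA rather than the prompt or sampler settings, a hypothetical A/B check (reusing pipe, prompt, and lora_path from above, and assuming pipe starts without the LoRA loaded):

# Render the same prompt with the same seed, once without and once with the LoRA.
base = pipe(prompt=prompt, generator=torch.Generator("cpu").manual_seed(0)).images[0]
pipe.load_lora_weights(lora_path)
lora = pipe(prompt=prompt, generator=torch.Generator("cpu").manual_seed(0)).images[0]
pipe.unload_lora_weights()  # drop the loaded adapter again
base.save("no_lora.png")
lora.save("with_lora.png")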
I also get a warning during inference. In chronological order, the inference log reads:

[info] FLUX Inference
[info] Loading FLUX model
[info] Loading pipeline components...:  29%|██▊       | 2/7 [00:00<00:00,  5.91it/s]
You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
[info] Loading pipeline components...:  43%|████▎     | 3/7 [00:00<00:00,  5.06it/s]
[info] Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  7.37it/s]
[info] Loading pipeline components...: 100%|██████████| 7/7 [00:01<00:00,  4.16it/s]
[info] 🔄 Loading FLUX LoRA model: /workspace/lora_output_model
[info] ✅ LoRA loaded.

The warning is this one: You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Logs
System Info
A100 (80 GB VRAM), 120 GB RAM
pytorch:2.4.0-py3.11-cuda12.4.1
CUDA 12.4
accelerate==0.33.0