Dreambooth LoRA Flux training last step error #10839

@PluginBOXone

Description

Describe the bug

As soon as the training is done and the script moves on to its final cleanup steps, I get this error:

Steps: 99%|█████████▉| 397/400 [07:45<00:03, 1.16s/it, loss=0.397, lr=1]
Steps: 100%|█████████▉| 398/400 [07:47<00:02, 1.20s/it, loss=0.397, lr=1]
Steps: 100%|█████████▉| 398/400 [07:47<00:02, 1.20s/it, loss=0.539, lr=1]
Steps: 100%|█████████▉| 399/400 [07:48<00:01, 1.19s/it, loss=0.539, lr=1]
Steps: 100%|█████████▉| 399/400 [07:48<00:01, 1.19s/it, loss=0.58, lr=1]
Steps: 100%|██████████| 400/400 [07:49<00:00, 1.18s/it, loss=0.58, lr=1]
Steps: 100%|██████████| 400/400 [07:49<00:00, 1.18s/it, loss=0.288, lr=1] Model weights saved in /workspace/output_model/dd304483-afdc-4398-9c46-c660d0725e70-e1/pytorch_lora_weights.safetensors
2025-02-19T21:38:48.866518894Z Traceback (most recent call last):
2025-02-19T21:38:48.866557263Z File "/workspace/./train_dreambooth_lora_flux.py", line 1935, in <module>
2025-02-19T21:38:48.867054758Z main(args)
2025-02-19T21:38:48.867072609Z File "/workspace/./train_dreambooth_lora_flux.py", line 1887, in main
2025-02-19T21:38:48.867457265Z pipeline = FluxPipeline.from_pretrained(
2025-02-19T21:38:48.867479814Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-19T21:38:48.867487574Z File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
2025-02-19T21:38:48.867554504Z return fn(*args, **kwargs)
2025-02-19T21:38:48.867601603Z ^^^^^^^^^^^^^^^^^^^
2025-02-19T21:38:48.867606703Z File "/usr/local/lib/python3.11/dist-packages/diffusers/pipelines/pipeline_utils.py", line 793, in from_pretrained
2025-02-19T21:38:48.867905410Z config_dict = cls.load_config(cached_folder, dduf_entries=dduf_entries)
2025-02-19T21:38:48.867973610Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-19T21:38:48.867992349Z File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
2025-02-19T21:38:48.868030829Z return fn(*args, **kwargs)
2025-02-19T21:38:48.868039849Z ^^^^^^^^^^^^^^^^^^^
2025-02-19T21:38:48.868053699Z File "/usr/local/lib/python3.11/dist-packages/diffusers/configuration_utils.py", line 381, in load_config
2025-02-19T21:38:48.868183318Z raise EnvironmentError(
2025-02-19T21:38:48.868199778Z OSError: Error no file named model_index.json found in directory /workspace/model/realflux1.
2025-02-19T21:38:49.009733209Z
Steps: 100%|██████████| 400/400 [07:49<00:00, 1.17s/it, loss=0.288, lr=1]
2025-02-19T21:38:50.343330576Z Traceback (most recent call last):
2025-02-19T21:38:50.343379125Z File "/usr/local/bin/accelerate", line 8, in <module>
2025-02-19T21:38:50.343568443Z sys.exit(main())
2025-02-19T21:38:50.343608783Z ^^^^^^
2025-02-19T21:38:50.343691572Z File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/accelerate_cli.py", line 48, in main
2025-02-19T21:38:50.343770471Z args.func(args)
2025-02-19T21:38:50.343896080Z File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 1106, in launch_command
2025-02-19T21:38:50.344200407Z simple_launcher(args)
2025-02-19T21:38:50.344262447Z File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 704, in simple_launcher
2025-02-19T21:38:50.344583144Z raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
2025-02-19T21:38:50.819696870Z ✅ FLUX LoRA training completed!

So basically the error says OSError: Error no file named model_index.json found in directory /workspace/model/realflux1, raised at line 1887 in main, where the script reloads the base model with pipeline = FluxPipeline.from_pretrained(...).

But the path exists and the model is in it. It is the same path that I started the training with, and at that point the files were found and everything worked.
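
To double-check, here is a minimal sketch of how I verify the folder (the path is the one from my training command; from_pretrained on a local pipeline folder looks for model_index.json at the top level, next to the component subfolders):

import os

model_dir = "/workspace/model/realflux1"
# A diffusers pipeline folder saved with save_pretrained() contains
# model_index.json plus subfolders like transformer/, vae/, text_encoder/.
print(sorted(os.listdir(model_dir)))
print(os.path.isfile(os.path.join(model_dir, "model_index.json")))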

It still looks like the training finished, because it saved a 98 MB .safetensors file and a log... but I have the feeling the LoRA is broken, because when I load it the inference output is corrupted.
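
To rule out corrupted weights, here is a minimal sketch that scans the saved file for NaN/Inf values (the path is taken from the "Model weights saved in ..." line of the training log above):

import torch
from safetensors.torch import load_file

state = load_file("/workspace/output_model/dd304483-afdc-4398-9c46-c660d0725e70-e1/pytorch_lora_weights.safetensors")
print(len(state), "tensors")
# Any NaN/Inf here would explain corrupted inference output.
bad = [k for k, v in state.items() if torch.isnan(v).any() or torch.isinf(v).any()]
print("tensors with NaN/Inf:", bad or "none")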

Tried inference without loading the LoRA:

Image

As soon as I load the LoRA (I even tried different prompts that have nothing to do with the LoRA):

Image

Image

Reproduction

Dreambooth training:
Using the newest train_dreambooth_lora_flux.py, launched via accelerate with these parameters:

accelerate launch ./train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path /workspace/model/realflux1 \
  --instance_data_dir /workspace/job_files/dd304483-afdc-4398-9c46-e1/clean_data \
  --output_dir /workspace/output_model/dd304483-afdc-4398-9c46-e1 \
  --instance_prompt "photo of WIXBSAHA black car" \
  --resolution 768 \
  --learning_rate 1.0 \
  --mixed_precision bf16 \
  --lr_warmup_steps 0 \
  --gradient_accumulation_steps 1 \
  --lr_scheduler constant \
  --train_batch_size 1 \
  --max_train_steps 400 \
  --checkpointing_steps 500 \
  --num_train_epochs 10 \
  --checkpoints_total_limit 1 \
  --train_text_encoder \
  --rank 16 \
  --optimizer prodigy \
  --repeats 3 \
  --guidance_scale 1

Inference:

print("Lade FLUX Modell")
    pipe = FluxPipeline.from_pretrained(model_path, torch_dtype=torch.bfloat16).to("cuda")
    pipe.enable_model_cpu_offload()

    generator = None

    if lora_path:
        print(f"🔄 Lade FLUX LoRA-Modell: {lora_path}")
        pipe.load_lora_weights(lora_path)
        print("✅ LoRA geladen.")
    
    if seed is not None:
        generator = torch.Generator(device="cpu").manual_seed(seed)

    image = pipe(
        prompt=prompt,
        guidance_scale=guidance_scale, #0.
        negative_prompt=negative_prompt,
        height=height,
        true_cfg_scale=true_cfg_scale,
        width=width,
        num_inference_steps=num_inference_steps,
        max_sequence_length=max_sequence_length, #256
        generator=generator
    ).images[0]

image.save("test.png")
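
After load_lora_weights one can also confirm the adapter is active and reduce its weight, to rule out an over-trained LoRA. A sketch, assuming the default adapter name default_0 that diffusers assigns when load_lora_weights is called without adapter_name:

# After pipe.load_lora_weights(lora_path):
print(pipe.get_active_adapters())                        # e.g. ['default_0']
pipe.set_adapters(["default_0"], adapter_weights=[0.5])  # halve the LoRA influence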

I also get a warning during inference:

[info] FLUX Inference
[info] Loading FLUX model
[info] Loading pipeline components...: 29%|██▊ | 2/7 [00:00<00:00, 5.91it/s]
You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
[info] Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 7.37it/s]
[info] Loading pipeline components...: 100%|██████████| 7/7 [00:01<00:00, 4.16it/s]
[info] 🔄 Loading FLUX LoRA model: /workspace/lora_output_model
[info] ✅ LoRA loaded.

Specifically this one: You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers

Logs

System Info

A100 (80 GB VRAM), 120 GB RAM
pytorch:2.4.0-py3.11-cuda12.4.1
CUDA 12.4
accelerate==0.33.0

Who can help?

@sayakpaul
