Closed
Labels: bug
Description
Describe the bug
With the default settings from the SD3 ControlNet example and two validation images, training fails with a CUDA out-of-memory error during validation on a single A100 80GB.
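The log below shows the validation pipeline's AutoencoderKL and SD3Transformer2DModel being instantiated under the default dtype torch.float32, presumably on a GPU that is already holding the training state. A rough way to reason about what that costs is a weights-only estimate: parameter count times bytes per element. This is an illustrative sketch only: the 2.5-billion parameter count is a hypothetical round number, not a measured size of any SD3.5 component, and the estimate ignores activations, gradients, optimizer state, and allocator overhead.

```python
# Bytes per element for the dtypes relevant here (plain strings, so no torch import is needed).
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}


def param_memory_gib(num_params: int, dtype: str) -> float:
    """Weights-only size in GiB; excludes activations, gradients, optimizer state and allocator overhead."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 2**30


if __name__ == "__main__":
    hypothetical_params = 2_500_000_000  # hypothetical round number, for illustration only
    for dtype in ("float32", "bfloat16"):
        print(f"{dtype}: {param_memory_gib(hypothetical_params, dtype):.2f} GiB")
```

For the hypothetical count this prints 9.31 GiB for float32 and 4.66 GiB for bfloat16. Halving the bytes per element halves the weights-only footprint, so loading validation components in a half-precision dtype rather than float32 is one lever worth checking; whether that alone avoids this OOM is not established by this report.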
04/07/2025 21:15:15 - INFO - __main__ - ***** Running training *****
04/07/2025 21:15:15 - INFO - __main__ - Num examples = 10000
04/07/2025 21:15:15 - INFO - __main__ - Num batches each epoch = 10000
04/07/2025 21:15:15 - INFO - __main__ - Num Epochs = 2
04/07/2025 21:15:15 - INFO - __main__ - Instantaneous batch size per device = 1
04/07/2025 21:15:15 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 4
04/07/2025 21:15:15 - INFO - __main__ - Gradient Accumulation steps = 4
04/07/2025 21:15:15 - INFO - __main__ - Total optimization steps = 4000
Steps: 0%| | 5/4000 [00:21<4:38:36, 4.18s/it, loss=0.00669, lr=1e-5]
04/07/2025 21:15:36 - INFO - __main__ - Running validation...
{'controlnet', 'image_encoder', 'feature_extractor'} was not found in config. Values will be initialized to default values.
Keyword arguments {'safety_checker': None} are not expected by StableDiffusion3ControlNetPipeline and will be ignored.
Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of stabilityai/stable-diffusion-3.5-medium.
{'invert_sigmas', 'base_shift', 'base_image_seq_len', 'use_dynamic_shifting', 'shift_terminal', 'time_shift_type', 'use_exponential_sigmas', 'max_shift', 'use_karras_sigmas', 'max_image_seq_len', 'use_beta_sigmas'} was not found in config. Values will be initialized to default values.
Loaded scheduler as FlowMatchEulerDiscreteScheduler from `scheduler` subfolder of stabilityai/stable-diffusion-3.5-medium.
Instantiating AutoencoderKL model under default dtype torch.float32.
All model checkpoint weights were used when initializing AutoencoderKL.
All the weights of AutoencoderKL were initialized from the model checkpoint at /home/jakubdawidowicz/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3.5-medium/snapshots/b940f670f0eda2d07fbb75229e779da1ad11eb80/vae.
If your task is similar to the task the model of the checkpoint was trained on, you can already use AutoencoderKL for predictions without further training.
Loaded vae as AutoencoderKL from `vae` subfolder of stabilityai/stable-diffusion-3.5-medium.
Loaded text_encoder as CLIPTextModelWithProjection from `text_encoder` subfolder of stabilityai/stable-diffusion-3.5-medium.
Loaded tokenizer_2 as CLIPTokenizer from `tokenizer_2` subfolder of stabilityai/stable-diffusion-3.5-medium.
Loaded text_encoder_2 as CLIPTextModelWithProjection from `text_encoder_2` subfolder of stabilityai/stable-diffusion-3.5-medium.
Loading checkpoint shards: 100%|██████████| 2/2 [00:12<00:00, 6.02s/it]
Loaded text_encoder_3 as T5EncoderModel from `text_encoder_3` subfolder of stabilityai/stable-diffusion-3.5-medium.
Loaded tokenizer_3 as T5TokenizerFast from `tokenizer_3` subfolder of stabilityai/stable-diffusion-3.5-medium.
Instantiating SD3Transformer2DModel model under default dtype torch.float32.
All model checkpoint weights were used when initializing SD3Transformer2DModel.
All the weights of SD3Transformer2DModel were initialized from the model checkpoint at /home/jakubdawidowicz/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3.5-medium/snapshots/b940f670f0eda2d07fbb75229e779da1ad11eb80/transformer.
If your task is similar to the task the model of the checkpoint was trained on, you can already use SD3Transformer2DModel for predictions without further training.
Loaded transformer as SD3Transformer2DModel from `transformer` subfolder of stabilityai/stable-diffusion-3.5-medium.
Loading pipeline components...: 100%|██████████| 9/9 [00:26<00:00, 3.00s/it]
Traceback (most recent call last):
File "/home/jakubdawidowicz/diffusers/examples/controlnet/train_controlnet_sd3.py", line 1429, in <module>
main(args)
File "/home/jakubdawidowicz/diffusers/examples/controlnet/train_controlnet_sd3.py", line 1377, in main
image_logs = log_validation(
^^^^^^^^^^^^^^^
File "/home/jakubdawidowicz/diffusers/examples/controlnet/train_controlnet_sd3.py", line 83, in log_validation
pipeline = pipeline.to(torch.device(accelerator.device))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/diffusers/src/diffusers/pipelines/pipeline_utils.py", line 482, in to
module.to(device, dtype)
File "/opt/diffusers/src/diffusers/models/modeling_utils.py", line 1351, in to
return super().to(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1343, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 903, in _apply
module._apply(fn)
File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 903, in _apply
module._apply(fn)
File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 903, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 930, in _apply
param_applied = fn(param)
^^^^^^^^^
File "/opt/miniconda/envs/control/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1329, in convert
return t.to(
^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB. GPU 0 has a total capacity of 79.25 GiB of which 4.75 MiB is free. Including non-PyTorch memory, this process has 79.24 GiB memory in use. Of the allocated memory 76.93 GiB is allocated by PyTorch, and 1.81 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Reproduction
Run the SD3 ControlNet example with the dataset size and validation interval reduced to shorten the job:
```bash
export MODEL_DIR="stabilityai/stable-diffusion-3.5-medium"
export OUTPUT_DIR="sd3-controlnet-out"
accelerate launch train_controlnet_sd3.py \
  --pretrained_model_name_or_path=$MODEL_DIR \
  --output_dir=$OUTPUT_DIR \
  --train_data_dir="fill50k" \
  --resolution=1024 \
  --learning_rate=1e-5 \
  --max_train_samples=10000 \
  --max_train_steps=4000 \
  --checkpointing_steps=500 \
  --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
  --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
  --validation_steps=5 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4
```
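With these flags, the schedule printed in the training log can be reproduced by hand. This sketch assumes a single process and the step bookkeeping commonly used in the diffusers example training scripts (update steps per epoch = ceil(batches / gradient_accumulation_steps), epochs = ceil(max_train_steps / update steps per epoch), validation whenever the global step is a multiple of validation_steps); that rule is an assumption, not checked against this exact revision of `train_controlnet_sd3.py`, and the function name is hypothetical.

```python
import math


def training_schedule(num_examples, train_batch_size, gradient_accumulation_steps,
                      max_train_steps, validation_steps, num_processes=1):
    """Approximate the step bookkeeping of an Accelerate training loop (assumed rules, see lead-in)."""
    # Dataloader batches per epoch, per process.
    batches_per_epoch = math.ceil(num_examples / (train_batch_size * num_processes))
    # One optimizer update per `gradient_accumulation_steps` batches.
    update_steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    num_epochs = math.ceil(max_train_steps / update_steps_per_epoch)
    total_batch_size = train_batch_size * num_processes * gradient_accumulation_steps
    # Assumed rule: validate when global_step % validation_steps == 0, global_step in 1..max_train_steps.
    validation_runs = max_train_steps // validation_steps
    return {
        "batches_per_epoch": batches_per_epoch,
        "update_steps_per_epoch": update_steps_per_epoch,
        "num_epochs": num_epochs,
        "total_batch_size": total_batch_size,
        "first_validation_step": validation_steps,
        "validation_runs": validation_runs,
    }


if __name__ == "__main__":
    print(training_schedule(num_examples=10_000, train_batch_size=1,
                            gradient_accumulation_steps=4, max_train_steps=4_000,
                            validation_steps=5))
```

The computed batches per epoch (10000), epochs (2), and total batch size (4) match the log header, and the first validation lands at step 5, which is where the log shows "Running validation...". The OOM therefore happens on the very first validation run, so it is not explained by memory accumulating across the roughly 800 validation runs this configuration would otherwise perform.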
Logs
System Info
- 🤗 Diffusers version: 0.33.0.dev0
- Platform: Linux-6.8.0-53-generic-x86_64-with-glibc2.39
- Running on Google Colab?: No
- Python version: 3.12.9
- PyTorch version (GPU?): 2.6.0+cu124
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.29.3
- Transformers version: 4.50.3
- Accelerate version: 1.5.2
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.5.3
- xFormers version: not installed
- Accelerator: NVIDIA A100 80GB PCIe, 81920 MiB
Who can help?