VRAM exceeded when training model. How to fix it? #1788

InternalMegaT · 2023-03-19T02:29:12Z


Progress:| | 0% 3/3000 [00:10<2:18:26, 2.77s/it, loss=0.317, lr=2e-6] dramine
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 803, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 690, in main
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/utils/operations.py", line 507, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/usr/local/lib/python3.9/dist-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/models/unet_2d_condition.py", line 632, in forward
    sample = upsample_block(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/models/unet_2d_blocks.py", line 1813, in forward
    hidden_states = attn(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/models/transformer_2d.py", line 265, in forward
    hidden_states = block(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/models/attention.py", line 324, in forward
    ff_output = self.ff(norm_hidden_states)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/models/attention.py", line 382, in forward
    hidden_states = module(hidden_states)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/models/attention.py", line 428, in forward
    hidden_states, gate = self.proj(hidden_states).chunk(2, dim=-1)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 80.00 MiB (GPU 0; 14.75 GiB total capacity; 13.25 GiB already allocated; 50.81 MiB free; 13.36 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Progress:| | 0% 3/3000 [00:12<3:30:12, 4.21s/it, loss=0.317, lr=2e-6]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--save_starting_step=500', '--save_n_steps=0', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/anime-diffusion', '--pretrained_model_name_or_path=/content/stable-diffusion-v2-768', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/anime-diffusion/instance_images', '--output_dir=/content/models/anime-diffusion', '--captions_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/anime-diffusion/captions', '--instance_prompt=', '--seed=846813', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-06', '--lr_scheduler=linear', '--lr_warmup_steps=0', '--max_train_steps=3000']' returned non-zero exit status 1.
Something went wrong
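The OOM message already contains the numbers needed to tell fragmentation apart from a genuinely full GPU: PyTorch's hint about `max_split_size_mb` applies when reserved memory is much larger than allocated memory. The Python sketch below pulls those figures out of the message. The regex and the `OomReport`/`parse_oom` names are illustrative assumptions written against the message format in this log; other PyTorch versions may word the message differently.

```python
import re
from dataclasses import dataclass

# Sizes in the OOM message look like "80.00 MiB" or "13.25 GiB"; convert all to GiB.
_TO_GIB = {"KiB": 1 / (1024 * 1024), "MiB": 1 / 1024, "GiB": 1.0}

@dataclass
class OomReport:
    requested: float  # GiB the failing allocation asked for
    total: float      # total GPU capacity, GiB
    allocated: float  # GiB held by live tensors
    free: float       # GiB the driver reported as free
    reserved: float   # GiB held by PyTorch's caching allocator (allocated + cached)

_SIZE = r"([\d.]+) (KiB|MiB|GiB)"
# Assumed message layout, copied from the log above.
_PATTERN = re.compile(
    rf"Tried to allocate {_SIZE} \(GPU \d+; {_SIZE} total capacity; "
    rf"{_SIZE} already allocated; {_SIZE} free; {_SIZE} reserved"
)

def parse_oom(message: str) -> OomReport:
    m = _PATTERN.search(message)
    if m is None:
        raise ValueError("not a recognised CUDA OOM message")
    g = m.groups()
    # Groups come in (number, unit) pairs, in the same order as OomReport's fields.
    values = [float(g[i]) * _TO_GIB[g[i + 1]] for i in range(0, len(g), 2)]
    return OomReport(*values)

msg = ("CUDA out of memory. Tried to allocate 80.00 MiB (GPU 0; 14.75 GiB total "
       "capacity; 13.25 GiB already allocated; 50.81 MiB free; 13.36 GiB reserved "
       "in total by PyTorch)")
r = parse_oom(msg)
# Cached-but-unused memory: large values would point to fragmentation.
print(f"reserved - allocated = {r.reserved - r.allocated:.2f} GiB")
print(f"request {r.requested:.3f} GiB vs free {r.free:.3f} GiB")
```

For this message, reserved exceeds allocated by only about 0.11 GiB, while the failed 80 MiB request is larger than the roughly 50 MiB reported free, so the GPU appears to be genuinely full rather than fragmented.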

TheLastBen (Maintainer) · 2023-03-19T06:33:19Z

Are you using the latest notebook?

5 replies

InternalMegaT Mar 19, 2023
Author

I was using the latest notebook, yes.

InternalMegaT Mar 19, 2023
Author

It said the error was caused by high VRAM usage, so I thought my dataset was too large.

InternalMegaT Mar 19, 2023
Author

However, that is not the case: I switched to a dataset with only 10 images and still got the error.
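With `--train_batch_size=1`, each training step only turns one image into tensors, so shrinking the dataset should not be expected to lower per-step VRAM, consistent with the 10-image run still failing. The Python sketch below illustrates that arithmetic, assuming images are loaded batch by batch the way a PyTorch DataLoader does. `batches`, `pixel_bytes`, `peak_pixel_bytes` and the file names are hypothetical helpers, not part of the training script.

```python
from itertools import islice
from typing import Iterable, Iterator, List

def batches(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches, like a DataLoader without shuffling."""
    it = iter(items)
    while chunk := list(islice(it, batch_size)):
        yield chunk

def pixel_bytes(n_images: int, resolution: int = 512, bytes_per_elem: int = 2) -> int:
    """Bytes for n RGB images at resolution x resolution.

    bytes_per_elem=2 assumes fp16 pixels, matching --mixed_precision=fp16.
    """
    return n_images * 3 * resolution * resolution * bytes_per_elem

def peak_pixel_bytes(dataset_size: int, batch_size: int = 1) -> int:
    dataset = [f"img_{i}.png" for i in range(dataset_size)]  # hypothetical file names
    # Only one batch is turned into tensors per step, so the peak is set by the
    # largest batch, not by how many images the dataset holds.
    return max(pixel_bytes(len(b)) for b in batches(dataset, batch_size))

for size in (10, 1000):
    print(size, "images ->", peak_pixel_bytes(size) / 2**20, "MiB per step")
```

One 512×512 RGB image in fp16 is only 1.5 MiB, whether the dataset holds 10 images or 1,000. Peak VRAM is instead dominated by the model weights, optimizer state and activations, which scale with the model, `--resolution` and the batch size.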

InternalMegaT Mar 19, 2023
Author

0% 0/3000 [00:00<?, ?it/s] spritesheet spritesheet
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 803, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 668, in main
    latents = vae.encode(batch["pixel_values"].to(dtype=weight_dtype)).latent_dist.sample()
  File "/usr/local/lib/python3.9/dist-packages/diffusers/models/autoencoder_kl.py", line 158, in encode
    h = self.encoder(x)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/models/vae.py", line 105, in forward
    sample = down_block(sample)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/models/unet_2d_blocks.py", line 984, in forward
    hidden_states = resnet(hidden_states, temb=None)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/models/resnet.py", line 580, in forward
    output_tensor = (input_tensor + hidden_states) / self.output_scale_factor
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.42 GiB (GPU 0; 14.75 GiB total capacity; 11.47 GiB already allocated; 1.94 GiB free; 11.52 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0% 0/3000 [00:10<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--save_starting_step=500', '--save_n_steps=0', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/spritesheet', '--pretrained_model_name_or_path=/content/stable-diffusion-v2-768', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/spritesheet/instance_images', '--output_dir=/content/models/spritesheet', '--captions_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/spritesheet/captions', '--instance_prompt=', '--seed=783547', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-06', '--lr_scheduler=linear', '--lr_warmup_steps=0', '--max_train_steps=3000']' returned non-zero exit status 1.
Something went wrong
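Both OOM messages suggest setting `max_split_size_mb` through `PYTORCH_CUDA_ALLOC_CONF`. That option targets fragmentation, i.e. reserved memory much larger than allocated memory; in both reports here the two differ by only about 0.1 GiB, so it may have little effect. If you want to try it anyway, a minimal sketch is to set the variable in the notebook process before launching training, so the `accelerate launch` child process inherits it. The value 128 is an arbitrary example (an assumption), not a recommendation from this thread.

```python
import os
import subprocess
import sys

# PyTorch's CUDA caching allocator reads this variable when it initialises, so it
# must be set before the training process allocates any CUDA memory.
# 128 MiB is an example value, not a tuned recommendation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# Child processes started afterwards (such as the one `accelerate launch` spawns)
# inherit os.environ by default; this check runs a child and prints what it sees.
out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['PYTORCH_CUDA_ALLOC_CONF'])"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(out)
```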

InternalMegaT Mar 19, 2023
Author

Sorry for posting this here; I wasn't expecting it to be a bug.
