Vram exceeded when training model. How to fix it? #1788
Replies: 1 comment 5 replies
-
Are you using the latest notebook ? |
Beta Was this translation helpful? Give feedback.
5 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Progress:| | 0% 3/3000 [00:10<2:18:26, 2.77s/it, loss=0.317, lr=2e-6] dramine Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 803, in
main()
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 690, in main
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/accelerate/utils/operations.py", line 507, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/usr/local/lib/python3.9/dist-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/diffusers/models/unet_2d_condition.py", line 632, in forward
sample = upsample_block(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/diffusers/models/unet_2d_blocks.py", line 1813, in forward
hidden_states = attn(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/diffusers/models/transformer_2d.py", line 265, in forward
hidden_states = block(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/diffusers/models/attention.py", line 324, in forward
ff_output = self.ff(norm_hidden_states)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/diffusers/models/attention.py", line 382, in forward
hidden_states = module(hidden_states)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/diffusers/models/attention.py", line 428, in forward
hidden_states, gate = self.proj(hidden_states).chunk(2, dim=-1)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 80.00 MiB (GPU 0; 14.75 GiB total capacity; 13.25 GiB already allocated; 50.81 MiB free; 13.36 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Progress:| | 0% 3/3000 [00:12<3:30:12, 4.21s/it, loss=0.317, lr=2e-6]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--save_starting_step=500', '--save_n_steps=0', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/anime-diffusion', '--pretrained_model_name_or_path=/content/stable-diffusion-v2-768', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/anime-diffusion/instance_images', '--output_dir=/content/models/anime-diffusion', '--captions_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/anime-diffusion/captions', '--instance_prompt=', '--seed=846813', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-06', '--lr_scheduler=linear', '--lr_warmup_steps=0', '--max_train_steps=3000']' returned non-zero exit status 1.
Something went wrong
Beta Was this translation helpful? Give feedback.
All reactions