Problem #5

Closed
djkakadu opened this issue Nov 24, 2024 · 5 comments

@djkakadu

Hello, with every image I try, I get this error:
```
❌ Error during processing: File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 858, in forward
super().forward(*inputs)
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)

File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 922, in forward
return super().forward(*inputs) + inputs[0]
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 922, in forward
return super().forward(*inputs) + inputs[0]
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)
OutOfMemoryError:
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\attentions.py", line 129, in forward
return self._process_attention(
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\attentions.py", line 29, in scaled_dot_product_attention
return _scaled_dot_product_attention(
CUDA out of memory. Tried to allocate 7.75 GiB. GPU

(CHAIN) SelfAttention(embedding_dim=320, num_heads=8, inner_dim=320, use_bias=False)
├── (PAR)
│ └── Identity() (x3)
├── (DISTR)
│ └── Linear(in_features=320, out_features=320, device=cuda:0, dtype=bfloat16) (x3)
├── >>> ScaledDotProductAttention(num_heads=8) | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention.Chain_2.CrossAttentionBlock.Residual_1.SelfAttention.ScaledDotProductAttention
└── Linear(in_features=320, out_features=320, device=cuda:0, dtype=bfloat16)
0: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-6.97, max=7.16, mean=-0.05, std=1.51, norm=4841.88, grad=False)
1: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-6.69, max=7.00, mean=-0.01, std=1.58, norm=5069.41, grad=False)
2: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-2.86, max=2.66, mean=0.01, std=0.46, norm=1474.05, grad=False)

(RES) Residual()
├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
└── >>> (CHAIN) SelfAttention(embedding_dim=320, num_heads=8, inner_dim=320, use_bias=False) | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention.Chain_2.CrossAttentionBlock.Residual_1.SelfAttention
├── (PAR)
│ └── Identity() (x3)
├── (DISTR)
│ └── Linear(in_features=320, out_features=320, device=cuda:0, dtype=bfloat16) (x3)
├── ScaledDotProductAttention(num_heads=8)
└── Linear(in_features=320, out_features=320, device=cuda:0, dtype=bfloat16)
0: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-2.91, max=2.64, mean=0.01, std=0.54, norm=1737.50, grad=False)

(CHAIN) CrossAttentionBlock(embedding_dim=320, context_embedding_dim=768, context_key=clip_text_embedding, num_heads=8, use_bias=False)
├── >>> (RES) Residual() | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention.Chain_2.CrossAttentionBlock.Residual_1 #1
│ ├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
│ └── (CHAIN) SelfAttention(embedding_dim=320, num_heads=8, inner_dim=320, use_bias=False)
│ ├── (PAR) ...
│ ├── (DISTR) ...
│ ├── ScaledDotProductAttention(num_heads=8)
│ └── Linear(in_features=320, out_features=320, device=cuda:0, dtype=bfloat16)
├── (RES) Residual() #2
│ ├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
│ ├── (PAR)
0: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-1.66, max=1.97, mean=0.01, std=0.24, norm=780.96, grad=False)

(CHAIN)
└── >>> (CHAIN) CrossAttentionBlock(embedding_dim=320, context_embedding_dim=768, context_key=clip_text_embedding, num_heads=8, use_bias=False) | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention.Chain_2.CrossAttentionBlock
├── (RES) Residual() #1
│ ├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
│ └── (CHAIN) SelfAttention(embedding_dim=320, num_heads=8, inner_dim=320, use_bias=False) ...
├── (RES) Residual() #2
│ ├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
│ ├── (PAR) ...
│ └── (CHAIN) Attention(embedding_dim=320, num_heads=8, key_embedding_dim=768, value_embedding_dim=768, inner_dim=320, use_bias=False) ...
└── (RES) Residual() #3
├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
0: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-1.66, max=1.97, mean=0.01, std=0.24, norm=780.96, grad=False)

(RES) CLIPLCrossAttention(channels=320)
├── (CHAIN) #1
│ ├── GroupNorm(num_groups=32, eps=1e-06, channels=320, device=cuda:0, dtype=bfloat16)
│ ├── Conv2d(in_channels=320, out_channels=320, kernel_size=(1, 1), device=cuda:0, dtype=bfloat16)
│ ├── (CHAIN) StatefulFlatten(start_dim=2)
│ │ ├── SetContext(context=flatten, key=sizes)
│ │ └── Flatten(start_dim=2)
│ └── Transpose(dim0=1, dim1=2)
├── >>> (CHAIN) | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention.Chain_2 #2
│ └── (CHAIN) CrossAttentionBlock(embedding_dim=320, context_embedding_dim=768, context_key=clip_text_embedding, num_heads=8, use_bias=False)
│ ├── (RES) Residual() #1 ...
0: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-1.66, max=1.97, mean=0.01, std=0.24, norm=780.96, grad=False)

(CHAIN)
├── (SUM) ResidualBlock(in_channels=320, out_channels=320)
│ ├── (CHAIN)
│ │ ├── GroupNorm(num_groups=32, channels=320, device=cuda:0, dtype=bfloat16) #1
│ │ ├── SiLU() #1
│ │ ├── (SUM) RangeAdapter2d(channels=320, embedding_dim=1280) ...
│ │ ├── GroupNorm(num_groups=32, channels=320, device=cuda:0, dtype=bfloat16) #2
│ │ ├── SiLU() #2
│ │ └── Conv2d(in_channels=320, out_channels=320, kernel_size=(3, 3), padding=(1, 1), device=cuda:0, dtype=bfloat16)
│ └── Identity()
├── >>> (RES) CLIPLCrossAttention(channels=320) | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention
0: Tensor(shape=(2, 320, 144, 112), dtype=bfloat16, device=cuda:0, min=-8.19, max=6.47, mean=-0.09, std=0.69, norm=2236.34, grad=False)

(CHAIN) DownBlocks(in_channels=4)
├── (CHAIN) #1
│ ├── Conv2d(in_channels=4, out_channels=320, kernel_size=(3, 3), padding=(1, 1), device=cuda:0, dtype=bfloat16)
│ ├── (RES) Residual()
│ │ ├── UseContext(context=controlnet, key=condition_tile)
│ │ └── (CHAIN) ConditionEncoder() ...
│ └── (PASS)
│ ├── Conv2d(in_channels=320, out_channels=320, kernel_size=(1, 1), device=cuda:0, dtype=bfloat16)
│ └── Lambda(_store_residual(x: torch.Tensor))
├── >>> (CHAIN) (x2) | SD1UNet.Controlnet.DownBlocks.Chain_2 #2
│ ├── (SUM) ResidualBlock(in_channels=320, out_channels=320)
0: Tensor(shape=(2, 320, 144, 112), dtype=bfloat16, device=cuda:0, min=-6.19, max=6.31, mean=-0.04, std=0.67, norm=2141.49, grad=False)

(PASS) Controlnet(name=tile, scale=0.6)
├── (PASS) TimestepEncoder()
│ ├── UseContext(context=diffusion, key=timestep)
│ ├── (CHAIN) RangeEncoder(sinusoidal_embedding_dim=320, embedding_dim=1280)
│ │ ├── Lambda(compute_sinusoidal_embedding(x: jaxtyping.Int[Tensor, '*batch 1']) -> jaxtyping.Float[Tensor, '*batch 1 embedding_dim'])
│ │ ├── Converter(set_device=False)
│ │ ├── Linear(in_features=320, out_features=1280, device=cuda:0, dtype=bfloat16) #1
│ │ ├── SiLU()
│ │ └── Linear(in_features=1280, out_features=1280, device=cuda:0, dtype=bfloat16) #2
│ └── SetContext(context=range_adapter, key=timestep_embedding_tile)
├── Slicing(dim=1, end=4)
0: Tensor(shape=(2, 4, 144, 112), dtype=bfloat16, device=cuda:0, min=-3.31, max=4.03, mean=0.48, std=1.11, norm=433.98, grad=False)

(CHAIN) SD1UNet(in_channels=4)
├── >>> (PASS) Controlnet(name=tile, scale=0.6) | SD1UNet.Controlnet
│ ├── (PASS) TimestepEncoder()
│ │ ├── UseContext(context=diffusion, key=timestep)
│ │ ├── (CHAIN) RangeEncoder(sinusoidal_embedding_dim=320, embedding_dim=1280) ...
│ │ └── SetContext(context=range_adapter, key=timestep_embedding_tile)
│ ├── Slicing(dim=1, end=4)
│ ├── (CHAIN) DownBlocks(in_channels=4)
│ │ ├── (CHAIN) #1 ...
│ │ ├── (CHAIN) (x2) #2 ...
│ │ ├── (CHAIN) #3 ...
0: Tensor(shape=(2, 4, 144, 112), dtype=bfloat16, device=cuda:0, min=-3.31, max=4.03, mean=0.48, std=1.11, norm=433.98, grad=False)
```

@ai-anchorite
Collaborator

That doesn't look fun.

The fatality is:
CUDA out of memory. Tried to allocate 7.75 GiB. GPU

Are you using an older or workstation GPU?
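
For what it's worth, the 7.75 GiB figure lines up with the self-attention score matrix for that tile: 16128 tokens is the 144 × 112 latent grid from the tensor dump, and that matrix alone is batch × heads × tokens² values in bfloat16. Rough back-of-the-envelope sketch:

```python
# Rough arithmetic only; all numbers come from the tensor dump in the traceback.
batch, heads, tokens = 2, 8, 16128        # tokens = 144 * 112 latent positions
bytes_per_bf16 = 2
attn_bytes = batch * heads * tokens * tokens * bytes_per_bf16
print(attn_bytes / 2**30)                 # ~7.75 GiB, matching the failed allocation
```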

@djkakadu
Author

djkakadu commented Nov 25, 2024 via email

@ai-anchorite
Collaborator

ai-anchorite commented Nov 25, 2024

Hmm. Did you disable the SYSMEM fallback at some point? Otherwise, make sure your NVIDIA driver is up to date. It should happily fall back into shared memory and just slow down if you exceed 8 GB of VRAM.
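
If it helps, here is a minimal sketch (plain PyTorch queries, nothing specific to clarity-refiners-ui) to see which CUDA build and how much free VRAM PyTorch reports:

```python
import torch

# Plain PyTorch queries; the values depend entirely on your machine.
print(torch.version.cuda)                  # CUDA version this PyTorch build targets
print(torch.cuda.get_device_name(0))       # GPU model
free, total = torch.cuda.mem_get_info(0)   # free / total device memory in bytes
print(f"{free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")
```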

@ai-anchorite
Collaborator

Oh, actually, I think bfloat16 is only supported on NVIDIA GPUs from Ampere onwards. That's worth checking into.
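
A quick way to check is with the standard PyTorch capability queries (Ampere and newer cards report compute capability 8.x):

```python
import torch

# Pre-Ampere NVIDIA GPUs (10xx/16xx/20xx) report a compute capability below (8, 0)
# and have no native bfloat16 support.
print(torch.cuda.get_device_capability(0))   # e.g. (7, 5) on a 20-series card
print(torch.cuda.is_bf16_supported())        # whether this card/PyTorch combo offers bf16
```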

It was changed from float16 to bfloat16 so that (phat) Macs work better, but that's not helpful if it breaks things for 10- and 20-series NVIDIA cards!

It was changed in two places in this commit: 80a3efd

Line 111:

`dtype = devicetorch.dtype(torch, "bfloat16")`

and line 126:

`torch_dtype = devicetorch.dtype(torch, "bfloat16")`

Change `"bfloat16"` to `"float16"` in both places.

@djkakadu
Author

These changes work and my problem is solved.

change: "bfloat16" to "float16"
