Problem #5

Closed
djkakadu opened this issue Nov 24, 2024 · 5 comments

@djkakadu

Hello, with every image I try, I get this error:
```
❌ Error during processing: File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 858, in forward
super().forward(*inputs)
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)

File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 922, in forward
return super().forward(*inputs) + inputs[0]
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 922, in forward
return super().forward(*inputs) + inputs[0]
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
result = self._call_layer(layer, name, *intermediate_args)
OutOfMemoryError:
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\attentions.py", line 129, in forward
return self._process_attention(
File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\attentions.py", line 29, in scaled_dot_product_attention
return _scaled_dot_product_attention(
CUDA out of memory. Tried to allocate 7.75 GiB. GPU

(CHAIN) SelfAttention(embedding_dim=320, num_heads=8, inner_dim=320, use_bias=False)
├── (PAR)
│ └── Identity() (x3)
├── (DISTR)
│ └── Linear(in_features=320, out_features=320, device=cuda:0, dtype=bfloat16) (x3)
├── >>> ScaledDotProductAttention(num_heads=8) | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention.Chain_2.CrossAttentionBlock.Residual_1.SelfAttention.ScaledDotProductAttention
└── Linear(in_features=320, out_features=320, device=cuda:0, dtype=bfloat16)
0: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-6.97, max=7.16, mean=-0.05, std=1.51, norm=4841.88, grad=False)
1: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-6.69, max=7.00, mean=-0.01, std=1.58, norm=5069.41, grad=False)
2: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-2.86, max=2.66, mean=0.01, std=0.46, norm=1474.05, grad=False)

(RES) Residual()
├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
└── >>> (CHAIN) SelfAttention(embedding_dim=320, num_heads=8, inner_dim=320, use_bias=False) | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention.Chain_2.CrossAttentionBlock.Residual_1.SelfAttention
├── (PAR)
│ └── Identity() (x3)
├── (DISTR)
│ └── Linear(in_features=320, out_features=320, device=cuda:0, dtype=bfloat16) (x3)
├── ScaledDotProductAttention(num_heads=8)
└── Linear(in_features=320, out_features=320, device=cuda:0, dtype=bfloat16)
0: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-2.91, max=2.64, mean=0.01, std=0.54, norm=1737.50, grad=False)

(CHAIN) CrossAttentionBlock(embedding_dim=320, context_embedding_dim=768, context_key=clip_text_embedding, num_heads=8, use_bias=False)
├── >>> (RES) Residual() | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention.Chain_2.CrossAttentionBlock.Residual_1 #1
│ ├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
│ └── (CHAIN) SelfAttention(embedding_dim=320, num_heads=8, inner_dim=320, use_bias=False)
│ ├── (PAR) ...
│ ├── (DISTR) ...
│ ├── ScaledDotProductAttention(num_heads=8)
│ └── Linear(in_features=320, out_features=320, device=cuda:0, dtype=bfloat16)
├── (RES) Residual() #2
│ ├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
│ ├── (PAR)
0: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-1.66, max=1.97, mean=0.01, std=0.24, norm=780.96, grad=False)

(CHAIN)
└── >>> (CHAIN) CrossAttentionBlock(embedding_dim=320, context_embedding_dim=768, context_key=clip_text_embedding, num_heads=8, use_bias=False) | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention.Chain_2.CrossAttentionBlock
├── (RES) Residual() #1
│ ├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
│ └── (CHAIN) SelfAttention(embedding_dim=320, num_heads=8, inner_dim=320, use_bias=False) ...
├── (RES) Residual() #2
│ ├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
│ ├── (PAR) ...
│ └── (CHAIN) Attention(embedding_dim=320, num_heads=8, key_embedding_dim=768, value_embedding_dim=768, inner_dim=320, use_bias=False) ...
└── (RES) Residual() #3
├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
0: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-1.66, max=1.97, mean=0.01, std=0.24, norm=780.96, grad=False)

(RES) CLIPLCrossAttention(channels=320)
├── (CHAIN) #1
│ ├── GroupNorm(num_groups=32, eps=1e-06, channels=320, device=cuda:0, dtype=bfloat16)
│ ├── Conv2d(in_channels=320, out_channels=320, kernel_size=(1, 1), device=cuda:0, dtype=bfloat16)
│ ├── (CHAIN) StatefulFlatten(start_dim=2)
│ │ ├── SetContext(context=flatten, key=sizes)
│ │ └── Flatten(start_dim=2)
│ └── Transpose(dim0=1, dim1=2)
├── >>> (CHAIN) | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention.Chain_2 #2
│ └── (CHAIN) CrossAttentionBlock(embedding_dim=320, context_embedding_dim=768, context_key=clip_text_embedding, num_heads=8, use_bias=False)
│ ├── (RES) Residual() #1 ...
0: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-1.66, max=1.97, mean=0.01, std=0.24, norm=780.96, grad=False)

(CHAIN)
├── (SUM) ResidualBlock(in_channels=320, out_channels=320)
│ ├── (CHAIN)
│ │ ├── GroupNorm(num_groups=32, channels=320, device=cuda:0, dtype=bfloat16) #1
│ │ ├── SiLU() #1
│ │ ├── (SUM) RangeAdapter2d(channels=320, embedding_dim=1280) ...
│ │ ├── GroupNorm(num_groups=32, channels=320, device=cuda:0, dtype=bfloat16) #2
│ │ ├── SiLU() #2
│ │ └── Conv2d(in_channels=320, out_channels=320, kernel_size=(3, 3), padding=(1, 1), device=cuda:0, dtype=bfloat16)
│ └── Identity()
├── >>> (RES) CLIPLCrossAttention(channels=320) | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention
0: Tensor(shape=(2, 320, 144, 112), dtype=bfloat16, device=cuda:0, min=-8.19, max=6.47, mean=-0.09, std=0.69, norm=2236.34, grad=False)

(CHAIN) DownBlocks(in_channels=4)
├── (CHAIN) #1
│ ├── Conv2d(in_channels=4, out_channels=320, kernel_size=(3, 3), padding=(1, 1), device=cuda:0, dtype=bfloat16)
│ ├── (RES) Residual()
│ │ ├── UseContext(context=controlnet, key=condition_tile)
│ │ └── (CHAIN) ConditionEncoder() ...
│ └── (PASS)
│ ├── Conv2d(in_channels=320, out_channels=320, kernel_size=(1, 1), device=cuda:0, dtype=bfloat16)
│ └── Lambda(_store_residual(x: torch.Tensor))
├── >>> (CHAIN) (x2) | SD1UNet.Controlnet.DownBlocks.Chain_2 #2
│ ├── (SUM) ResidualBlock(in_channels=320, out_channels=320)
0: Tensor(shape=(2, 320, 144, 112), dtype=bfloat16, device=cuda:0, min=-6.19, max=6.31, mean=-0.04, std=0.67, norm=2141.49, grad=False)

(PASS) Controlnet(name=tile, scale=0.6)
├── (PASS) TimestepEncoder()
│ ├── UseContext(context=diffusion, key=timestep)
│ ├── (CHAIN) RangeEncoder(sinusoidal_embedding_dim=320, embedding_dim=1280)
│ │ ├── Lambda(compute_sinusoidal_embedding(x: jaxtyping.Int[Tensor, '*batch 1']) -> jaxtyping.Float[Tensor, '*batch 1 embedding_dim'])
│ │ ├── Converter(set_device=False)
│ │ ├── Linear(in_features=320, out_features=1280, device=cuda:0, dtype=bfloat16) #1
│ │ ├── SiLU()
│ │ └── Linear(in_features=1280, out_features=1280, device=cuda:0, dtype=bfloat16) #2
│ └── SetContext(context=range_adapter, key=timestep_embedding_tile)
├── Slicing(dim=1, end=4)
0: Tensor(shape=(2, 4, 144, 112), dtype=bfloat16, device=cuda:0, min=-3.31, max=4.03, mean=0.48, std=1.11, norm=433.98, grad=False)

(CHAIN) SD1UNet(in_channels=4)
├── >>> (PASS) Controlnet(name=tile, scale=0.6) | SD1UNet.Controlnet
│ ├── (PASS) TimestepEncoder()
│ │ ├── UseContext(context=diffusion, key=timestep)
│ │ ├── (CHAIN) RangeEncoder(sinusoidal_embedding_dim=320, embedding_dim=1280) ...
│ │ └── SetContext(context=range_adapter, key=timestep_embedding_tile)
│ ├── Slicing(dim=1, end=4)
│ ├── (CHAIN) DownBlocks(in_channels=4)
│ │ ├── (CHAIN) #1 ...
│ │ ├── (CHAIN) (x2) #2 ...
│ │ ├── (CHAIN) #3 ...
0: Tensor(shape=(2, 4, 144, 112), dtype=bfloat16, device=cuda:0, min=-3.31, max=4.03, mean=0.48, std=1.11, norm=433.98, grad=False)
```

@ai-anchorite
Collaborator

That doesn't look fun.

The fatality is:
CUDA out of memory. Tried to allocate 7.75 GiB. GPU

Are you using an older or workstation GPU?
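
For what it's worth, the 7.75 GiB figure lines up with the self-attention score matrix for that tile: 16128 tokens is the 144 × 112 latent grid from the tensor dump, and that matrix alone is batch × heads × tokens² values in bfloat16. Rough back-of-the-envelope sketch:

```python
# Rough arithmetic only; all numbers come from the tensor dump in the traceback.
batch, heads, tokens = 2, 8, 16128        # tokens = 144 * 112 latent positions
bytes_per_bf16 = 2
attn_bytes = batch * heads * tokens * tokens * bytes_per_bf16
print(attn_bytes / 2**30)                 # ~7.75 GiB, matching the failed allocation
```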

@djkakadu
Author

djkakadu commented Nov 25, 2024 via email

@ai-anchorite
Collaborator

ai-anchorite commented Nov 25, 2024

Hmm. Did you disable the SYSMEM fallback at some point? Otherwise, make sure your NVIDIA driver is up to date. It should happily fall back into shared memory and just slow down if you exceed 8 GB of VRAM.
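
If it helps, here is a minimal sketch (plain PyTorch queries, nothing specific to clarity-refiners-ui) to see which CUDA build and how much free VRAM PyTorch reports:

```python
import torch

# Plain PyTorch queries; the values depend entirely on your machine.
print(torch.version.cuda)                  # CUDA version this PyTorch build targets
print(torch.cuda.get_device_name(0))       # GPU model
free, total = torch.cuda.mem_get_info(0)   # free / total device memory in bytes
print(f"{free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")
```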

@ai-anchorite
Collaborator

Oh, actually, I think bfloat16 is only supported on NVIDIA GPUs from Ampere onwards. That's worth checking into.
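
A quick way to check is with the standard PyTorch capability queries (Ampere and newer cards report compute capability 8.x):

```python
import torch

# Pre-Ampere NVIDIA GPUs (10xx/16xx/20xx) report a compute capability below (8, 0)
# and have no native bfloat16 support.
print(torch.cuda.get_device_capability(0))   # e.g. (7, 5) on a 20-series card
print(torch.cuda.is_bf16_supported())        # whether this card/PyTorch combo offers bf16
```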

It was changed from float16 to bfloat16 so that (phat) Macs work better, but that's not helpful if it breaks things for 10- and 20-series NVIDIA cards!

It was changed in two places in this commit: 80a3efd

Line 111:

`dtype = devicetorch.dtype(torch, "bfloat16")`

and line 126:

`torch_dtype = devicetorch.dtype(torch, "bfloat16")`

Change `"bfloat16"` to `"float16"` in both places.

@djkakadu
Author

These changes work and my problem is solved.

change: "bfloat16" to "float16"
