flash attention: correct head dim check and L_k padding #736


Merged — 2 commits into leejet:master, Jul 23, 2025

Conversation

Green-Sky (Contributor)

@Green-Sky Green-Sky commented Jul 21, 2025

Gains are minimal, if any.
SD1 now makes some use of flash attention.

Some earlier numbers were gathered by @bssrdf in #386.

From what I can tell, the numbers should translate pretty well to Vulkan, and probably ROCm.

SD1

| model | backend | resolution | sampling time (before) | sampling time (after) | comp buffer (before) | comp buffer (after) |
| --- | --- | --- | --- | --- | --- | --- |
| CyberRealistic_V9_FP16, 30 steps, 5 cfg | cuda | 512x768 | 20.37s | 19.57s | 1220.07 MB | 1218.29 MB |
| CyberRealistic_V9-q8_0, 30 steps, 5 cfg | cuda | 512x768 | 20.99s | 20.01s | 1220.07 MB | 1218.29 MB |
| CyberRealistic_V9_FP16, 30 steps, 5 cfg | cuda | 768x1024 | 69.43s | 66.23s | 4736.59 MB | 4734.80 MB |
| CyberRealistic_V9-q8_0, 30 steps, 5 cfg | cuda | 768x1024 | 69.66s | 66.77s | 4736.59 MB | 4734.80 MB |
fattn log
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:6144 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:77 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:6144 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:77 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:1536 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:77 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:1536 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:77 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:384 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:77 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:384 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:77 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:96 L_k:96 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:96 L_k:77 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:384 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:77 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:384 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:77 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:384 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:77 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:1536 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:77 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:1536 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:77 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:1536 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:77 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:6144 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:77 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:6144 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:77 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:6144 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:77 n_head:8 C:320 d_head:40 N:1

SDXL

Affected by the L_k padding.

| model | backend | resolution | sampling time (before) | sampling time (after) | comp buffer (before) | comp buffer (after) |
| --- | --- | --- | --- | --- | --- | --- |
| RealVisXL_V3.0_Turbo, q8_0, 8 steps, 3 cfg | cuda | 768x768 | 7.67s | 7.52s | 280.89 MB | 277.01 MB |
| RealVisXL_V3.0_Turbo, q8_0, 8 steps, 3 cfg | cuda | 1024x1024 | 11.07s | 10.81s | 440.86 MB | 491.99 MB |
| RealVisXL_V3.0_Turbo, q8_0, 8 steps, 3 cfg | vulkan | 768x768 | 10.49s | 10.68s | 280.89 MB | 277.01 MB |
| RealVisXL_V3.0_Turbo, q8_0, 8 steps, 3 cfg | vulkan | 1024x1024 | 18.66s | 18.77s | 440.86 MB | 491.99 MB |
fattn log 768x768
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179

SD2 and Flux stay the same, apart from some wonky resolutions.

Device is an RTX 2070 mobile (8 GB).
Vulkan with coopmat1 only.

gains are minimal, if any. sd1 now has some usage of flash attention
@Green-Sky Green-Sky marked this pull request as ready for review July 21, 2025 22:10
@Green-Sky Green-Sky mentioned this pull request Jul 21, 2025
leejet (Owner)

leejet commented Jul 23, 2025

Thank you for your contribution.

@leejet leejet merged commit ab835f7 into leejet:master Jul 23, 2025
9 checks passed