flash attention: correct head dim check and L_k padding #736


Merged — 2 commits into leejet:master, Jul 23, 2025

Conversation

Green-Sky (Contributor)

@Green-Sky Green-Sky commented Jul 21, 2025

Gains are minimal, if any.
SD1 now makes some use of flash attention.

Some earlier numbers were gathered by @bssrdf in #386.

From what I can tell, the numbers should translate pretty well to Vulkan, and probably ROCm.

SD1

| model | backend | resolution | sampling time (before) | sampling time (after) | comp buffer (before) | comp buffer (after) |
| --- | --- | --- | --- | --- | --- | --- |
| CyberRealistic_V9_FP16, 30 steps, 5 cfg | cuda | 512x768 | 20.37s | 19.57s | 1220.07 MB | 1218.29 MB |
| CyberRealistic_V9-q8_0, 30 steps, 5 cfg | cuda | 512x768 | 20.99s | 20.01s | 1220.07 MB | 1218.29 MB |
| CyberRealistic_V9_FP16, 30 steps, 5 cfg | cuda | 768x1024 | 69.43s | 66.23s | 4736.59 MB | 4734.80 MB |
| CyberRealistic_V9-q8_0, 30 steps, 5 cfg | cuda | 768x1024 | 69.66s | 66.77s | 4736.59 MB | 4734.80 MB |
fattn log
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:6144 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:77 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:6144 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:77 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:1536 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:77 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:1536 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:77 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:384 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:77 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:384 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:77 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:96 L_k:96 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:96 L_k:77 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:384 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:77 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:384 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:77 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:384 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:384 L_k:77 n_head:8 C:1280 d_head:160 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:1536 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:77 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:1536 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:77 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:1536 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:1536 L_k:77 n_head:8 C:640 d_head:80 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:6144 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:77 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:6144 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:77 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:6144 n_head:8 C:320 d_head:40 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:6144 L_k:77 n_head:8 C:320 d_head:40 N:1

SDXL

Affected by the L_k padding.

| model | backend | resolution | sampling time (before) | sampling time (after) | comp buffer (before) | comp buffer (after) |
| --- | --- | --- | --- | --- | --- | --- |
| RealVisXL_V3.0_Turbo, q8_0, 8 steps, 3 cfg | cuda | 768x768 | 7.67s | 7.52s | 280.89 MB | 277.01 MB |
| RealVisXL_V3.0_Turbo, q8_0, 8 steps, 3 cfg | cuda | 1024x1024 | 11.07s | 10.81s | 440.86 MB | 491.99 MB |
| RealVisXL_V3.0_Turbo, q8_0, 8 steps, 3 cfg | vulkan | 768x768 | 10.49s | 10.68s | 280.89 MB | 277.01 MB |
| RealVisXL_V3.0_Turbo, q8_0, 8 steps, 3 cfg | vulkan | 1024x1024 | 18.66s | 18.77s | 440.86 MB | 491.99 MB |
fattn log 768x768
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:576 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:576 L_k:77 n_head:20 C:1280 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:2304 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:845  - attention_ext L_q:2304 L_k:77 n_head:10 C:640 d_head:64 N:1
[DEBUG] ggml_extend.hpp:883  -  uses flash attention
[DEBUG] ggml_extend.hpp:885  -  padding k and v dim1 by 179

SD2 and Flux stay the same, apart from some wonky resolutions.

Device is an RTX 2070 mobile (8 GB).
Vulkan with coopmat1 only.

gains are minimal, if any. sd1 now has some usage of flash attention
@Green-Sky Green-Sky marked this pull request as ready for review July 21, 2025 22:10
@Green-Sky Green-Sky mentioned this pull request Jul 21, 2025
leejet (Owner)

leejet commented Jul 23, 2025

Thank you for your contribution.

@leejet leejet merged commit ab835f7 into leejet:master Jul 23, 2025
9 checks passed