Vulkan: Tuning warptile for Mali GPU Performance #13483

Answered by 0cc4m
rmatif asked this question in Q&A
You need to look into the meaning of the warptile parameters; they are not independent. I'll try to summarize what I remember:

The 11 parameters are: BLOCK_SIZE, BM, BN, BK, WM, WN, WMITER, TM, TN, TK and WARP.
They originate from this CUDA article; see the section on kernel 10: https://siboehm.com/articles/22/CUDA-MMM
The diagram there is especially helpful.

For your problem: you need to make sure that the number of warps in the workgroup (BLOCK_SIZE / WARP) is identical to the number of warptiles. For example, in the Nvidia case (warps of size 32) we have a workgroup of size BLOCK_SIZE=128, meaning 4 warps. BM=64, BN=64 and WM=32, WN=32 means we have (64/32) × (64/32) = 4 tiles. This is why it works.
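
The rule above can be sketched as a quick sanity check. This is only a sketch: the function names are made up for illustration and are not part of llama.cpp. It assumes the warp count is BLOCK_SIZE / WARP and the warptile count is (BM / WM) × (BN / WN), as in the Nvidia example. The other parameters (BK, WMITER, TM, TN, TK) are not checked.

```python
def warp_count(block_size: int, warp: int) -> int:
    """Number of warps launched together: BLOCK_SIZE / WARP."""
    if block_size % warp != 0:
        raise ValueError("BLOCK_SIZE must be a multiple of WARP")
    return block_size // warp


def warptile_count(bm: int, bn: int, wm: int, wn: int) -> int:
    """Number of WM x WN warptiles that cover one BM x BN block tile."""
    if bm % wm != 0 or bn % wn != 0:
        raise ValueError("BM and BN must be multiples of WM and WN")
    return (bm // wm) * (bn // wn)


def is_consistent(block_size: int, warp: int,
                  bm: int, bn: int, wm: int, wn: int) -> bool:
    """True when every warp gets exactly one warptile."""
    return warp_count(block_size, warp) == warptile_count(bm, bn, wm, wn)


# Nvidia example from the answer: 128 / 32 = 4 warps, 2 * 2 = 4 warptiles.
print(is_consistent(block_size=128, warp=32, bm=64, bn=64, wm=32, wn=32))  # True

# Hypothetical variation (not from the thread): doubling BLOCK_SIZE alone
# gives 8 warps for only 4 warptiles, so the check fails.
print(is_consistent(block_size=256, warp=32, bm=64, bn=64, wm=32, wn=32))  # False
```

Running a check like this on a candidate parameter set before rebuilding should catch a mismatch between warp count and tile count early.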

In your WARP=16 …

Replies: 3 comments 9 replies

rmatif
May 15, 2025
Collaborator Author

rmatif May 15, 2025
Collaborator Author

@jeffbolznv
0cc4m May 16, 2025
Collaborator

rmatif May 16, 2025
Collaborator Author

0cc4m May 16, 2025
Collaborator

rmatif
May 17, 2025
Collaborator Author

0cc4m May 17, 2025
Collaborator

Answer selected by rmatif
rmatif May 22, 2025
Collaborator Author

0cc4m May 23, 2025
Collaborator
