
Commit db4f7a3

Nicoshev authored and facebook-github-bot committed
Tweak KleidiAI's FP16 matmul algorithm (pytorch#4416)
Summary:
Pull Request resolved: pytorch#4416

X-link: facebookresearch/FBGEMM#1488

Hoist memory loads out of the outer loop.

The intention is to prevent these loads from evicting cache lines that may hold matrix data. In addition, the loads are likely to incur cache misses after the first iteration, because executing the inner loop will probably fill the cache with matrix data.

Benchmarks repeatedly show a throughput improvement of around 1%.
before: P1854747253
after: P1854747141

Reviewed By: YifanYuan3

Differential Revision: D77459967

fbshipit-source-id: 01eb4fc004ba055823551d843f7bf7728caa74a8
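A minimal C++ sketch of the hoisting pattern described above, assuming a simple row-major matmul that reads its parameters through a pointer to a struct. This is not KleidiAI's FP16 kernel: it uses `float` instead of FP16, and `MatmulParams`, `matmul_reload`, and `matmul_hoisted` are hypothetical names for illustration. The first function re-reads the clamp bounds from memory on every outer-loop iteration. The second reads every parameter once, into locals, before the outer loop begins.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical parameter block; a real kernel's parameters may differ.
struct MatmulParams {
    const float* lhs;  // M x K, row-major
    const float* rhs;  // K x N, row-major
    float* dst;        // M x N, row-major
    std::size_t m, n, k;
    float clamp_min;
    float clamp_max;
};

// Baseline: the clamp bounds are loaded again on every outer iteration.
void matmul_reload(const MatmulParams* p) {
    assert(p != nullptr);
    for (std::size_t i = 0; i < p->m; ++i) {
        const float lo = p->clamp_min;  // repeated load inside the outer loop
        const float hi = p->clamp_max;
        for (std::size_t j = 0; j < p->n; ++j) {
            float acc = 0.0f;
            for (std::size_t kk = 0; kk < p->k; ++kk) {
                acc += p->lhs[i * p->k + kk] * p->rhs[kk * p->n + j];
            }
            p->dst[i * p->n + j] = std::clamp(acc, lo, hi);
        }
    }
}

// Hoisted: every parameter is read once, before the outer loop starts.
void matmul_hoisted(const MatmulParams* p) {
    assert(p != nullptr);
    const float* lhs = p->lhs;
    const float* rhs = p->rhs;
    float* dst = p->dst;
    const std::size_t m = p->m, n = p->n, k = p->k;
    const float lo = p->clamp_min;
    const float hi = p->clamp_max;
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t kk = 0; kk < k; ++kk) {
                acc += lhs[i * k + kk] * rhs[kk * n + j];
            }
            dst[i * n + j] = std::clamp(acc, lo, hi);
        }
    }
}
```

Both functions produce the same result. In this sketch, stores through the `float*` destination could alias the struct's `float` fields, so the compiler generally cannot hoist those loads on its own. Copying them into locals makes the hoist explicit.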
1 parent 3f2cd6e commit db4f7a3


1 file changed, +229 -198 lines changed

