Commit a06bcf8

Merge pull request OpenMathLib#5353 from nakagawa-fj/feature/gemm_divide_rate_for_A64FX
Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX
2 parents 8f0a1a3 + 5253c8f commit a06bcf8

2 files changed: +6 -0 lines changed
driver/level3/gemm.c

Lines changed: 4 additions & 0 deletions
@@ -59,6 +59,10 @@
 #define GEMM_Q 128
 #endif
 
+#ifdef GEMM_DIVIDE_RATE
+#define DIVIDE_RATE GEMM_DIVIDE_RATE
+#endif
+
 #ifdef THREADED_LEVEL3
 #include "level3_thread.c"
 #else

param.h

Lines changed: 2 additions & 0 deletions
@@ -3701,6 +3701,8 @@ is a big desktop or server with abundant cache rather than a phone or embedded d
 
 #elif defined(A64FX) // 512-bit SVE
 
+#define GEMM_DIVIDE_RATE 1
+
 #if defined(XDOUBLE) || defined(DOUBLE)
 #define GEMM_PREFERED_SIZE 8
 #else
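
Taken together, the two hunks form a small override chain: param.h sets a per-target GEMM_DIVIDE_RATE, and gemm.c forwards it to DIVIDE_RATE before the threaded driver is included. The following is a minimal sketch of that chain, not the OpenBLAS source. It assumes the threaded driver only supplies its own default when DIVIDE_RATE is still undefined; the default value of 2 and the helper effective_divide_rate() are illustrative assumptions.

```c
/* Sketch of the override chain from this commit.
 * Anything marked "assumed" or "hypothetical" is not taken from the diff. */

/* Per-target tuning, as param.h does for A64FX in this commit. */
#define GEMM_DIVIDE_RATE 1

/* Forwarding step added to driver/level3/gemm.c. */
#ifdef GEMM_DIVIDE_RATE
#define DIVIDE_RATE GEMM_DIVIDE_RATE
#endif

/* Assumed fallback in the threaded driver: it only applies when no
 * target has set DIVIDE_RATE. The value 2 is an assumption. */
#ifndef DIVIDE_RATE
#define DIVIDE_RATE 2
#endif

/* Hypothetical helper: reports the value the driver is compiled with. */
static int effective_divide_rate(void) {
    return DIVIDE_RATE;
}
```

Deleting the GEMM_DIVIDE_RATE line makes the helper return the assumed default instead, which is the behaviour other targets would keep under this sketch.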
