Commit cda81cf

Shift transition to multithreading towards larger matrix sizes
See #1886 and JuliaRobotics issue 500. trsm benchmarks on Haswell and Zen showed that with these values performance is roughly doubled for matrix sizes between 8x8 and 14x14, and still 10 to 20 percent better near the new cutoff at 32x32.
1 parent 256eb58 commit cda81cf

1 file changed: +8 -2 lines changed


interface/trsm.c

Lines changed: 8 additions & 2 deletions
@@ -81,6 +81,12 @@
 #endif
 #endif
 
+#ifndef COMPLEX
+#define SMP_FACTOR 8
+#else
+#define SMP_FACTOR 4
+#endif
+
 static int (*trsm[])(blas_arg_t *, BLASLONG *, BLASLONG *, FLOAT *, FLOAT *, BLASLONG) = {
 #ifndef TRMM
   TRSM_LNUU, TRSM_LNUN, TRSM_LNLU, TRSM_LNLN,
@@ -366,10 +372,10 @@ void CNAME(enum CBLAS_ORDER order,
   mode |= (trans << BLAS_TRANSA_SHIFT);
   mode |= (side << BLAS_RSIDE_SHIFT);
 
-  if ( args.m < 2*GEMM_MULTITHREAD_THRESHOLD )
+  if ( args.m < SMP_FACTOR * GEMM_MULTITHREAD_THRESHOLD )
     args.nthreads = 1;
   else
-  if ( args.n < 2*GEMM_MULTITHREAD_THRESHOLD )
+  if ( args.n < SMP_FACTOR * GEMM_MULTITHREAD_THRESHOLD )
     args.nthreads = 1;
   else
     args.nthreads = num_cpu_avail(3);
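
The effect of the new cutoff can be sketched in isolation. The snippet below is an illustrative stand-alone approximation, not the code in interface/trsm.c: the helper names `smp_factor` and `trsm_nthreads` are hypothetical, the real/complex choice is made at run time rather than through `#ifndef COMPLEX`, and `GEMM_MULTITHREAD_THRESHOLD` is assumed to be 4, a value inferred from the commit message (factor 8 giving a cutoff at 32x32). In OpenBLAS the threshold is a build-time setting and may differ.

```c
#include <stdbool.h>

/* Assumed value, inferred from the 32x32 cutoff in the commit message;
   in OpenBLAS this is configured at build time. */
#ifndef GEMM_MULTITHREAD_THRESHOLD
#define GEMM_MULTITHREAD_THRESHOLD 4
#endif

/* Hypothetical helper: mirrors the SMP_FACTOR values from the diff
   (8 for real, 4 for complex), selected at run time for illustration. */
static long smp_factor(bool is_complex) {
    return is_complex ? 4 : 8;
}

/* Hypothetical helper: returns 1 (single-threaded) when either dimension
   is below the cutoff, otherwise the number of available CPUs.
   The nested if/else in the diff is equivalent to this combined test. */
static int trsm_nthreads(long m, long n, bool is_complex, int ncpu) {
    long cutoff = smp_factor(is_complex) * GEMM_MULTITHREAD_THRESHOLD;
    if (m < cutoff || n < cutoff)
        return 1;
    return ncpu;
}
```

Under these assumptions a real-valued 31x31 solve stays single-threaded while a 32x32 solve may use all available threads; for complex types the switch happens at 16x16. With the previous factor of 2 the switch would have happened at 8x8 for both.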
