
Commit 067c96a

Merge pull request #3177 from martin-frbg/issue3176
Use "old" compute(24) function with clang due to register limitations
2 parents 4b380c0 + 2dfb247 commit 067c96a

1 file changed (+4, -0 lines)

kernel/x86_64/sgemm_kernel_16x4_skylakex_3.c

Lines changed: 4 additions & 0 deletions
```diff
@@ -501,7 +501,11 @@ CNAME(BLASLONG m, BLASLONG n, BLASLONG k, float alpha, float * __restrict__ A, f
 int32_t permil[16] = {0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3};
 BLASLONG n_count = n;
 float *a_pointer = A,*b_pointer = B,*c_pointer = C,*ctemp = C,*next_b = B;
+#if defined(__clang__)
+for(;n_count>23;n_count-=24) COMPUTE(24)
+#else
 for(;n_count>23;n_count-=24) COMPUTE_n24
+#endif
 for(;n_count>19;n_count-=20) COMPUTE(20)
 for(;n_count>15;n_count-=16) COMPUTE(16)
 for(;n_count>11;n_count-=12) COMPUTE(12)
```
