Commit dc802dd

Merge pull request #4474 from ChipKerchner/sgemmIncopy_PR
Vectorize in-copy packing/copying for SGEMM - up to 4X faster.
2 parents e307675 + 2bb7ea6 commit dc802dd
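
For context on what is being replaced: a GEMM "ncopy" routine packs a column-major block into a contiguous buffer so the kernel can read it sequentially, interleaving the columns of each panel row by row. The scalar sketch below illustrates that layout only. It is not the code added by this commit and not the generic OpenBLAS routine; the name `pack_panels_scalar`, its signature, and the `panel` parameter are hypothetical. The panel width is presumably 16 for the files named here, going by the `_16` suffix. As a simplification, leftover columns are packed as one narrower panel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration, not part of OpenBLAS.
 * Packs a column-major m x n matrix `a` (leading dimension `lda`) into `b`.
 * Columns are taken `panel` at a time; within a panel, the values of one row
 * are stored contiguously, then the next row, and so on.
 * Assumption: trailing columns (n % panel) form one narrower final panel. */
void pack_panels_scalar(size_t m, size_t n, const float *a, size_t lda,
                        float *b, size_t panel)
{
    assert(panel > 0);
    for (size_t j0 = 0; j0 < n; j0 += panel) {
        /* Width of this panel; smaller than `panel` only for the remainder. */
        size_t width = (n - j0 < panel) ? (n - j0) : panel;
        for (size_t i = 0; i < m; i++) {
            for (size_t j = 0; j < width; j++) {
                /* Element (i, j0 + j) of the column-major source. */
                *b++ = a[i + (j0 + j) * lda];
            }
        }
    }
}
```

The inner loop reads with stride `lda` and writes with stride 1, which is the access pattern a vectorized version would aim to turn into wide loads followed by in-register transposes.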

4 files changed: +485 -3 lines changed

kernel/power/KERNEL.POWER10

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ ZTRMMKERNEL = zgemm_kernel_power10.S
 endif
 
 SGEMMKERNEL = sgemm_kernel_power10.c
-SGEMMINCOPY = ../generic/gemm_ncopy_16.c
+SGEMMINCOPY = sgemm_ncopy_16_power.c
 SGEMMITCOPY = sgemm_tcopy_16_power8.S
 SGEMMONCOPY = ../generic/gemm_ncopy_8.c
 SGEMMOTCOPY = sgemm_tcopy_8_power8.S

kernel/power/KERNEL.POWER8

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ CTRMMKERNEL = ctrmm_kernel_8x4_power8.S
 ZTRMMKERNEL = ztrmm_kernel_8x2_power8.S
 
 SGEMMKERNEL = sgemm_kernel_16x8_power8.S
-SGEMMINCOPY = ../generic/gemm_ncopy_16.c
+SGEMMINCOPY = sgemm_ncopy_16_power.c
 SGEMMITCOPY = sgemm_tcopy_16_power8.S
 SGEMMONCOPY = ../generic/gemm_ncopy_8.c
 SGEMMOTCOPY = sgemm_tcopy_8_power8.S

kernel/power/KERNEL.POWER9

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ CTRMMKERNEL = cgemm_kernel_power9.S
 ZTRMMKERNEL = zgemm_kernel_power9.S
 
 SGEMMKERNEL = sgemm_kernel_power9.S
-SGEMMINCOPY = ../generic/gemm_ncopy_16.c
+SGEMMINCOPY = sgemm_ncopy_16_power.c
 SGEMMITCOPY = sgemm_tcopy_16_power8.S
 SGEMMONCOPY = ../generic/gemm_ncopy_8.c
 SGEMMOTCOPY = sgemm_tcopy_8_power8.S
