@zhangfeiv0
Contributor

Description

Implemented an f32 to u8 reorder for matmul. The implementation follows the existing ppc64 implementation; the only difference is that the intrinsics in the kernel have been replaced with RVV intrinsics.
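For readers unfamiliar with this kind of reorder, here is a minimal scalar sketch of the per-element conversion that a vectorized f32 to u8 kernel would typically perform. It is not the code from this PR: the helper names `f32_to_u8_ref` and `reorder_f32_to_u8_ref` are hypothetical, and the rounding mode (round to nearest even), the saturation to [0, 255], and the NaN handling are assumptions for illustration rather than behavior confirmed by the description above.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper (not a oneDNN function).
// Converts one f32 value to u8, assuming saturation to [0, 255]
// and rounding to nearest even (the default floating-point rounding mode).
inline std::uint8_t f32_to_u8_ref(float x) {
    if (std::isnan(x)) return 0; // assumption: NaN maps to 0
    const float clamped = std::fmin(std::fmax(x, 0.0f), 255.0f);
    return static_cast<std::uint8_t>(std::nearbyint(clamped));
}

// Hypothetical reference reorder over a dense buffer. With --stag=ab and
// --dtag=ab the source and destination layouts match, so the reorder
// reduces to an element-wise conversion.
inline std::vector<std::uint8_t> reorder_f32_to_u8_ref(
        const std::vector<float> &src) {
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = f32_to_u8_ref(src[i]);
    return dst;
}
```

A vector kernel would apply the same clamp, round, and narrow steps to many elements per iteration; a scalar version like this is mainly useful as a correctness reference when comparing outputs.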

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

ctest result:

225/229 Test #225: test_benchdnn_modeC_shuffle_ci_cpu ......................   Passed    0.37 sec
        Start 226: test_benchdnn_modeC_softmax_ci_cpu
226/229 Test #226: test_benchdnn_modeC_softmax_ci_cpu ......................   Passed   33.66 sec
        Start 227: test_benchdnn_modeC_sum_ci_cpu
227/229 Test #227: test_benchdnn_modeC_sum_ci_cpu ..........................   Passed    6.23 sec
        Start 228: test_benchdnn_modeC_zeropad_ci_cpu
228/229 Test #228: test_benchdnn_modeC_zeropad_ci_cpu ......................   Passed  1287.15 sec
        Start 229: noexcept-cpp
229/229 Test #229: noexcept-cpp ............................................   Passed    0.08 sec

100% tests passed, 0 tests failed out of 229

Total Test time (real) = 41068.92 sec

Performance improvements

Test platform: BPI-F3

| Case | Before | After | Speedup |
|---|---|---|---|
| --reorder --sdt=f32 --ddt=u8 --stag=ab --dtag=ab --mode=p 4096x4096 | 821.03 | 39.217 | 21x |

@zhangfeiv0 zhangfeiv0 requested a review from a team as a code owner October 24, 2025 06:35