Implement optimized FP16 support for ARM architecture - [MOD-9078] #620
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #620      +/-   ##
==========================================
- Coverage   96.55%   96.51%   -0.04%
==========================================
  Files         106      106
  Lines        5745     5745
==========================================
- Hits         5547     5545       -2
- Misses        198      200       +2
Awesome work, a few comments/questions.
This reverts commit 06dd65c.
Backport failed for Please cherry-pick the changes locally and resolve any conflicts.
git fetch origin 0.8
git worktree add -d .worktree/backport-620-to-0.8 origin/0.8
cd .worktree/backport-620-to-0.8
git switch --create backport-620-to-0.8
git cherry-pick -x fcc8d78b8ceebf249ab6d273761b379580486032
)
* implement L2 SVE with intermediate casting to f32
* implement IP SVE with f16 ops only
* implements L2 sve with no intermediate casting
* add SVE and SVE2 functions files
* add new files to cmake and use new implementations
* added benchmarks
* fix and switch implementation (due to sve2-only op)
* test with SVE2 intrinsics
* Revert "test with SVE2 intrinsics"
  This reverts commit 06dd65c.
* remove redundant implementation
* move to 4 steps per iteration implementations
* add macro cleanup
* fix implementation
* refactor to use 4 accumulators
* added tests
* refactor accumulation
* add initial neon implementation
* fix build flags and file layout
* fix tests
* cleanup and L2 implementation with neon+fp16
* format
* fix test for any arch
* another attempt
* fix test
* rename step functions
* comment-in neon benchmarks
* fix benchmark
* review fixes
* more review fixes
* fixes and cleanup
* fix svwhilelt_b16 calls
* use vbslq_f16
* typo fix
* fix test for OSs that don't support fp16
* added back guards for a specific x86 test
(cherry picked from commit fcc8d78)
Successfully created backport PR for
…78] (#643)
Implement optimized FP16 support for ARM architecture - [MOD-9078] (#620)
* implement L2 SVE with intermediate casting to f32
* implement IP SVE with f16 ops only
* implements L2 sve with no intermediate casting
* add SVE and SVE2 functions files
* add new files to cmake and use new implementations
* added benchmarks
* fix and switch implementation (due to sve2-only op)
* test with SVE2 intrinsics
* Revert "test with SVE2 intrinsics"
  This reverts commit 06dd65c.
* remove redundant implementation
* move to 4 steps per iteration implementations
* add macro cleanup
* fix implementation
* refactor to use 4 accumulators
* added tests
* refactor accumulation
* add initial neon implementation
* fix build flags and file layout
* fix tests
* cleanup and L2 implementation with neon+fp16
* format
* fix test for any arch
* another attempt
* fix test
* rename step functions
* comment-in neon benchmarks
* fix benchmark
* review fixes
* more review fixes
* fixes and cleanup
* fix svwhilelt_b16 calls
* use vbslq_f16
* typo fix
* fix test for OSs that don't support fp16
* added back guards for a specific x86 test
(cherry picked from commit fcc8d78)
Co-authored-by: GuyAv46 <47632673+GuyAv46@users.noreply.github.com>
/backport
Backport failed for Please cherry-pick the changes locally and resolve any conflicts.
git fetch origin 8.0
git worktree add -d .worktree/backport-620-to-8.0 origin/8.0
cd .worktree/backport-620-to-8.0
git switch --create backport-620-to-8.0
git cherry-pick -x fcc8d78b8ceebf249ab6d273761b379580486032
…78] (#644)
* Implement optimized FP16 support for ARM architecture - [MOD-9078] (#620)
* implement L2 SVE with intermediate casting to f32
* implement IP SVE with f16 ops only
* implements L2 sve with no intermediate casting
* add SVE and SVE2 functions files
* add new files to cmake and use new implementations
* added benchmarks
* fix and switch implementation (due to sve2-only op)
* test with SVE2 intrinsics
* Revert "test with SVE2 intrinsics"
  This reverts commit 06dd65c.
* remove redundant implementation
* move to 4 steps per iteration implementations
* add macro cleanup
* fix implementation
* refactor to use 4 accumulators
* added tests
* refactor accumulation
* add initial neon implementation
* fix build flags and file layout
* fix tests
* cleanup and L2 implementation with neon+fp16
* format
* fix test for any arch
* another attempt
* fix test
* rename step functions
* comment-in neon benchmarks
* fix benchmark
* review fixes
* more review fixes
* fixes and cleanup
* fix svwhilelt_b16 calls
* use vbslq_f16
* typo fix
* fix test for OSs that don't support fp16
* added back guards for a specific x86 test
(cherry picked from commit fcc8d78)
* fit benchmark macros for 0.8
Describe the changes in the pull request
Implement SVE and NEON (with fp16 fml) ARM optimizations for FLOAT16
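For readers unfamiliar with the pattern the commit list mentions ("intermediate casting to f32", "4 steps per iteration", "4 accumulators"), here is a minimal portable sketch of a squared-L2 distance over FP16 inputs. It is not the code from this PR: it uses plain scalar C++ instead of SVE or NEON intrinsics so it builds on any architecture, each accumulator is a single float where a SIMD version would presumably hold a vector register, and the names `half_to_float` and `fp16_l2sqr_sketch` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decode an IEEE 754 binary16 bit pattern into a float.
static float half_to_float(uint16_t h) {
    const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    const uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign; // signed zero
        } else {
            // Subnormal half: shift the mantissa until its implicit bit appears.
            uint32_t shift = 0;
            while ((mant & 0x400u) == 0) {
                mant <<= 1;
                ++shift;
            }
            mant &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13); // infinity or NaN
    } else {
        bits = sign | ((exp + 112u) << 23) | (mant << 13); // rebias 15 -> 127
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical sketch: squared L2 distance between two FP16 vectors.
// Inputs are widened to f32 before the arithmetic, and four independent
// accumulators are updated per loop iteration to break the dependency chain.
float fp16_l2sqr_sketch(const uint16_t* a, const uint16_t* b, size_t dim) {
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= dim; i += 4) {
        const float d0 = half_to_float(a[i]) - half_to_float(b[i]);
        const float d1 = half_to_float(a[i + 1]) - half_to_float(b[i + 1]);
        const float d2 = half_to_float(a[i + 2]) - half_to_float(b[i + 2]);
        const float d3 = half_to_float(a[i + 3]) - half_to_float(b[i + 3]);
        acc0 += d0 * d0;
        acc1 += d1 * d1;
        acc2 += d2 * d2;
        acc3 += d3 * d3;
    }
    // Scalar tail for the remaining 0..3 elements. A predicated SVE loop
    // would typically mask these lanes rather than use a separate loop.
    for (; i < dim; ++i) {
        const float d = half_to_float(a[i]) - half_to_float(b[i]);
        acc0 += d * d;
    }
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Calling `fp16_l2sqr_sketch(a, b, dim)` with two arrays of raw binary16 bit patterns returns the squared distance as a float. Widening to f32 trades throughput for accuracy compared with accumulating directly in f16, which is the other variant the commit list mentions.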
Mark if applicable