Implement optimized BF16 support for ARM architecture - [MOD-9079] #623
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

Coverage Diff | main | #623 | +/- |
---|---|---|---|
Coverage | 97.19% | 96.51% | -0.68% |
Files | 106 | 106 | |
Lines | 5702 | 5745 | +43 |
Hits | 5542 | 5545 | +3 |
Misses | 160 | 200 | +40 |
Pull Request Overview
This PR implements optimized BFLOAT16 support on ARM by introducing new implementations for both SVE and NEON, along with corresponding test and benchmark updates.
- Added new SVE_BF16 and NEON_BF16 source and header files that provide BF16 inner product and L2 norm functions.
- Updated unit tests and benchmarks to cover the new ARM-specific BF16 implementations.
- Integrated the new implementations into the selection logic in L2_space.cpp and IP_space.cpp based on detected CPU features.
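For orientation, the computations that the SVE and NEON kernels accelerate can be sketched in portable scalar C++. This is only a reference sketch, not code from the PR: the helper names (`bf16_to_float`, `float_to_bf16_trunc`, `bf16_dot_ref`, `bf16_l2sqr_ref`) are hypothetical, BF16 values are assumed to be stored as raw 16-bit patterns, and the L2 function is assumed to return the squared distance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical names for illustration; not taken from the VecSim sources.
using bf16 = std::uint16_t; // raw BF16 bit pattern

// Widen BF16 to FP32: BF16 is the upper 16 bits of an IEEE-754 binary32 value.
inline float bf16_to_float(bf16 v) {
    std::uint32_t bits = static_cast<std::uint32_t>(v) << 16;
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}

// Narrow FP32 to BF16 by truncation (drops the low 16 mantissa bits, no rounding).
inline bf16 float_to_bf16_trunc(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return static_cast<bf16>(bits >> 16);
}

// Scalar reference for the inner product, accumulated in FP32.
inline float bf16_dot_ref(const bf16 *a, const bf16 *b, std::size_t dim) {
    assert(dim == 0 || (a != nullptr && b != nullptr));
    float acc = 0.0f;
    for (std::size_t i = 0; i < dim; ++i) {
        acc += bf16_to_float(a[i]) * bf16_to_float(b[i]);
    }
    return acc;
}

// Scalar reference for the squared L2 distance, accumulated in FP32.
inline float bf16_l2sqr_ref(const bf16 *a, const bf16 *b, std::size_t dim) {
    assert(dim == 0 || (a != nullptr && b != nullptr));
    float acc = 0.0f;
    for (std::size_t i = 0; i < dim; ++i) {
        const float diff = bf16_to_float(a[i]) - bf16_to_float(b[i]);
        acc += diff * diff;
    }
    return acc;
}
```

A scalar version like this is mainly useful as a baseline: a vectorized kernel can be checked against it on the same inputs, allowing for small differences caused by a different summation order.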
Reviewed Changes
Copilot reviewed 13 out of 15 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
tests/unit/test_spaces.cpp | Added BF16 tests for NEON_BF16 and SVE_BF16 support |
tests/benchmark/spaces_benchmarks/bm_spaces_bf16.cpp | Added ARM-specific benchmark initialization for BF16 modes |
tests/benchmark/spaces_benchmarks/bm_spaces.h | Updated header includes for ARM BF16 optimizations |
src/VecSim/spaces/functions/SVE_BF16.h & SVE_BF16.cpp | New SVE BF16 implementation functions |
src/VecSim/spaces/functions/NEON_BF16.h & NEON_BF16.cpp | New NEON BF16 implementation functions |
src/VecSim/spaces/L2_space.cpp | Integrated NEON_BF16 and SVE_BF16 optimizations into L2 function |
src/VecSim/spaces/L2/L2_SVE_BF16.h | Added SVE BF16 L2 norm implementation |
src/VecSim/spaces/L2/L2_NEON_BF16.h | Added NEON BF16 L2 norm implementation |
src/VecSim/spaces/IP_space.cpp | Integrated NEON_BF16 and SVE_BF16 optimizations into inner product function |
src/VecSim/spaces/IP/IP_SVE_BF16.h & IP_NEON_BF16.h | New BF16 inner product implementations for SVE and NEON |
Files not reviewed (2)
- cmake/aarch64InstructionFlags.cmake: Language not supported
- src/VecSim/spaces/CMakeLists.txt: Language not supported
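The review summary says the new kernels are selected in L2_space.cpp and IP_space.cpp based on detected CPU features. The following is a minimal sketch of what such a selection step could look like. Everything here is an assumption for illustration: the `ArmFeatures` struct, the `Bf16Kernel` enum, and the preference order (SVE first, then NEON, then a scalar fallback) are hypothetical and are not taken from the PR.

```cpp
#include <cassert>

// Hypothetical feature flags; a real build would fill these from a CPU-detection library.
struct ArmFeatures {
    bool sve;  // Scalable Vector Extension available
    bool neon; // Advanced SIMD (NEON) available
    bool bf16; // BF16 arithmetic instructions available
};

// Hypothetical identifiers for the available BF16 distance kernels.
enum class Bf16Kernel { Scalar, NeonBf16, SveBf16 };

// Assumed preference order: SVE with BF16, then NEON with BF16, else scalar.
inline Bf16Kernel choose_bf16_kernel(const ArmFeatures &f) {
    if (f.bf16 && f.sve) {
        return Bf16Kernel::SveBf16;
    }
    if (f.bf16 && f.neon) {
        return Bf16Kernel::NeonBf16;
    }
    return Bf16Kernel::Scalar;
}

// Human-readable name, e.g. for logging which path was picked.
inline const char *kernel_name(Bf16Kernel k) {
    switch (k) {
    case Bf16Kernel::Scalar:
        return "scalar";
    case Bf16Kernel::NeonBf16:
        return "neon_bf16";
    case Bf16Kernel::SveBf16:
        return "sve_bf16";
    }
    assert(false && "unknown kernel");
    return "unknown";
}
```

Keeping the choice in one small function makes the fallback behavior easy to test on machines that lack the ARM extensions, since the feature flags can be set by hand.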
Great job!
Great work!
)
* SVE implementation for bf16
* add required build flags and fix implementation
* final fixes and implement benchmarks
* added tests
* implement neon bf16 distance functions
* implement build flow and benchmarks
* added test
* format
* remove redundant check
* typo fix
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fixes and cleanup
* fix build
* fix svwhilelt_b16 calls
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
(cherry picked from commit bb41732)
Successfully created backport PR for |
…79] (#641)
Implement optimized BF16 support for ARM architecture - [MOD-9079] (#623)
* SVE implementation for bf16
* add required build flags and fix implementation
* final fixes and implement benchmarks
* added tests
* implement neon bf16 distance functions
* implement build flow and benchmarks
* added test
* format
* remove redundant check
* typo fix
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fixes and cleanup
* fix build
* fix svwhilelt_b16 calls
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
(cherry picked from commit bb41732)
Co-authored-by: GuyAv46 <47632673+GuyAv46@users.noreply.github.com>
…79] (#642)
Implement optimized BF16 support for ARM architecture - [MOD-9079] (#623)
* SVE implementation for bf16
* add required build flags and fix implementation
* final fixes and implement benchmarks
* added tests
* implement neon bf16 distance functions
* implement build flow and benchmarks
* added test
* format
* remove redundant check
* typo fix
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fixes and cleanup
* fix build
* fix svwhilelt_b16 calls
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
(cherry picked from commit bb41732)
Co-authored-by: GuyAv46 <47632673+GuyAv46@users.noreply.github.com>
…79] (#641)
* Implement optimized BF16 support for ARM architecture - [MOD-9079] (#623)
* SVE implementation for bf16
* add required build flags and fix implementation
* final fixes and implement benchmarks
* added tests
* implement neon bf16 distance functions
* implement build flow and benchmarks
* added test
* format
* remove redundant check
* typo fix
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fixes and cleanup
* fix build
* fix svwhilelt_b16 calls
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
(cherry picked from commit bb41732)
* fix benchmark macros for 0.8
---------
Co-authored-by: GuyAv46 <47632673+GuyAv46@users.noreply.github.com>
Co-authored-by: GuyAv46 <guy.avimor@gmail.com>
Describe the changes in the pull request
Implement SVE and NEON (with fp16 fml) ARM optimizations for BFLOAT16
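Because these optimizations depend on instruction-set extensions, the kernels generally need to be compiled only when the target supports them. The sketch below shows one way to guard code paths with the ACLE feature-test macros that GCC and Clang define when the matching `-march` extensions (for example `+sve` or `+bf16`) are enabled. It is an assumption-laden illustration: the function names are hypothetical, and the PR may rely on its own CMake-defined macros rather than these.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports which BF16 path this translation unit was compiled for.
// The macros are ACLE feature-test macros; on non-ARM targets none are defined,
// so the function falls through to "scalar".
inline std::string compiled_bf16_path() {
#if defined(__ARM_FEATURE_SVE) && defined(__ARM_FEATURE_SVE_BF16)
    return "sve_bf16";
#elif defined(__ARM_NEON) && defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
    return "neon_bf16";
#else
    return "scalar";
#endif
}

// Hypothetical helper: true when a SIMD BF16 path was compiled in.
inline bool has_simd_bf16_path() {
    const std::string path = compiled_bf16_path();
    assert(!path.empty());
    return path != "scalar";
}
```

Compile-time guards only say what the binary contains. A runtime CPU check is still needed before calling an SVE or BF16 kernel on hardware that may not implement the extension.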