Implement optimized FP16 support for ARM architecture - [MOD-9078] #620

GuyAv46 · 2025-03-27T16:23:56Z

Describe the changes in the pull request

Implement SVE and NEON (with fp16fml) ARM optimizations for FLOAT16

Mark if applicable

This PR introduces API changes
This PR introduces serialization changes

src/VecSim/spaces/IP/IP_NEON_FP16.h

src/VecSim/spaces/L2/L2_NEON_FP16.h

src/VecSim/spaces/functions/NEON_HP.cpp

tests/benchmark/spaces_benchmarks/bm_spaces_fp16.cpp

codecov · 2025-03-30T12:46:32Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.51%. Comparing base (bb41732) to head (cb63b7b).
Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #620      +/-   ##
==========================================
- Coverage   96.55%   96.51%   -0.04%     
==========================================
  Files         106      106              
  Lines        5745     5745              
==========================================
- Hits         5547     5545       -2     
- Misses        198      200       +2

lerman25

Awesome work,
few comments/questions

src/VecSim/spaces/L2_space.cpp

src/VecSim/spaces/IP/IP_NEON_FP16.h

tests/unit/test_spaces.cpp

github-actions · 2025-04-07T08:42:10Z

Backport failed for 0.8, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 0.8
git worktree add -d .worktree/backport-620-to-0.8 origin/0.8
cd .worktree/backport-620-to-0.8
git switch --create backport-620-to-0.8
git cherry-pick -x fcc8d78b8ceebf249ab6d273761b379580486032

)
* implement L2 SVE with intermediate casting to f32
* implement IP SVE with f16 ops only
* implements L2 sve with no intermediate casting
* add SVE and SVE2 functions files
* add new files to cmake and use new implementations
* added benchmarks
* fix and switch implementation (due to sve2-only op)
* test with SVE2 intrinsics
* Revert "test with SVE2 intrinsics"
  This reverts commit 06dd65c.
* remove redundant implementation
* move to 4 steps per iteration implementations
* add macro cleanup
* fix implementation
* refactor to use 4 accumulators
* added tests
* refactor accumulation
* add initial neon implementation
* fix build flags and file layout
* fix tests
* cleanup and L2 implementation with neon+fp16
* format
* fix test for any arch
* another attempt
* fix test
* rename step functions
* comment-in neon benchmarks
* fix benchmark
* review fixes
* more review fixes
* fixes and cleanup
* fix svwhilelt_b16 calls
* use vbslq_f16
* typo fix
* fix test for OSs that don't support fp16
* added back guards for a specific x86 test

(cherry picked from commit fcc8d78)

github-actions · 2025-04-07T08:42:13Z

Successfully created backport PR for 8.0:

[8.0] Implement optimized FP16 support for ARM architecture - [MOD-9078] #643

…78] (#643)

Implement optimized FP16 support for ARM architecture - [MOD-9078] (#620)

* implement L2 SVE with intermediate casting to f32
* implement IP SVE with f16 ops only
* implements L2 sve with no intermediate casting
* add SVE and SVE2 functions files
* add new files to cmake and use new implementations
* added benchmarks
* fix and switch implementation (due to sve2-only op)
* test with SVE2 intrinsics
* Revert "test with SVE2 intrinsics"
  This reverts commit 06dd65c.
* remove redundant implementation
* move to 4 steps per iteration implementations
* add macro cleanup
* fix implementation
* refactor to use 4 accumulators
* added tests
* refactor accumulation
* add initial neon implementation
* fix build flags and file layout
* fix tests
* cleanup and L2 implementation with neon+fp16
* format
* fix test for any arch
* another attempt
* fix test
* rename step functions
* comment-in neon benchmarks
* fix benchmark
* review fixes
* more review fixes
* fixes and cleanup
* fix svwhilelt_b16 calls
* use vbslq_f16
* typo fix
* fix test for OSs that don't support fp16
* added back guards for a specific x86 test

(cherry picked from commit fcc8d78)

Co-authored-by: GuyAv46 <47632673+GuyAv46@users.noreply.github.com>

GuyAv46 · 2025-04-07T10:54:59Z

/backport

github-actions · 2025-04-07T10:55:17Z

Backport failed for 0.8, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 0.8
git worktree add -d .worktree/backport-620-to-0.8 origin/0.8
cd .worktree/backport-620-to-0.8
git switch --create backport-620-to-0.8
git cherry-pick -x fcc8d78b8ceebf249ab6d273761b379580486032

github-actions · 2025-04-07T10:55:18Z

Backport failed for 8.0, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 8.0
git worktree add -d .worktree/backport-620-to-8.0 origin/8.0
cd .worktree/backport-620-to-8.0
git switch --create backport-620-to-8.0
git cherry-pick -x fcc8d78b8ceebf249ab6d273761b379580486032

…78] (#644)

* Implement optimized FP16 support for ARM architecture - [MOD-9078] (#620)
* implement L2 SVE with intermediate casting to f32
* implement IP SVE with f16 ops only
* implements L2 sve with no intermediate casting
* add SVE and SVE2 functions files
* add new files to cmake and use new implementations
* added benchmarks
* fix and switch implementation (due to sve2-only op)
* test with SVE2 intrinsics
* Revert "test with SVE2 intrinsics"
  This reverts commit 06dd65c.
* remove redundant implementation
* move to 4 steps per iteration implementations
* add macro cleanup
* fix implementation
* refactor to use 4 accumulators
* added tests
* refactor accumulation
* add initial neon implementation
* fix build flags and file layout
* fix tests
* cleanup and L2 implementation with neon+fp16
* format
* fix test for any arch
* another attempt
* fix test
* rename step functions
* comment-in neon benchmarks
* fix benchmark
* review fixes
* more review fixes
* fixes and cleanup
* fix svwhilelt_b16 calls
* use vbslq_f16
* typo fix
* fix test for OSs that don't support fp16
* added back guards for a specific x86 test

(cherry picked from commit fcc8d78)

* fit benchmark macros for 0.8

GuyAv46 requested a review from dor-forer March 27, 2025 16:31

GuyAv46 marked this pull request as ready for review March 27, 2025 16:31

dor-forer reviewed Mar 30, 2025

GuyAv46 requested a review from dor-forer March 30, 2025 13:49

dor-forer previously approved these changes Mar 30, 2025

alonre24 requested a review from lerman25 March 31, 2025 08:05

lerman25 reviewed Apr 1, 2025

src/VecSim/spaces/L2_space.cpp (outdated, resolved)

src/VecSim/spaces/IP/IP_NEON_FP16.h (outdated, resolved)

tests/unit/test_spaces.cpp (resolved)

GuyAv46 added the backport 0.6, backport 0.7, backport 0.8, and backport 8.0 labels Apr 2, 2025

Base automatically changed from dorer-add-arm-opt-fp32 to main April 2, 2025 15:09

GuyAv46 added 16 commits April 3, 2025 09:50

implement L2 SVE with intermediate casting to f32

afb8e9e

implement IP SVE with f16 ops only

fc3538a

implements L2 sve with no intermediate casting

c23cecb

add SVE and SVE2 functions files

1f8b40a

add new files to cmake and use new implementations

cfc78db

added benchmarks

af9759d

fix and switch implementation (due to sve2-only op)

0322729

test with SVE2 intrinsics

145314d

Revert "test with SVE2 intrinsics"

80e0f8a

This reverts commit 06dd65c.

remove redundant implementation

4c7838b

move to 4 steps per iteration implementations

6934861

add macro cleanup

54122ef

fix implementation

e35e3df

refactor to use 4 accumulators

f145d11

added tests

607f9cf

refactor accumulation

363962c

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 6, 2025

fix test for OSs that don't support fp16

ea46a83

GuyAv46 dismissed lerman25’s stale review via ea46a83 April 6, 2025 15:25

GuyAv46 enabled auto-merge April 6, 2025 15:26

added back guards for a specific x86 test

43bd7b5

GuyAv46 requested a review from lerman25 April 6, 2025 15:44

lerman25 previously approved these changes Apr 6, 2025

GuyAv46 added this pull request to the merge queue Apr 6, 2025

github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Apr 6, 2025

GuyAv46 enabled auto-merge April 7, 2025 06:41

Merge branch 'main' into guyav-arm_fp16_support

cb63b7b

GuyAv46 dismissed lerman25’s stale review via cb63b7b April 7, 2025 06:43

GuyAv46 requested review from lerman25 and dor-forer April 7, 2025 06:44

dor-forer approved these changes Apr 7, 2025

GuyAv46 added this pull request to the merge queue Apr 7, 2025

Merged via the queue into main with commit fcc8d78 Apr 7, 2025
23 checks passed

GuyAv46 deleted the guyav-arm_fp16_support branch April 7, 2025 08:41

github-actions bot mentioned this pull request Apr 7, 2025

[8.0] Implement optimized FP16 support for ARM architecture - [MOD-9078] #643

Merged

GuyAv46 mentioned this pull request Apr 7, 2025

[0.8] Implement optimized FP16 support for ARM architecture - [MOD-9078] #644

Merged
