Reapply "Add vectorized_math.h (#11204)", "Add optimized_portable_kernels test (#11205)", and "Add vectorization in elementwise_util (#9432)" #11682

DariusHolmgren · 2025-06-14T03:55:05Z

Summary:
Stack was reverted due to internal CI failures. Reapplying as an exported internal diff so that we make sure to catch any more of those.

New fixes:

- straightforward op_sub build fixes
- s/EXPECT_EQ/EXPECT_FLOAT_EQ/ in vectorized_math_test
- define ET_USE_PYTORCH_HEADERS to detect whether exceptions are
  enabled, and use `#if defined(...) && ...` instead of `#ifdef` to check the macro so
  that we don't use PyTorch headers if exceptions are
  disabled. (Otherwise, we might have problems with e.g. TORCH_CHECK.)
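To illustrate why the check changed, here is a minimal sketch of how `#ifdef` and `#if defined(...) && ...` differ when a macro is defined with the value 0. The `DEMO_FLAG_*` macros and `k*` constants are hypothetical names used only for this example; how ET_USE_PYTORCH_HEADERS is actually defined by the build is not shown here.

```cpp
// Hypothetical flags for illustration only; they are not part of ExecuTorch.
#define DEMO_FLAG_ZERO 0
#define DEMO_FLAG_ONE 1
// DEMO_FLAG_UNSET is intentionally left undefined.

// #ifdef only asks "is the macro defined?", so a value of 0 still passes.
#ifdef DEMO_FLAG_ZERO
constexpr bool kIfdefSeesZero = true;
#else
constexpr bool kIfdefSeesZero = false;
#endif

// The combined check also requires a nonzero value.
#if defined(DEMO_FLAG_ZERO) && DEMO_FLAG_ZERO
constexpr bool kValueCheckSeesZero = true;
#else
constexpr bool kValueCheckSeesZero = false;
#endif

#if defined(DEMO_FLAG_ONE) && DEMO_FLAG_ONE
constexpr bool kValueCheckSeesOne = true;
#else
constexpr bool kValueCheckSeesOne = false;
#endif

// An undefined macro fails the defined(...) test, so the branch is skipped.
#if defined(DEMO_FLAG_UNSET) && DEMO_FLAG_UNSET
constexpr bool kValueCheckSeesUnset = true;
#else
constexpr bool kValueCheckSeesUnset = false;
#endif
```

With the value-aware form, a build can pass the macro as 0 to opt out, which a plain `#ifdef` would treat as opted in.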

Original summary for #11204:
Set of math functions that work on both scalars and at::vec::Vectorized,
to be used in #9432.
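As a rough sketch of the idea (not the contents of vectorized_math.h), one name can be overloaded for a scalar and for a vector type so that generic code calls it without caring which one it received. `DemoVec` is a hypothetical 4-lane stand-in for at::vec::Vectorized<float>, and `demo_atan2` and `demo_angle` are made-up names.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical 4-lane stand-in for at::vec::Vectorized<float>.
struct DemoVec {
  std::array<float, 4> lanes{};
};

// Scalar overload: forwards to the standard library.
inline float demo_atan2(float y, float x) {
  return std::atan2(y, x);
}

// Vector overload: same name, applied lane by lane.
inline DemoVec demo_atan2(const DemoVec& y, const DemoVec& x) {
  DemoVec out;
  for (std::size_t i = 0; i < out.lanes.size(); ++i) {
    out.lanes[i] = std::atan2(y.lanes[i], x.lanes[i]);
  }
  return out;
}

// Generic caller: compiles for float and for DemoVec alike because
// overload resolution picks the matching demo_atan2.
template <typename T>
T demo_angle(const T& y, const T& x) {
  return demo_atan2(y, x);
}
```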

Original summary for #11205:
Make sure we test the optimized versions of portable kernels even if
they are shadowed by optimized implementations. Intended to support
#9432.

Original summary for #9432:

This is a first cut at #9241. In this PR I've vectorized a small
initial set of ops: atan2, clamp, fmod_Scalar, maximum, minimum, mul,
pow, and sigmoid. In addition, the following ops should have gotten
vectorized automatically because they already used generic lambdas: add,
div, rsub, sub. I've left covering ops that use the `unary_ufunc_*`
utilities in
[pattern.h](https://github.com/pytorch/executorch/blob/main/kernels/portable/cpu/pattern/pattern.h)
for a follow-up push, because pattern.h and elementwise_util need some
work before we can migrate pattern.h's utilities to be backed by
elementwise_util.
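The sketch below shows, under simplifying assumptions, why an op written as a generic lambda can be vectorized without changing the op itself: the same `[](auto x, auto y)` body is instantiated once for vectors and once for scalars. `DemoVec`, `demo_load`, `demo_store`, `demo_apply_binary`, and `demo_mul` are hypothetical names; the real elementwise_util also handles dtypes and broadcasting, which are omitted here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kDemoLanes = 4;

// Hypothetical stand-in for at::vec::Vectorized<float>.
struct DemoVec {
  std::array<float, kDemoLanes> lanes{};
};

inline DemoVec operator*(const DemoVec& a, const DemoVec& b) {
  DemoVec out;
  for (std::size_t i = 0; i < kDemoLanes; ++i) {
    out.lanes[i] = a.lanes[i] * b.lanes[i];
  }
  return out;
}

inline DemoVec demo_load(const float* p) {
  DemoVec v;
  for (std::size_t i = 0; i < kDemoLanes; ++i) {
    v.lanes[i] = p[i];
  }
  return v;
}

inline void demo_store(const DemoVec& v, float* p) {
  for (std::size_t i = 0; i < kDemoLanes; ++i) {
    p[i] = v.lanes[i];
  }
}

// Full vectors go through op(DemoVec, DemoVec); leftover elements go
// through op(float, float). A generic lambda satisfies both calls.
template <typename Op>
void demo_apply_binary(const Op& op, const float* a, const float* b,
                       float* out, std::size_t n) {
  std::size_t i = 0;
  for (; i + kDemoLanes <= n; i += kDemoLanes) {
    demo_store(op(demo_load(a + i), demo_load(b + i)), out + i);
  }
  for (; i < n; ++i) {
    out[i] = op(a[i], b[i]);
  }
}

inline std::vector<float> demo_mul(const std::vector<float>& a,
                                   const std::vector<float>& b) {
  assert(a.size() == b.size());
  std::vector<float> out(a.size());
  demo_apply_binary([](auto x, auto y) { return x * y; },
                    a.data(), b.data(), out.data(), out.size());
  return out;
}
```

A lambda with concrete parameter types such as `[](float x, float y)` would only compile for the scalar loop, which is why such ops need an explicit change before they can take the vector path.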

This PR adds an interesting testing problem: in theory, *all* operators
might need test cases long enough to tickle vectorization, because we
might vectorize ops unintentionally and break their lambdas
due to unanticipated differences in semantics. I address this issue by
using Vectorized for the scalar prologue/epilogue in debug mode (we run
tests in both debug and release) so that we can detect broken lambdas. I
also intentionally introduced a bug in the vectorized path in
elementwise_util and manually verified that we saw test failures for
each vectorized op called out above.
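A minimal sketch of the debug-mode idea, as I read it: each leftover scalar is broadcast into a vector, the op is called on the vector, and one lane is read back, so even inputs shorter than one vector exercise the vector overloads. All names are hypothetical, the `kViaVector` template flag stands in for a debug-build check, and the deliberately inconsistent `demo_half` overloads model a lambda whose vector path is broken.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;

// Hypothetical stand-in for at::vec::Vectorized<float>.
struct DemoVec {
  std::array<float, kLanes> lanes{};
};

inline DemoVec demo_broadcast(float x) {
  DemoVec v;
  v.lanes.fill(x);
  return v;
}

// Correct scalar overload.
inline float demo_half(float x) {
  return x * 0.5f;
}

// Deliberately buggy vector overload (multiplies by 0.25 instead of 0.5),
// standing in for a lambda that breaks when it is vectorized.
inline DemoVec demo_half(const DemoVec& v) {
  DemoVec out;
  for (std::size_t i = 0; i < kLanes; ++i) {
    out.lanes[i] = v.lanes[i] * 0.25f;
  }
  return out;
}

// Processes elements that do not fill a whole vector.
// kViaVector == true models a debug build: the op runs on a broadcast
// vector and lane 0 is kept, so vector-path bugs show up on short inputs.
template <bool kViaVector, typename Op>
void demo_apply_tail(const Op& op, const float* in, float* out,
                     std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    if (kViaVector) {
      out[i] = op(demo_broadcast(in[i])).lanes[0];
    } else {
      out[i] = op(in[i]);
    }
  }
}
```

With a three-element input and the op `[](auto x) { return demo_half(x); }`, the scalar tail hides the bug while the vector-backed tail produces a different result, which is the kind of mismatch a short test case can then catch.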

Differential Revision: D76467389

fix ET_USE_PYTORCH_HEADERS

Summary: To support passing ET_USE_PYTORCH_HEADERS only when exceptions are enabled. Differential Revision: D76470039


pytorch-bot · 2025-06-14T03:55:09Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11682

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 735f214 with merge base 56392aa:

NEW FAILURES - The following jobs have failed:

- Lint / lintrunner / linux-job (gh)
  RuntimeError: Command docker exec -t fc1ebce4ad51b3a83161becab4e3867f6d347078fc53ecc35afef84bd4665b52 /exec failed with exit code 127
- pull / android / run-emulator (gh)
  The process '/usr/bin/sh' failed with exit code 255

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-06-14T03:55:40Z

This pull request was exported from Phabricator. Differential Revision: D76467389

github-actions · 2025-06-14T03:56:09Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

swolchok and others added 2 commits June 13, 2025 13:17

Define ET_HAS_EXCEPTIONS macro

423a178

Summary: To support passing ET_USE_PYTORCH_HEADERS only when exceptions are enabled. Differential Revision: D76470039

DariusHolmgren requested review from larryliu0820, kirklandsign, JacobSzwejbka, lucylq, swolchok and manuelcandales as code owners June 14, 2025 03:55

facebook-github-bot added the CLA Signed label Jun 14, 2025

facebook-github-bot added the fb-exported label Jun 14, 2025
