Reapply "Add vectorized_math.h (#11204)", "Add optimized_portable_kernels test (#11205)", and "Add vectorization in elementwise_util (#9432)" #11682
Summary:
Stack was reverted due to internal CI failures. Reapplying as an exported internal diff so that we make sure to catch any more of those.
New fixes:
enabled, and use #if defined(...) && ... instead of #ifdef to check the
macro, so that we don't use PyTorch headers if exceptions are disabled.
(Otherwise, we might have problems with e.g. TORCH_CHECK.)
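To illustrate why the form of the check matters, here is a minimal sketch. It assumes the build may define the macro to 0 rather than leaving it undefined; DEMO_USE_PYTORCH_HEADERS is a hypothetical name used only for this example, not the real macro.

```cpp
// Assumption for this sketch: the build system defines the macro to 0
// (rather than leaving it undefined) when the feature must stay off.
#define DEMO_USE_PYTORCH_HEADERS 0  // hypothetical macro name

// #ifdef only asks "is it defined?", so a value of 0 still enables the path.
#ifdef DEMO_USE_PYTORCH_HEADERS
constexpr bool kIfdefEnabled = true;
#else
constexpr bool kIfdefEnabled = false;
#endif

// #if defined(...) && ... also looks at the value, so 0 disables the path.
#if defined(DEMO_USE_PYTORCH_HEADERS) && DEMO_USE_PYTORCH_HEADERS
constexpr bool kValueCheckEnabled = true;
#else
constexpr bool kValueCheckEnabled = false;
#endif

static_assert(kIfdefEnabled, "#ifdef treats a macro defined to 0 as enabled");
static_assert(!kValueCheckEnabled, "the value-aware check treats 0 as disabled");
```

With the value-aware check, extra conditions (such as an exceptions-enabled test) can be chained with && in the same #if.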
Original summary for #11204:
Set of math functions that work on both scalars and at::vec::Vectorized,
to be used in #9432.
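As a rough sketch of the idea (not the actual contents of vectorized_math.h), an overload set can let one generic call site accept either a scalar or a vector. ToyVec, toy_math::abs, and magnitude below are hypothetical names; ToyVec only stands in for at::vec::Vectorized<float>.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for at::vec::Vectorized<float>; not the real class.
struct ToyVec {
  static constexpr std::size_t kSize = 4;
  std::array<float, kSize> lanes{};
};

namespace toy_math {
// Scalar overload: defers to the standard library.
inline float abs(float x) {
  return std::fabs(x);
}

// Vector overload: same name, applied lane by lane.
inline ToyVec abs(const ToyVec& x) {
  ToyVec out;
  for (std::size_t i = 0; i < ToyVec::kSize; ++i) {
    out.lanes[i] = std::fabs(x.lanes[i]);
  }
  return out;
}
} // namespace toy_math

// Generic caller written once; overload resolution picks the right abs.
template <typename T>
T magnitude(const T& x) {
  return toy_math::abs(x);
}
```

A kernel lambda written against such an overload set does not need to know whether it is called with a scalar or a vector.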
Original summary for #11205:
Make sure we test the optimized versions of portable kernels even if
they are shadowed by optimized implementations. Intended to support
#9432.
Original summary for #9432:
This is a first cut at #9241. In this PR I've vectorized a small
initial set of ops: atan2, clamp, fmod_Scalar, maximum, minimum, mul,
pow, and sigmoid. In addition, the following ops should have gotten
vectorized automatically because they already used generic lambdas: add,
div, rsub, sub. I've left covering ops that use the unary_ufunc_*
utilities in pattern.h for a follow-up push, because pattern.h and
elementwise_util need some work before we can migrate pattern.h's
utilities to be backed by elementwise_util.
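The reason generic lambdas can get vectorized "for free" can be sketched as follows. This is a simplified illustration, not the elementwise_util implementation: Lanes4 is a hypothetical 4-lane type standing in for at::vec::Vectorized<float>, and apply_binary is a hypothetical helper.

```cpp
#include <cstddef>

// Hypothetical 4-lane vector type; a stand-in for at::vec::Vectorized<float>.
struct Lanes4 {
  static constexpr std::size_t kSize = 4;
  float v[kSize];

  static Lanes4 load(const float* p) {
    Lanes4 r;
    for (std::size_t i = 0; i < kSize; ++i) r.v[i] = p[i];
    return r;
  }
  void store(float* p) const {
    for (std::size_t i = 0; i < kSize; ++i) p[i] = v[i];
  }
};

inline Lanes4 operator+(const Lanes4& a, const Lanes4& b) {
  Lanes4 r;
  for (std::size_t i = 0; i < Lanes4::kSize; ++i) r.v[i] = a.v[i] + b.v[i];
  return r;
}

inline Lanes4 operator*(const Lanes4& a, const Lanes4& b) {
  Lanes4 r;
  for (std::size_t i = 0; i < Lanes4::kSize; ++i) r.v[i] = a.v[i] * b.v[i];
  return r;
}

// Hypothetical helper: full chunks go through the vector instantiation of
// `op`, and the leftover elements go through the scalar instantiation.
template <typename Op>
void apply_binary(const Op& op, const float* a, const float* b, float* out,
                  std::size_t n) {
  std::size_t i = 0;
  for (; i + Lanes4::kSize <= n; i += Lanes4::kSize) {
    op(Lanes4::load(a + i), Lanes4::load(b + i)).store(out + i);
  }
  for (; i < n; ++i) {
    out[i] = op(a[i], b[i]);
  }
}
```

A lambda such as `[](auto x, auto y) { return x * y + x; }` compiles for both float and Lanes4, so it can be passed to apply_binary unchanged. A lambda with explicitly typed float parameters could only be used on the scalar path.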
This PR adds an interesting testing problem: in theory, all operators
might need test cases long enough to tickle vectorization, because we
might accidentally vectorize ops and break their lambdas due to
unanticipated differences in semantics. I address this issue by
using Vectorized for the scalar prologue/epilogue in debug mode (we run
tests in both debug and release) so that we can detect broken lambdas. I
additionally intentionally introduced a bug in the vectorized path in
elementwise_util and manually verified that we saw test failures for
each vectorized op called out above.
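The debug-mode trick can be sketched like this. It is a simplified illustration under stated assumptions, not the actual elementwise_util code: DebugVec, toy_scale, and apply_unary are hypothetical, the kVecTail template flag stands in for a debug-build switch, and the zero padding of unused lanes is a simplification.

```cpp
#include <cstddef>

// Hypothetical 4-lane vector type; a stand-in for at::vec::Vectorized<float>.
struct DebugVec {
  static constexpr std::size_t kSize = 4;
  float v[kSize];
};

// Scalar overload: correct.
inline float toy_scale(float x) {
  return 2.0f * x;
}

// Vector overload with a deliberately injected bug (3x instead of 2x),
// mimicking the manual check described above.
inline DebugVec toy_scale(const DebugVec& x) {
  DebugVec r;
  for (std::size_t i = 0; i < DebugVec::kSize; ++i) r.v[i] = 3.0f * x.v[i];
  return r;
}

// kVecTail plays the role of "debug mode": when true, leftover elements are
// padded into a vector so short inputs also exercise the vector path.
template <bool kVecTail, typename Op>
void apply_unary(const Op& op, const float* in, float* out, std::size_t n) {
  std::size_t i = 0;
  for (; i + DebugVec::kSize <= n; i += DebugVec::kSize) {
    DebugVec x;
    for (std::size_t j = 0; j < DebugVec::kSize; ++j) x.v[j] = in[i + j];
    const DebugVec y = op(x);
    for (std::size_t j = 0; j < DebugVec::kSize; ++j) out[i + j] = y.v[j];
  }
  const std::size_t rem = n - i;
  if (rem == 0) return;
  if (kVecTail) {
    DebugVec x{};  // unused lanes are zero-padded (a simplification)
    for (std::size_t j = 0; j < rem; ++j) x.v[j] = in[i + j];
    const DebugVec y = op(x);
    for (std::size_t j = 0; j < rem; ++j) out[i + j] = y.v[j];  // valid lanes only
  } else {
    for (; i < n; ++i) out[i] = op(in[i]);
  }
}
```

With a 3-element input and `[](auto x) { return toy_scale(x); }`, apply_unary<false> never touches the vector overload and the injected bug stays hidden, while apply_unary<true> routes the same elements through it and produces visibly wrong results that a short test can catch.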
Differential Revision: D76467389
fix ET_USE_PYTORCH_HEADERS