-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
Help LLVM better vectorize reduction over skipmissing
#43859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -310,6 +310,15 @@ _mapreduce(f, op, ::IndexCartesian, itr::SkipMissing) = mapfoldl(f, op, itr) | |
mapreduce_impl(f, op, A::SkipMissing, ifirst::Integer, ilast::Integer) = | ||
mapreduce_impl(f, op, A, ifirst, ilast, pairwise_blocksize(f, op)) | ||
|
||
# Some help function to make LLVM better vectorizing these reduction. | ||
# It returns a `noop`, which make sure `op(v, noop) === v` | ||
_fast_noop(::Union{typeof(add_sum),typeof(+)}, ::Type{T}, ::T) where {T<:Union{HWReal,Complex{<:HWReal}}} = zero(T) | ||
_fast_noop(::Union{typeof(mul_prod),typeof(*)}, ::Type{T}, ::T) where {T<:HWReal} = one(T) | ||
# TODO: min/max for IEEEFloat need manually unroll to vectorize. | ||
_fast_noop(::Union{typeof(min),typeof(max)}, ::Type{T}, v::T) where {T<:Integer} = T <: HWReal ? v : nothing | ||
# General fallback | ||
_fast_noop(f, T, v) = nothing | ||
|
||
# Returns nothing when the input contains only missing values, and Some(x) otherwise | ||
@noinline function mapreduce_impl(f, op, itr::SkipMissing{<:AbstractArray}, | ||
ifirst::Integer, ilast::Integer, blksize::Int) | ||
|
@@ -345,10 +354,19 @@ mapreduce_impl(f, op, A::SkipMissing, ifirst::Integer, ilast::Integer) = | |
i == typemax(typeof(i)) && return Some(op(f(a1), f(a2))) | ||
i += 1 | ||
v = op(f(a1), f(a2)) | ||
@simd for i = i:ilast | ||
@inbounds ai = A[i] | ||
if ai !== missing | ||
v = op(v, f(ai)) | ||
# We need to make sure `fskip` is stable. | ||
noop = _fast_noop(op, _return_type(f, Tuple{nonmissingtype(eltype(A))}), v) | ||
if isnothing(noop) | ||
Comment on lines
+358
to
+359
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do both branches provide bit-identical results? Otherwise, relying on the inference result like this is not an optimization and it introduces unpredictable behavior. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For bit-identity, I test it locally and there seems no problem. (Edit: I mean a single operation, not the entire reduction, the result of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The following seems also work for the original purpose: noop = _fast_noop(op, typeof(v), v) #
@simd for i = i:ilast
@inbounds ai = A[i]
v = op(v, ismissing(ai) ? oftype(v, noop) : f(ai))
end But for inputs like |
||
@simd for i = i:ilast | ||
@inbounds ai = A[i] | ||
if !ismissing(ai) | ||
v = op(v, f(ai)) | ||
end | ||
end | ||
else | ||
@inline fskip(x) = ismissing(x) ? noop : f(x) | ||
@inbounds @simd for i = i:ilast | ||
v = op(v, fskip(A[i])) | ||
end | ||
end | ||
return Some(v) | ||
|
@@ -463,4 +481,3 @@ macro coalesce(args...) | |
end | ||
return esc(:(let val; $expr; end)) | ||
end | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On master vectorizer shows the following to
sum(skipmissing(Vector{Union{Missing,Float64}}(randn(4096))))
:And the following for
sum(skipmissing(Vector{Union{Missing,Int}}(rand(-1:1,4096))))
:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird that
LV: Not vectorizing: Found an unidentified PHI %value_phi30161 = phi double [ %117, %L137.lr.ph ], [ %value_phi33, %L137 ]
is incomplete there is only one:Found an unidentified PHI
message and that is longer.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried the following:
g(randn(4096))
is vectorlized by LLVM whilef(randn(4096))
not. I tried rewritingf(a)
in c, the output of Clang shows some simd IR (Although the simd width is only 2). Maybe related with #31862?