-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
Description
In the example below, computing the sum of an Array{Union{Float64, Missing}}
using a simple loop is faster when inserting a branch which adds zero(T)
to the accumulator when an entry is missing
rather than doing nothing. This is because the latter uses SIMD instructions, but not the former (even with @simd
). Adding -zero(T)
instead of zero(T)
disables SIMD. Is this expected or could the compiler improve?
This means that sum(skipmissing(::Array{Union{Float64, Missing}}))
is slower than what a specialized implementation which would add zero(T)
could do (note that pairwise sum by passing ismissing(x) ? zero(T) : x
has similar performance). But of course adding zero(T)
doesn't give the same result in corner cases (when the vector contains only -0.0
AFAICT; are there other cases?). If that's the only solution, we could check whether there's at least one entry which is not -0.0
, and if so switch to adding zero(T)
?
julia> using BenchmarkTools
julia> function sum_nonmissing(X::AbstractArray)
s = -zero(eltype(X))
@inbounds @simd for x in X
if x !== missing
s += x
end
end
s
end
sum_nonmissing (generic function with 1 method)
julia> function sum_nonmissing2(X::AbstractArray)
s = -zero(eltype(X))
@inbounds @simd for x in X
if x !== missing
s += x
else
s += zero(eltype(X))
end
end
s
end
sum_nonmissing2 (generic function with 1 method)
julia> function sum_nonmissing3(X::AbstractArray)
s = -zero(eltype(X))
@inbounds @simd for x in X
if x !== missing
s += x
else
s += -zero(eltype(X))
end
end
s
end
sum_nonmissing3 (generic function with 1 method)
julia> X1 = rand(10_000_000);
julia> X2 = Vector{Union{Missing, Float64}}(X1);
julia> X3 = ifelse.(rand(length(X2)) .< 0.9, X2, missing);
julia> @btime sum_nonmissing(X3)
17.278 ms (1 allocation: 16 bytes)
4.499755611211945e6
julia> @btime sum_nonmissing2(X3)
6.788 ms (1 allocation: 16 bytes)
4.4997556112116e6
julia> @btime sum_nonmissing3(X3)
16.542 ms (1 allocation: 16 bytes)
4.499755611211945e6