Open
Labels: fold (sum, maximum, reduce, foldl, etc.), performance (must go faster)
Description
I have a custom array MyArray with Base.IndexStyle(a::MyArray) = IndexCartesian(). I noticed that the builtin sum function is slower than a naive mysum implementation.
See this MWE:
using BenchmarkTools

struct MyArray{T, A<:AbstractArray{T, 3}} <: AbstractArray{T, 3}
    data::A
end

Base.size(a::MyArray) = size(a.data)

Base.@propagate_inbounds @inline function Base.getindex(a::MyArray, i, j, k)
    @boundscheck checkbounds(a.data, i, j, k)
    @inbounds v = a.data[i, j, k]
    return v
end

# change this to IndexLinear to get ~5x speed up
Base.IndexStyle(a::MyArray) = IndexCartesian()

function mysum(a)
    S = zero(eltype(a))
    for k = 1:size(a, 3)
        for j = 1:size(a, 2)
            @simd for i = 1:size(a, 1)
                @inbounds S += a[i, j, k]
            end
        end
    end
    return S
end

N = 64
pa = rand(N, N, N)
a = MyArray(pa)

@btime mysum($a)
@btime sum($a)
@btime sum($pa)
On my machine I get
47.294 μs (0 allocations: 0 bytes)
453.761 μs (0 allocations: 0 bytes)
48.895 μs (0 allocations: 0 bytes)
which makes sum(::MyArray) about 10x slower than mysum. I find it surprising that disabling bounds checking with the flag --check-bounds=no makes sum(::MyArray) about twice as fast, i.e.
46.498 μs (0 allocations: 0 bytes)
216.276 μs (0 allocations: 0 bytes)
48.483 μs (0 allocations: 0 bytes)
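For reference, here is a minimal sketch of how the IndexStyle trait changes the index iterator that generic code gets from eachindex. CartWrap and LinWrap are hypothetical names introduced only for this illustration and are not part of the MWE above. Whether this difference accounts for the whole slowdown in sum is an assumption, not something measured here.

```julia
# Two hypothetical wrappers that differ only in their declared IndexStyle.
struct CartWrap{T, A<:AbstractArray{T, 3}} <: AbstractArray{T, 3}
    data::A
end

struct LinWrap{T, A<:AbstractArray{T, 3}} <: AbstractArray{T, 3}
    data::A
end

Base.size(a::Union{CartWrap, LinWrap}) = size(a.data)

# Cartesian style: getindex takes one integer per dimension.
Base.IndexStyle(::Type{<:CartWrap}) = IndexCartesian()
Base.@propagate_inbounds Base.getindex(a::CartWrap, i::Int, j::Int, k::Int) = a.data[i, j, k]

# Linear style: getindex takes a single integer, forwarded to the parent array.
Base.IndexStyle(::Type{<:LinWrap}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(a::LinWrap, i::Int) = a.data[i]

pa = rand(4, 4, 4)
c = CartWrap(pa)
l = LinWrap(pa)

# eachindex follows the declared style: Cartesian indices for c, a linear range for l.
println(typeof(eachindex(c)))
println(typeof(eachindex(l)))

# Both wrappers agree with the parent array up to floating-point rounding.
println(sum(c) ≈ sum(pa))
println(sum(l) ≈ sum(pa))
```

Timing sum(c) against sum(l) with @btime on a larger array should show whether the gap reported above follows the index style on a given Julia version.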