-
Notifications
You must be signed in to change notification settings - Fork 44
Open
Description
It's me again. That guy who keeps finding bugs when using sum
. In Polyester, in KernelAbstractions, and now in OffsetArrays. Sigh...
Here is a case which slows down a loop by 25x(!) when using a sum
over an OffsetArray. Expanding the sum
into a for
loop restores the normal performance when using Array
.
## Set up
# Cartesian step
@inline δ(i, I::CartesianIndex{m}) where{m} = CartesianIndex(ntuple(j -> j==i ? 1 : 0, m))
# Finite difference
@inline ∂(a,I::CartesianIndex{m},u::AbstractArray{T,n}) where {T,n,m} = @inbounds u[I+δ(a,I),a]-u[I,a]
# Divergence using sum function
@fastmath @inline div_sum(I::CartesianIndex{m},u) where {m} = sum(@inbounds ∂(i,I,u) for i ∈ 1:m)
# Divergence expanded using for loop
@fastmath @inline function div_expanded(I::CartesianIndex{m},u) where {m}
init=zero(eltype(u))
for i in 1:m
init += @inbounds ∂(i,I,u)
end
return init
end
# loop over the range R
loop!(div,u,f,R) = @inbounds @simd for I ∈ R
div[I] = f(I,u)
end
## Test case
# Benchmark with Arrays
N = (2^10+2,2^10+2) # size of the array including halo
R = CartesianIndices(map(n->n-2,N)).+CartesianIndex(map(n->1,N)) # range inside the halo
div = zeros(Float32,N);
u = rand(Float32,N...,length(N));
using BenchmarkTools
@btime loop!($div,$u,$div_sum,$R) # 188.200 μs (0 allocations: 0 bytes)
@btime loop!($div,$u,$div_expanded,$R) # 193.400 μs (0 allocations: 0 bytes)
# Benchmark with OA
using OffsetArrays: Origin
divOA = Origin(0)(div);
uOA = Origin((zeros(Int,length(N))...,1))(u);
ROA = CartesianIndices(map(n->n-2,N))
@btime loop!($divOA,$uOA,$div_expanded,$ROA) # 200.000 μs (0 allocations: 0 bytes)
@btime loop!($divOA,$uOA,$div_sum,$ROA) # 4.960 ms (0 allocations: 0 bytes) WTH!?!
Metadata
Metadata
Assignees
Labels
No labels