Closed
We are currently suffering pretty badly from these two issues:
- Broadcasting is much slower than a for loop JuliaLang/julia#28126
- Broadcast and linear indexing JuliaLang/julia#32051
The impact is especially pronounced on the GPU, as Tim notes here.
Here is a local benchmark at a target resolution on a single A100. All the kernel does is `@. x = zero(x)`:
julia> @pretty_belapsed at_dot_call!($X_array, $Y_array)
6 milliseconds, 847 microseconds
julia> @pretty_belapsed at_dot_call!($X_vector, $Y_vector)
2 milliseconds, 901 microseconds
Since our broadcast styles already categorize the kind of broadcast expression we have, I think we can transform the broadcasted objects in the pointwise case and force linear indexing there.
This could be a limitation when we have views, but it is worth trying, since SubArrays seem to support linear indexing safely, including bounds checks:
julia> a = rand(3,3,3);
julia> b = view(a, :,2,:);
julia> b[9]
0.7040413357481441
julia> b[10]
ERROR: BoundsError: attempt to access 3×3 view(::Array{Float64, 3}, :, 2, :) with eltype Float64 at index [10]
Stacktrace:
[1] throw_boundserror(A::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{…}, Int64, Base.Slice{…}}, false}, I::Tuple{Int64})
@ Base ./abstractarray.jl:737
[2] checkbounds
@ ./abstractarray.jl:702 [inlined]
[3] _getindex
@ ./abstractarray.jl:1335 [inlined]
[4] getindex(A::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{…}, Int64, Base.Slice{…}}, false}, I::Int64)
@ Base ./abstractarray.jl:1291
[5] top-level scope
@ REPL[11]:1
Some type information was truncated. Use `show(err)` to see complete types.
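A minimal CPU-side sketch of the "force linear indexing in the pointwise case" idea, for illustration only. The helper name `pointwise_linear!` is hypothetical (it is not a function in Base or in our code), and it covers only the simplest case where the destination and source have identical axes. The actual change would work on the `Broadcasted` objects and the GPU kernel instead.

```julia
# Hypothetical helper (the name is an assumption, not an existing API):
# apply `f` pointwise while iterating over linear indices rather than
# Cartesian indices.
function pointwise_linear!(f, dest::AbstractArray, src::AbstractArray)
    # Only the purely pointwise case is handled: shapes must match exactly.
    axes(dest) == axes(src) ||
        throw(DimensionMismatch("dest and src must have the same axes"))
    # The axes check above makes every linear index 1:length(dest) valid
    # for both arrays, so bounds checks can be skipped.
    @inbounds for i in eachindex(IndexLinear(), dest, src)
        dest[i] = f(src[i])
    end
    return dest
end

# Dense N-dimensional arrays.
X = rand(4, 3, 2)
Y = similar(X)
pointwise_linear!(zero, Y, X)

# A non-contiguous view as the source, as in the SubArray example above.
a = rand(3, 3, 3)
b = view(a, :, 2, :)
out = similar(b)
pointwise_linear!(x -> 2x, out, b)
```

Note that for a view like `b`, a linear index is still translated to a Cartesian index internally, so this shows that the approach is safe for views, not that it is faster for them.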
xref: #1403