Commit 0924175

Use while loop instead of for-range loop in broadcast kernel (#448)
This reduces the overhead incurred by the broadcast kernel over a handwritten kernel by 10-15% on M1 with Metal.jl.
1 parent 9de2975 commit 0924175

1 file changed: +3 -1 lines changed

src/host/broadcast.jl

Lines changed: 3 additions & 1 deletion
@@ -52,7 +52,9 @@ end
 
 # grid-stride kernel
 function broadcast_kernel(ctx, dest, bc′, nelem)
-    for i in 1:nelem
+    i = 0
+    while i < nelem
+        i += 1
         I = @cartesianidx(dest, i)
         @inbounds dest[I] = bc′[I]
     end
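
The change keeps the iteration order and the visited indices identical; only the loop construct differs. As a rough illustration, the sketch below runs both loop shapes on plain CPU arrays and checks that they agree. The names `copy_for!` and `copy_while!` are hypothetical stand-ins and not part of the package, linear indexing replaces `@cartesianidx`, and no GPU is involved, so it says nothing about the 10-15% figure, which the commit message reports for M1 with Metal.jl.

```julia
# Hypothetical CPU stand-ins for the two loop shapes in the diff.
# These are illustrative only and are not the package's kernels.

# Old shape: iterate over the range 1:nelem.
function copy_for!(dest, src, nelem)
    for i in 1:nelem
        @inbounds dest[i] = src[i]
    end
    return dest
end

# New shape: manual counter, incremented at the top of the body,
# so the body sees i = 1, 2, ..., nelem exactly as the range loop does.
function copy_while!(dest, src, nelem)
    i = 0
    while i < nelem
        i += 1
        @inbounds dest[i] = src[i]
    end
    return dest
end

src = collect(1.0:8.0)
a = copy_for!(zeros(8), src, 8)
b = copy_while!(zeros(8), src, 8)

# Both loops write the same elements.
@assert a == b == src

# With nelem = 0 neither loop body runs, so dest is untouched.
@assert copy_while!(zeros(2), src, 0) == zeros(2)
@assert copy_for!(zeros(2), src, 0) == zeros(2)
```

Incrementing before the body rather than after it is what lets the `while i < nelem` condition start from `i = 0` and still cover the 1-based index range.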