Second, we are not allowed to have runtime dispatches: all function calls need to be resolved at compile time. Importantly, runtime dispatches can also be introduced by functions that are not fully specialized. Take this example:

```julia-repl
julia> function my_inner_kernel!(f, t) # does not specialize
           t .= f.(t)
       end
my_inner_kernel! (generic function with 1 method)

julia> function my_outer_kernel(f, a)
           i = threadIdx().x
           my_inner_kernel!(f, @view a[i, :])
           return nothing
       end
my_outer_kernel (generic function with 1 method)

julia> a = CUDA.rand(Int, (2,2))
2×2 CuArray{Int64, 2, CUDA.DeviceMemory}:
 5153094658246882343  -1636555237989902283
 2088126782868946458  -5701665962120018867

julia> id(x) = x
id (generic function with 1 method)

julia> @cuda threads=size(a, 1) my_outer_kernel(id, a)
ERROR: InvalidIRError: compiling MethodInstance for my_outer_kernel(::typeof(id), ::CuDeviceMatrix{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to my_inner_kernel!(f, t) @ Main REPL[27]:1)
```

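When it is not obvious which call fails to specialize, CUDA.jl's reflection macros can help. A minimal sketch using `@device_code_warntype`, which prints the typed IR for each kernel compiled by an invocation; unspecialized calls typically show up with non-concrete (`::Any`) result types:

```julia
# Inspect the typed IR of the kernel being launched; the dynamic
# invocation of `my_inner_kernel!` should appear with a non-concrete
# return type in the printed output.
@device_code_warntype @cuda threads=size(a, 1) my_outer_kernel(id, a)
```
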
Here the function `my_inner_kernel!` does not specialize on `f`: Julia avoids specializing on function arguments that are only passed through to another call, and `f` is forwarded to the broadcasting machinery rather than called directly. We can force specialization in this case by adding type parameters:

```julia-repl
julia> function my_inner_kernel2!(f::F, t::T) where {F,T} # forces specialization
           t .= f.(t)
       end
my_inner_kernel2! (generic function with 1 method)

julia> function my_outer_kernel2(f, a)
           i = threadIdx().x
           my_inner_kernel2!(f, @view a[i, :])
           return nothing
       end
my_outer_kernel2 (generic function with 1 method)

julia> a = CUDA.rand(Int, (2,2))
2×2 CuArray{Int64, 2, CUDA.DeviceMemory}:
  3193805011610800677  4871385510397812058
 -9060544314843886881  8829083170181145736

julia> id(x) = x
id (generic function with 1 method)

julia> @cuda threads=size(a, 1) my_outer_kernel2(id, a)
CUDA.HostKernel for my_outer_kernel2(typeof(id), CuDeviceMatrix{Int64, 1})
```

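Type parameters are not the only fix. As the manual section linked below explains, Julia always specializes on a function argument that is called within the method body; it only avoids specializing when the argument is merely passed through to another function. A minimal sketch of that alternative (the name `my_inner_kernel3!` is hypothetical):

```julia
# Calling `f` directly, instead of forwarding it to the broadcasting
# machinery, makes Julia specialize `my_inner_kernel3!` on `f`.
function my_inner_kernel3!(f, t)
    for i in eachindex(t)
        @inbounds t[i] = f(t[i])
    end
    return nothing
end
```
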
More cases and details on specialization can be found in [the Julia manual](https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing).

## Synchronization

To synchronize threads in a block, use the `sync_threads()` function. More advanced variants