@@ -214,7 +214,11 @@ interactions in detail, Nsight Compute is the tool for you. It is again possible
profiler with an interactive session of Julia, and debug or profile only those sections of
your application that are marked with `CUDA.@profile`.

- Start with launching Julia under the Nsight Compute CLI tool:
+ First, ensure that all (CUDA) packages that are involved in your application have been
+ precompiled. Otherwise, you'll end up profiling the precompilation process, instead of
+ the process where the actual work happens.
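
A minimal sketch of one way to do that precompilation ahead of time, assuming the application's dependencies live in the currently active project (this snippet is illustrative and not part of the change shown in this diff):

```julia
using Pkg

# Precompile all dependencies of the active project so that the session
# launched under `ncu` does not spend its time on precompilation.
Pkg.precompile()
```

Running this in a regular Julia session, before starting Julia under `ncu`, should leave the package caches warm for the profiled process.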
+
+ Then, launch Julia under the Nsight Compute CLI tool as follows:

```
$ ncu --mode=launch julia
@@ -224,23 +228,25 @@ You will get an interactive REPL, where you can execute whatever code you want:

```julia
julia> using CUDA
-
- julia> CUDA.driver_version()
-
# Julia hangs!
```

As soon as you use CUDA.jl, your Julia process will hang. This is expected, as the tool
breaks upon the very first call to the CUDA API, at which point you are expected to launch
- the Nsight Compute GUI utility and attach to the running session:
+ the Nsight Compute GUI utility, select `Interactive Profile` under `Activity`, and attach
+ to the running session by selecting it in the list in the `Attach` pane:

!["NVIDIA Nsight Compute - Attaching to a session"](nsight_compute-attach.png)

- You will see that the tool has stopped execution on the call to `cuInit`. Now check
- `Profile > Auto Profile` to make Nsight Compute gather statistics on our kernels, and clock
- `Debug > Resume` to resume your session.
+ Note that this even works with remote systems, i.e., you can have Nsight Compute connect
+ over SSH to a remote system where you run Julia under `ncu`.

- Now our CLI session comes to life again, and we can enter the rest of our script:
+ Once you've successfully attached to a Julia process, you will see that the tool has stopped
+ execution on the call to `cuInit`. Now check `Profile > Auto Profile` to make Nsight Compute
+ gather statistics on our kernels, uncheck `Debug > Break On API Error` to avoid halting the
+ process when innocuous errors happen, and click `Debug > Resume` to resume your application.
+
+ After doing so, our CLI session comes to life again, and we can execute the rest of our script:

```julia
julia> a = CUDA.rand(1024,1024,1024);
@@ -254,6 +260,12 @@ Once that's finished, the Nsight Compute GUI window will have plenty details on

!["NVIDIA Nsight Compute - Kernel profiling"](nsight_compute-kernel.png)

+ By default, this only collects a basic set of metrics. If you need more information on a
+ specific kernel, select `detailed` or `full` in the `Metric Selection` pane and re-run
+ your kernels. Note that collecting more metrics is also more expensive, sometimes even
+ requiring multiple executions of your kernel. As such, it is recommended to only collect
+ basic metrics by default, and only detailed or full metrics for kernels of interest.
+
At any point in time, you can also pause your application from the debug menu, and inspect
the API calls that have been made: