
Commit 30694a8

Update docs on profiling with NSight Compute.
[skip tests]
1 parent 92d86f1 commit 30694a8


1 file changed (+21, -9 lines changed)

docs/src/development/profiling.md

Lines changed: 21 additions & 9 deletions
@@ -214,7 +214,11 @@ interactions in detail, Nsight Compute is the tool for you. It is again possible
 profiler with an interactive session of Julia, and debug or profile only those sections of
 your application that are marked with `CUDA.@profile`.
 
-Start with launching Julia under the Nsight Compute CLI tool:
+First, ensure that all (CUDA) packages that are involved in your application have been
+precompiled. Otherwise, you'll end up profiling the precompilation process, instead of
+the process where the actual work happens.
+
+Then, launch Julia under the Nsight Compute CLI tool as follows:
 
 ```
 $ ncu --mode=launch julia
@@ -224,23 +228,25 @@ You will get an interactive REPL, where you can execute whatever code you want:
 
 ```julia
 julia> using CUDA
-
-julia> CUDA.driver_version()
-
 # Julia hangs!
 ```
 
 As soon as you use CUDA.jl, your Julia process will hang. This is expected, as the tool
 breaks upon the very first call to the CUDA API, at which point you are expected to launch
-the Nsight Compute GUI utility and attach to the running session:
+the Nsight Compute GUI utility, select `Interactive Profile` under `Activity`, and attach
+to the running session by selecting it in the list in the `Attach` pane:
 
 !["NVIDIA Nsight Compute - Attaching to a session"](nsight_compute-attach.png)
 
-You will see that the tool has stopped execution on the call to `cuInit`. Now check
-`Profile > Auto Profile` to make Nsight Compute gather statistics on our kernels, and clock
-`Debug > Resume` to resume your session.
+Note that this even works with remote systems, i.e., you can have NSight Compute connect
+over ssh to a remote system where you run Julia under `ncu`.
 
-Now our CLI session comes to life again, and we can enter the rest of our script:
+Once you've successfully attached to a Julia process, you will see that the tool has stopped
+execution on the call to `cuInit`. Now check `Profile > Auto Profile` to make Nsight Compute
+gather statistics on our kernels, uncheck `Debug > Break On API Error` to avoid halting the
+process when innocuous errors happen, and click `Debug > Resume` to resume your application.
+
+After doing so, our CLI session comes to life again, and we can execute the rest of our script:
 
 ```julia
 julia> a = CUDA.rand(1024,1024,1024);
@@ -254,6 +260,12 @@ Once that's finished, the Nsight Compute GUI window will have plenty details on
 
 !["NVIDIA Nsight Compute - Kernel profiling"](nsight_compute-kernel.png)
 
+By default, this only collects a basic set of metrics. If you need more information on a
+specific kernel, select `detailed` or `full` in the `Metric Selection` pane and re-run
+your kernels. Note that collecting more metrics is also more expensive, sometimes even
+requiring multiple executions of your kernel. As such, it is recommended to only collect
+basic metrics by default, and only detailed or full metrics for kernels of interest.
+
 At any point in time, you can also pause your application from the debug menu, and inspect
 the API calls that have been made:
 
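The precompilation step added in this commit can be taken care of from an ordinary Julia session before starting `ncu`. What follows is a minimal sketch, assuming the packages your application uses (CUDA.jl and others) are dependencies of the currently active project environment; it is an illustration, not a snippet from the CUDA.jl documentation.

```julia
# Run this in a regular Julia session, NOT under `ncu`, before profiling.
# Assumption: the application's packages (e.g. CUDA.jl) are dependencies
# of the currently active project environment.
using Pkg

# Precompile every package in the active environment, so that loading them
# later under `ncu` does not trigger (and therefore profile) precompilation.
Pkg.precompile()
```

If you precompile a non-default project (for example by starting Julia with `--project`), start Julia under `ncu` with the same flag, e.g. `ncu --mode=launch julia --project`, so that the session being profiled loads the packages you just precompiled.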