Commit 4cc68c5

expand documentation
1 parent 225dae7 commit 4cc68c5

2 files changed: +19 -5 lines changed

doc/source/observers.rst

Lines changed: 8 additions & 0 deletions
@@ -112,3 +112,11 @@ More information about PMT can be found here: https://git.astron.nl/RD/pmt/
 
 
 
+NCUObserver
+~~~~~~~~~~~
+
+The NCUObserver can be used to automatically extract performance counters during tuning using Nvidia's NsightCompute profiler.
+The NCUObserver relies on an intermediate library, which can be found here: https://github.com/nlesc-recruit/nvmetrics
+
+.. autoclass:: kernel_tuner.observers.ncu.NCUObserver
+
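The documentation added above describes what the NCUObserver does but not how it is attached to a tuning run. The following is a minimal sketch, not part of the commit, assuming the observer is passed through the `observers` argument of `tune_kernel` like the other Kernel Tuner observers. The helper name `tune_with_ncu` and the constant `NCU_METRICS` are hypothetical, and the counter names are taken from the docstring in this commit.

```python
# Counter names copied from the NCUObserver docstring in this commit;
# which ones are actually available depends on the GPU.
NCU_METRICS = [
    "dram__bytes.sum",
    "smsp__sass_thread_inst_executed_op_ffma_pred_on.sum",
]


def tune_with_ncu(kernel_name, kernel_source, problem_size, arguments, tune_params):
    """Tune a kernel with an NCUObserver attached.

    Hypothetical helper: calling it needs an NVIDIA GPU plus the
    kernel_tuner and nvmetrics packages.
    """
    # Imported lazily so that loading this module does not require a GPU stack.
    from kernel_tuner import tune_kernel
    from kernel_tuner.observers.ncu import NCUObserver

    observer = NCUObserver(metrics=NCU_METRICS)
    return tune_kernel(
        kernel_name, kernel_source, problem_size, arguments, tune_params,
        observers=[observer],
    )
```

How the collected counters appear in the tuning results is not shown in this diff, so the sketch returns whatever `tune_kernel` returns without unpacking it.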

kernel_tuner/observers/ncu.py

Lines changed: 11 additions & 5 deletions
@@ -4,20 +4,26 @@
     import nvmetrics
 except (ImportError):
     nvmetrics = None
-    pass
 
 class NCUObserver(PrologueObserver):
     """``NCUObserver`` measures performance counters.
 
-    """
+    The exact performance counters supported differ per GPU, some examples:
 
-    def __init__(self, metrics=None, device=0):
-        """Create a new ``NCUObserver``.
+    * "dram__bytes.sum", # Counter byte # of bytes accessed in DRAM
+    * "dram__bytes_read.sum", # Counter byte # of bytes read from DRAM
+    * "dram__bytes_write.sum", # Counter byte # of bytes written to DRAM
+    * "smsp__sass_thread_inst_executed_op_fadd_pred_on.sum", # Counter inst # of FADD thread instructions executed where all predicates were true
+    * "smsp__sass_thread_inst_executed_op_ffma_pred_on.sum", # Counter inst # of FFMA thread instructions executed where all predicates were true
+    * "smsp__sass_thread_inst_executed_op_fmul_pred_on.sum", # Counter inst # of FMUL thread instructions executed where all predicates were true
 
         :param metrics: The metrics to observe. This should be a list of strings.
             You can use ``ncu --query-metrics`` to get a list of valid metrics.
-        """
+        :type metrics: list[str]
 
+    """
+
+    def __init__(self, metrics=None, device=0):
         if not nvmetrics:
             print("NCUObserver is not available.")
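The counters listed in the new docstring are raw instruction and byte counts. As an illustration of why these particular counters are useful together, here is a small sketch, not part of the commit, that combines them into a floating-point operation count and an arithmetic intensity. It assumes the counter values are available as a plain dictionary keyed by metric name; the function name `derived_metrics` and the sample numbers are hypothetical.

```python
FADD = "smsp__sass_thread_inst_executed_op_fadd_pred_on.sum"
FMUL = "smsp__sass_thread_inst_executed_op_fmul_pred_on.sum"
FFMA = "smsp__sass_thread_inst_executed_op_ffma_pred_on.sum"
DRAM = "dram__bytes.sum"


def derived_metrics(counters):
    """Combine raw counters into FLOPs and arithmetic intensity (FLOP/byte).

    `counters` is assumed to map metric names to numeric values.
    """
    # A fused multiply-add performs two floating-point operations per instruction.
    flops = counters[FADD] + counters[FMUL] + 2 * counters[FFMA]
    dram_bytes = counters[DRAM]
    # Avoid dividing by zero when no DRAM traffic was recorded.
    intensity = flops / dram_bytes if dram_bytes else None
    return {"flops": flops, "arithmetic_intensity": intensity}


# Made-up counter values, for illustration only.
sample = {FADD: 100, FMUL: 50, FFMA: 200, DRAM: 1100}
result = derived_metrics(sample)
```

This only accounts for the FADD, FMUL and FFMA counters named in the docstring; kernels that use other instruction types would need additional counters.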
