Commit 7bff352

NAV v1.1!
1 parent f936052 commit 7bff352

11 files changed

+2382
-17
lines changed

LICENSE

Lines changed: 24 additions & 17 deletions
@@ -1,21 +1,28 @@
1-
MIT License
1+
BSD 3-Clause License
22

3-
Copyright (c) 2025 CAESAR Laboratory - Queen's University
3+
Copyright (c) 2024-2025, Ethan Shama, Queen's University
44

5-
Permission is hereby granted, free of charge, to any person obtaining a copy
6-
of this software and associated documentation files (the "Software"), to deal
7-
in the Software without restriction, including without limitation the rights
8-
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9-
copies of the Software, and to permit persons to whom the Software is
10-
furnished to do so, subject to the following conditions:
5+
Redistribution and use in source and binary forms, with or without
6+
modification, are permitted provided that the following conditions are met:
117

12-
The above copyright notice and this permission notice shall be included in all
13-
copies or substantial portions of the Software.
8+
1. Redistributions of source code must retain the above copyright notice, this
9+
list of conditions and the following disclaimer.
1410

15-
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16-
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17-
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18-
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19-
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20-
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21-
SOFTWARE.
11+
2. Redistributions in binary form must reproduce the above copyright notice,
12+
this list of conditions and the following disclaimer in the documentation
13+
and/or other materials provided with the distribution.
14+
15+
3. Neither the name of the copyright holder nor the names of its
16+
contributors may be used to endorse or promote products derived from
17+
this software without specific prior written permission.
18+
19+
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
20+
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21+
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
22+
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
23+
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24+
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25+
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26+
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27+
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
28+
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
1+
# NSYS Analyzer & Visualizer (NAV)
2+
3+
## Introduction
4+
High-performance computing (HPC) and AI workloads increasingly rely on **GPUs** for acceleration, yet **understanding performance impacts** of code changes remains challenging. **NVIDIA Nsight™ Systems (NSYS)** provides profiling tools, but lacks **efficient comparative analysis and detailed visualization** capabilities.
5+
6+
**NAV (NSYS Analyzer and Visualizer)** enhances NSYS by offering **fast, automated, and insightful trace analysis**, helping developers and researchers quickly identify **performance regressions, bottlenecks, and optimizations** in GPU workloads.
7+
8+
## Key Features
9+
**Faster Extraction & Visualization** – Extracts trace data significantly faster than NSYS recipes: **23.33× speedup** for 1.2G traces and **9.75× speedup** for 12G traces
10+
**Comparative Analysis** – Enables **direct side-by-side performance comparisons** of multiple traces
11+
**Advanced Data Representations** – Generates **histograms, violin plots, and multi-trace visualizations**
12+
**Multi-Level Granularity** – Supports **Micro, Meso, and Macro-level** insights for deeper analysis
13+
**Efficient Handling of Large Traces** – Uses **parallel processing** to manage high-frequency GPU traces
14+
**Multiple Export Formats** – Save results in **CSV, LaTeX, and PNG** for easy reporting and integration
15+
**Open-Source & Extensible** – Modify and extend NAV to **add new metrics, visualizations, and analyses**
16+
17+
## Why Use NAV?
18+
🔹 **Automates** performance trace analysis, reducing manual effort
19+
🔹 **Uncovers hidden performance trends** that NSYS recipes may miss
20+
🔹 **Improves regression testing** by providing **intuitive, side-by-side comparisons**
21+
🔹 **Optimized for HPC, AI/ML, and GPU-intensive applications**
22+
23+
---
24+
25+
## Requirements
26+
27+
### Required Trace Flags
28+
Run NSYS with the necessary flags for full trace capture:
29+
```bash
30+
nsys profile --trace=cuda,mpi,ucx,nvtx <application> [application arguments]
31+
```
32+
33+
### Extracting SQLite File from NSYS Report
34+
Convert an `.nsys-rep` file to an `.sqlite` database for NAV:
35+
```bash
36+
nsys export --type sqlite <file.nsys-rep>
37+
```
38+
Alternatively, opening the `.nsys-rep` file in the Nsight GUI may automatically generate an `.sqlite` file.
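
Before handing an exported database to NAV, it can help to confirm that the tables the communication queries read from are actually present. The sketch below uses only Python's standard `sqlite3` module. The table names come from `COMM_REQUIRED_TABLES` in `helper/communication.py`; the function name `missing_tables` is an illustrative assumption and not part of NAV.

```python
import sqlite3

# Table names taken from COMM_REQUIRED_TABLES in helper/communication.py.
REQUIRED_TABLES = ["NVTX_EVENTS", "StringIds"]


def missing_tables(db_path, required=REQUIRED_TABLES):
    """Return the required tables that are absent from the SQLite file.

    Hypothetical helper for illustration; NAV may perform its own check.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        conn.close()
    present = {name for (name,) in rows}
    return [table for table in required if table not in present]
```

Calling `missing_tables("file.sqlite")` returns an empty list when both tables exist. A non-empty result usually means the trace was captured without the corresponding `--trace` option.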
39+
40+
### Required Python Libraries
41+
Ensure your environment has all dependencies installed:
42+
```bash
43+
pip install absl_py contourpy cycler fonttools joblib kiwisolver \
44+
matplotlib numpy packaging pillow pyparsing python_dateutil \
45+
scikit_learn scipy six threadpoolctl
46+
```
47+
48+
### **Pre-Compile Scripts for Faster Execution**
49+
Before running NAV, you can **precompile** the Python scripts to bytecode, which reduces start-up time on later runs:
50+
```bash
51+
python -m compileall .
52+
```
53+
---
54+
55+
## Script Usage
56+
57+
### Generating NAV JSON Files from SQLite
58+
Extract data and generate tables/figures from an `.sqlite` trace file:
59+
```bash
60+
python3 main.py -df file.sqlite
61+
```
62+
Extract data **without** generating tables/figures (useful for batch processing):
63+
```bash
64+
python3 main.py -df file.sqlite -nmo
65+
```
66+
Extract data from multiple `.sqlite` files sequentially (**not recommended due to slow performance**):
67+
```bash
68+
python3 main.py -df "file1.sqlite file2.sqlite file3.sqlite" -mdl "Label1,Label2,Label3"
69+
```
70+
71+
### Recommended: Parallel Extraction for Multiple SQLite Files
72+
Run extractions separately to speed up processing:
73+
```bash
74+
# Execute on separate nodes or jobs in parallel
75+
python3 main.py -df "file1.sqlite" -nmo &
76+
python3 main.py -df "file2.sqlite" -nmo &
77+
python3 main.py -df "file3.sqlite" -nmo &
78+
```
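
On a single machine, the same fan-out can be scripted so that the caller waits for every extraction to finish and sees which ones failed. This is a minimal sketch, not part of NAV: `build_command` and `run_extractions` are hypothetical helper names, and it assumes `main.py` is in the current directory and accepts the `-df` and `-nmo` flags as shown above.

```python
import subprocess
import sys


def build_command(sqlite_file):
    """Build the extraction-only command for one trace (flags from the README)."""
    return [sys.executable, "main.py", "-df", sqlite_file, "-nmo"]


def run_extractions(sqlite_files):
    """Start one extraction per file, then wait for all of them.

    Returns a dict mapping each file name to its process exit code.
    """
    procs = [(name, subprocess.Popen(build_command(name))) for name in sqlite_files]
    return {name: proc.wait() for name, proc in procs}
```

For example, `run_extractions(["file1.sqlite", "file2.sqlite", "file3.sqlite"])` starts three processes at once; any file whose exit code is non-zero can then be retried on its own.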
79+
80+
### Generating Tables and Figures from NAV Files
81+
Process a single NAV file:
82+
```bash
83+
python3 main.py -nf file.nav
84+
```
85+
Process multiple NAV files with comparative analysis:
86+
```bash
87+
python3 main.py -nf "file1.nav file2.nav file3.nav" -mdl "Label1,Label2,Label3"
88+
```
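
Note that the two quoted arguments use different separators: file names are separated by spaces and labels by commas. The hypothetical helper below (NAV's own parsing may differ) shows how the two lists line up in order. The rule that the counts must match is an assumption based on the flag being described as one label per trace.

```python
def pair_files_with_labels(files_arg, labels_arg):
    """Pair space-separated file names with comma-separated labels, in order.

    Illustrative only: file names containing spaces are not supported here.
    """
    files = files_arg.split()
    labels = [label.strip() for label in labels_arg.split(",")]
    if len(files) != len(labels):
        raise ValueError(
            f"got {len(files)} files but {len(labels)} labels; counts must match"
        )
    return list(zip(files, labels))
```

With this pairing, the first label always describes the first file, so reordering one list without the other silently mislabels the comparison plots.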
89+
90+
---
91+
92+
## Flags Overview
93+
94+
### General Flags
95+
- `-o, --output_dir` → Output directory for NAV files, tables, and figures *(default: ./output)*
96+
- `-mdl, --multi_data_label` → *(Required for multi-file analysis)* Labels for each trace *(e.g., "1 GPU, 2 GPU, 3 GPU")*
97+
- `-mw, --max_workers` → Number of parallel workers to use *(Defaults to CPU count if unset)*
98+
99+
### Extraction Flags
100+
- `-df, --data_file` → Specify an `.sqlite` trace file for extraction
101+
- `-nf, --nav_file` → Use an existing NAV `.nav` file instead of extracting from `.sqlite`
102+
- `-nkm, --no_kernel_metrics` → Skip exporting kernel metrics
103+
- `-ntm, --no_transfer_metrics` → Skip exporting transfer metrics
104+
- `-ncm, --no_communication_metrics` → Skip exporting communication metrics
105+
- `-nsd, --no_save_data` → Prevent saving extracted data to a NAV file
106+
107+
### Graphics & Table Flags
108+
- `-nmo, --no_metrics_output` → Disable metrics export after extraction
109+
- `-ncmo, --no_compare_metrics_output` → Disable comparison metric exports (for multi-file analysis)
110+
- `-ngmo, --no_general_metrics_output` → Disable general metric exports (Kernel, Transfer, Communication)
111+
- `-nsmo, --no_specific_metrics_output` → Disable specific metric exports (Duration, Size, Slack, Overhead, etc.)
112+
- `-nimo, --no_individual_metrics_output` → Disable exporting individual metric details
113+
114+
---

helper/communication.py

Lines changed: 204 additions & 0 deletions
@@ -0,0 +1,204 @@
1+
from concurrent.futures import ProcessPoolExecutor, as_completed
2+
3+
from absl import logging
4+
5+
from helper.general import generate_statistics, MAX_WORKERS, create_histogram, remove_outliers
6+
7+
QUERY_COMMUNICATION = """
8+
WITH
9+
domains AS (
10+
SELECT
11+
min(start),
12+
domainId AS id,
13+
globalTid AS globalTid,
14+
text AS name
15+
FROM
16+
NVTX_EVENTS
17+
WHERE
18+
eventType == 75
19+
GROUP BY 2, 3
20+
),
21+
maxts AS(
22+
SELECT max(max(start), max(end)) AS m
23+
FROM NVTX_EVENTS
24+
),
25+
nvtx AS (
26+
SELECT
27+
coalesce(ne.end, (SELECT m FROM maxts)) - ne.start AS duration,
28+
CASE
29+
WHEN d.name NOT NULL AND sid.value IS NOT NULL
30+
THEN d.name || ':' || sid.value
31+
WHEN d.name NOT NULL AND sid.value IS NULL
32+
THEN d.name || ':' || ne.text
33+
WHEN d.name IS NULL AND sid.value NOT NULL
34+
THEN sid.value
35+
ELSE ne.text
36+
END AS tag,
37+
CASE ne.eventType
38+
WHEN 59
39+
THEN 'PushPop'
40+
WHEN 60
41+
THEN 'StartEnd'
42+
WHEN 70
43+
THEN 'PushPop'
44+
WHEN 71
45+
THEN 'StartEnd'
46+
ELSE 'Unknown'
47+
END AS style
48+
FROM
49+
NVTX_EVENTS AS ne
50+
LEFT OUTER JOIN
51+
domains AS d
52+
ON ne.domainId == d.id
53+
AND (ne.globalTid & 0x0000FFFFFF000000) == (d.globalTid & 0x0000FFFFFF000000)
54+
LEFT OUTER JOIN
55+
StringIds AS sid
56+
ON ne.textId == sid.id
57+
WHERE
58+
ne.eventType == 59
59+
OR
60+
ne.eventType == 60
61+
OR
62+
ne.eventType == 70
63+
OR
64+
ne.eventType == 71
65+
),
66+
summary AS (
67+
SELECT
68+
tag AS name,
69+
style AS style,
70+
sum(duration) AS total,
71+
count(*) AS num
72+
FROM
73+
nvtx
74+
GROUP BY 1, 2
75+
),
76+
totals AS (
77+
SELECT sum(total) AS total
78+
FROM summary
79+
)
80+
SELECT
81+
name AS "Name",
82+
round(total * 100.0 / (SELECT total FROM totals), 1) AS "Time:ratio_%",
83+
total AS "Total Time:dur_ns",
84+
num AS "Instances"
85+
FROM
86+
summary
87+
ORDER BY 2 DESC
88+
"""
89+
90+
QUERY_COMMUNICATION_STATS = """
91+
WITH
92+
max_times AS (
93+
SELECT MAX(start) AS max_start, MAX(end) AS max_end
94+
FROM NVTX_EVENTS
95+
),
96+
nvtx AS (
97+
SELECT
98+
COALESCE(ne.end, (SELECT max_end FROM max_times)) - ne.start AS duration,
99+
CASE
100+
WHEN d.name IS NOT NULL AND sid.value IS NOT NULL THEN d.name || ':' || sid.value
101+
WHEN d.name IS NOT NULL AND sid.value IS NULL THEN d.name || ':' || ne.text
102+
WHEN d.name IS NULL AND sid.value IS NOT NULL THEN sid.value
103+
ELSE ne.text
104+
END AS tag
105+
FROM
106+
NVTX_EVENTS AS ne
107+
LEFT OUTER JOIN
108+
(
109+
SELECT
110+
MIN(start) AS min_start,
111+
domainId AS id,
112+
globalTid AS globalTid,
113+
text AS name
114+
FROM
115+
NVTX_EVENTS
116+
WHERE
117+
eventType = 75
118+
GROUP BY
119+
domainId, globalTid, text
120+
) AS d
121+
ON
122+
ne.domainId = d.id
123+
AND (ne.globalTid & 0x0000FFFFFF000000) = (d.globalTid & 0x0000FFFFFF000000)
124+
LEFT OUTER JOIN
125+
StringIds AS sid
126+
ON
127+
ne.textId = sid.id
128+
WHERE
129+
ne.eventType IN (59, 60, 70, 71)
130+
)
131+
SELECT
132+
tag AS "Name",
133+
duration AS "Duration:dur_ns"
134+
FROM
135+
nvtx
136+
WHERE
137+
tag = ?
138+
"""
139+
140+
COMM_REQUIRED_TABLES = ['NVTX_EVENTS', 'StringIds']
141+
142+
143+
def generate_communicaiton_stats(comm):
144+
durations = [dur[1] for dur in comm[1]]
145+
label = comm[0]
146+
dict = {}
147+
148+
if durations and label:
149+
dict[label] = generate_statistics(durations, 'Execution Duration')
150+
histogram_data = create_histogram(durations, bins=10, powers_2=False, base=False,
151+
convert_bytes=False, return_bins=False)
152+
dict[label]['Execution Duration']['Distribution'] = histogram_data
153+
else:
154+
dict[label] = None
155+
156+
return label, dict[label]
157+
158+
159+
def parallel_parse_communication_data(queries_res):
160+
total_tasks = len(queries_res)
161+
completed_tasks = 0
162+
163+
with ProcessPoolExecutor(max_workers=MAX_WORKERS) as executor:
164+
futures = []
165+
for data in queries_res:
166+
future = executor.submit(generate_communicaiton_stats, data)
167+
futures.append(future)
168+
169+
results = []
170+
for future in as_completed(futures):
171+
results.append(future.result())
172+
completed_tasks += 1
173+
174+
# Log once each time progress crosses a 10% boundary
175+
if completed_tasks * 10 // total_tasks > (completed_tasks - 1) * 10 // total_tasks:
176+
logging.info(f"Progress: {(completed_tasks / total_tasks) * 100:.1f}%")
177+
178+
return results
179+
180+
181+
def create_specific_communication_stats(comm_stats, handle_outliers=False):
182+
dict = {}
183+
cluster_data = []
184+
combined_raw_data = []
185+
186+
for kernel_id, kernel_info in comm_stats.items():
187+
if kernel_info["Execution Duration"]:
188+
if kernel_info["Execution Duration"]["Raw Data"]:
189+
combined_raw_data.extend(kernel_info["Execution Duration"]["Raw Data"])
190+
if kernel_info["Execution Duration"]['Mean'] and kernel_info["Execution Duration"]['Median'] and \
191+
kernel_info[
192+
"Instance"]:
193+
cluster_data.append(
194+
[kernel_info["Execution Duration"]['Mean'], kernel_info["Execution Duration"]['Median'],
195+
kernel_info["Instance"]])
196+
197+
if handle_outliers and cluster_data: cluster_data = remove_outliers(cluster_data)
198+
if combined_raw_data:
199+
dict.update(generate_statistics(combined_raw_data, "Execution Duration", disable_raw=True))
200+
dict["Execution Duration"]['Distribution'] = create_histogram(combined_raw_data)
201+
if cluster_data:
202+
dict["Execution Duration"]['k-mean'] = {'Raw Data': cluster_data}
203+
204+
return dict
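
The duration rule used by `QUERY_COMMUNICATION_STATS` treats an NVTX range with no recorded `end` as running until the latest `end` in the trace. That rule can be checked on a toy table. The sketch below is a simplification under stated assumptions: the in-memory `NVTX_EVENTS` table has only the four columns used here, the sample rows are invented, and the domain and `StringIds` joins are omitted.

```python
import sqlite3

# Simplified form of the per-name duration query (joins omitted).
DEMO_QUERY = """
WITH max_times AS (SELECT MAX(end) AS max_end FROM NVTX_EVENTS)
SELECT text AS "Name",
       COALESCE(end, (SELECT max_end FROM max_times)) - start AS "Duration:dur_ns"
FROM NVTX_EVENTS
WHERE eventType IN (59, 60, 70, 71) AND text = ?
ORDER BY start
"""


def demo_durations(name):
    """Run the simplified query against a small in-memory trace."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE NVTX_EVENTS "
        "(start INTEGER, end INTEGER, text TEXT, eventType INTEGER)"
    )
    conn.executemany(
        "INSERT INTO NVTX_EVENTS VALUES (?, ?, ?, ?)",
        [
            (0, 100, "MPI_Send", 59),     # closed range: 100 - 0
            (150, None, "MPI_Send", 59),  # open range: falls back to max_end
            (50, 400, "MPI_Recv", 59),    # supplies the latest end (400)
            (10, 20, "domain", 75),       # domain event, excluded by the filter
        ],
    )
    rows = conn.execute(DEMO_QUERY, (name,)).fetchall()
    conn.close()
    return rows
```

Here `demo_durations("MPI_Send")` yields one row of 100 ns and one of 250 ns (400 - 150), which shows how a single unterminated range can inflate the statistics for its tag.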
