labcaesar
diff --git a/LICENSE b/LICENSE
Lines changed: 24 additions & 17 deletions
diff --git a/README.md b/README.md
Lines changed: 114 additions & 0 deletions
diff --git a/helper/communication.py b/helper/communication.py
Lines changed: 204 additions & 0 deletions
@@ -1,21 +1,28 @@
-MIT License
+BSD 3-Clause License
 
-Copyright (c) 2025 CAESAR Laboratory - Queen's University
+Copyright (c) 2024-2025, Ethan Shama, Queen's University
 
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
 
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
+1. Redistributions of source code must retain the above copyright notice, this
+   list of conditions and the following disclaimer.
 
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
+2. Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its
+   contributors may be used to endorse or promote products derived from
+   this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,114 @@
+# NSYS Analyzer & Visualizer (NAV)
+
+## Introduction  
+High-performance computing (HPC) and AI workloads increasingly rely on **GPUs** for acceleration, yet **understanding the performance impact** of code changes remains challenging. **NVIDIA Nsight™ Systems (NSYS)** provides profiling tools but lacks **efficient comparative analysis and detailed visualization** capabilities.  
+
+**NAV (NSYS Analyzer and Visualizer)** enhances NSYS by offering **fast, automated, and insightful trace analysis**, helping developers and researchers quickly identify **performance regressions, bottlenecks, and optimizations** in GPU workloads.  
+
+## Key Features  
+✔ **Faster Extraction & Visualization** – Extracts trace data significantly faster than NSYS recipes: **23.33× speedup** for 1.2G traces and **9.75× speedup** for 12G traces  
+✔ **Comparative Analysis** – Enables **direct side-by-side performance comparisons** of multiple traces  
+✔ **Advanced Data Representations** – Generates **histograms, violin plots, and multi-trace visualizations**  
+✔ **Multi-Level Granularity** – Supports **Micro, Meso, and Macro-level** insights for deeper analysis  
+✔ **Efficient Handling of Large Traces** – Uses **parallel processing** to manage high-frequency GPU traces  
+✔ **Multiple Export Formats** – Save results in **CSV, LaTeX, and PNG** for easy reporting and integration  
+✔ **Open-Source & Extensible** – Modify and extend NAV to **add new metrics, visualizations, and analyses** 
+
+## Why Use NAV?  
+🔹 **Automates** performance trace analysis, reducing manual effort  
+🔹 **Uncovers hidden performance trends** that NSYS recipes may miss  
+🔹 **Improves regression testing** by providing **intuitive, side-by-side comparisons**  
+🔹 **Optimized for HPC, AI/ML, and GPU-intensive applications**  
+
+---
+
+## Requirements  
+
+### Required Trace Flags  
+Run NSYS with the necessary flags for full trace capture:  
+```bash
+nsys profile --trace=cuda,mpi,ucx,nvtx <application> [application args]
+```
+
+### Extracting SQLite File from NSYS Report  
+Convert an `.nsys-rep` file to an `.sqlite` database for NAV:  
+```bash
+nsys export --type sqlite <file.nsys-rep>
+```
+Alternatively, opening the `.nsys-rep` file in the Nsight GUI may automatically generate an `.sqlite` file.  
+
+### Required Python Libraries  
+Ensure your environment has all dependencies installed:  
+```bash
+pip install absl_py contourpy cycler fonttools joblib kiwisolver \
+matplotlib numpy packaging pillow pyparsing python_dateutil \
+scikit_learn scipy six threadpoolctl
+```
+
+### Pre-Compile Scripts for Faster Startup  
+Before running NAV, you can **precompile** the Python scripts to bytecode, which shortens startup on later runs (it does not speed up the analysis itself):  
+```bash
+python -m compileall .
+```
+---
+
+## Script Usage  
+
+### Generating NAV JSON Files from SQLite  
+Extract data and generate tables/figures from an `.sqlite` trace file:  
+```bash
+python3 main.py -df file.sqlite
+```
+Extract data **without** generating tables/figures (useful for batch processing):  
+```bash
+python3 main.py -df file.sqlite -nmo
+```
+Extract data from multiple `.sqlite` files sequentially (**not recommended due to slow performance**):  
+```bash
+python3 main.py -df "file1.sqlite file2.sqlite file3.sqlite" -mdl "Label1,Label2,Label3"
+```
+
+### Recommended: Parallel Extraction for Multiple SQLite Files  
+Run extractions separately to speed up processing:  
+```bash
+# Execute on separate nodes or jobs in parallel
+python3 main.py -df "file1.sqlite" -nmo &
+python3 main.py -df "file2.sqlite" -nmo &
+python3 main.py -df "file3.sqlite" -nmo &
+```
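A Python equivalent of the shell commands above, as a minimal sketch. It assumes `main.py` is in the current directory and uses only the `-df` and `-nmo` flags shown in this README. The helper names `build_extract_commands` and `run_parallel` are hypothetical and not part of NAV.

```python
import subprocess
import sys


def build_extract_commands(sqlite_files, script="main.py"):
    """Return one extraction command per trace (extract only, no tables/figures)."""
    return [[sys.executable, script, "-df", path, "-nmo"] for path in sqlite_files]


def run_parallel(commands):
    """Start every command at once, then wait for all of them; returns the exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


cmds = build_extract_commands(["file1.sqlite", "file2.sqlite", "file3.sqlite"])
for cmd in cmds:
    print(" ".join(cmd))
# To actually launch the extractions, call: run_parallel(cmds)
```

Collecting the exit codes makes it easy to spot a failed extraction before moving on to the comparison step.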
+
+### Generating Tables and Figures from NAV Files  
+Process a single NAV file:  
+```bash
+python3 main.py -nf file.nav
+```
+Process multiple NAV files with comparative analysis:  
+```bash
+python3 main.py -nf "file1.nav file2.nav file3.nav" -mdl "Label1,Label2,Label3"
+```
+
+---
+
+## Flags Overview  
+
+### General Flags  
+- `-o, --output_dir` → Output directory for NAV files, tables, and figures *(default: ./output)*  
+- `-mdl, --multi_data_label` → *(Required for multi-file analysis)* Labels for each trace *(e.g., "1 GPU, 2 GPU, 3 GPU")*  
+- `-mw, --max_workers` → Number of parallel workers to use *(defaults to CPU count if unset)*  
+
+### Extraction Flags  
+- `-df, --data_file` → Specify an `.sqlite` trace file for extraction  
+- `-nf, --nav_file` → Use an existing NAV `.nav` file instead of extracting from `.sqlite`  
+- `-nkm, --no_kernel_metrics` → Skip exporting kernel metrics  
+- `-ntm, --no_transfer_metrics` → Skip exporting transfer metrics  
+- `-ncm, --no_communication_metrics` → Skip exporting communication metrics  
+- `-nsd, --no_save_data` → Prevent saving extracted data to a NAV file  
+
+### Graphics & Table Flags  
+- `-nmo, --no_metrics_output` → Skip generating tables and figures after extraction (extract only)  
+- `-ncmo, --no_compare_metrics_output` → Disable comparison metric exports (for multi-file analysis)  
+- `-ngmo, --no_general_metrics_output` → Disable general metric exports (Kernel, Transfer, Communication)  
+- `-nsmo, --no_specific_metrics_output` → Disable specific metric exports (Duration, Size, Slack, Overhead, etc.)  
+- `-nimo, --no_individual_metrics_output` → Disable exporting individual metric details  
+
+---
@@ -0,0 +1,204 @@
+from concurrent.futures import ProcessPoolExecutor, as_completed
+
+from absl import logging
+
+from helper.general import generate_statistics, MAX_WORKERS, create_histogram, remove_outliers
+
+QUERY_COMMUNICATION = """
+WITH
+    domains AS (
+        SELECT
+            min(start),
+            domainId AS id,
+            globalTid AS globalTid,
+            text AS name
+        FROM
+            NVTX_EVENTS
+        WHERE
+            eventType == 75
+        GROUP BY 2, 3
+    ),
+    maxts AS(
+        SELECT max(max(start), max(end)) AS m
+        FROM   NVTX_EVENTS
+    ),
+    nvtx AS (
+        SELECT
+            coalesce(ne.end, (SELECT m FROM maxts)) - ne.start AS duration,
+            CASE
+                WHEN d.name IS NOT NULL AND sid.value IS NOT NULL
+                    THEN d.name || ':' || sid.value
+                WHEN d.name IS NOT NULL AND sid.value IS NULL
+                    THEN d.name || ':' || ne.text
+                WHEN d.name IS NULL AND sid.value IS NOT NULL
+                    THEN sid.value
+                ELSE ne.text
+            END AS tag,
+            CASE ne.eventType
+                WHEN 59
+                    THEN 'PushPop'
+                WHEN 60
+                    THEN 'StartEnd'
+                WHEN 70
+                    THEN 'PushPop'
+                WHEN 71
+                    THEN 'StartEnd'
+                ELSE 'Unknown'
+            END AS style
+        FROM
+            NVTX_EVENTS AS ne
+        LEFT OUTER JOIN
+            domains AS d
+            ON ne.domainId == d.id
+                AND (ne.globalTid & 0x0000FFFFFF000000) == (d.globalTid & 0x0000FFFFFF000000)
+        LEFT OUTER JOIN
+            StringIds AS sid
+            ON ne.textId == sid.id
+        WHERE
+            ne.eventType IN (59, 60, 70, 71)
+    ),
+    summary AS (
+        SELECT
+            tag AS name,
+            style AS style,
+            sum(duration) AS total,
+            count(*) AS num
+        FROM
+            nvtx
+        GROUP BY 1, 2
+    ),
+    totals AS (
+        SELECT sum(total) AS total
+        FROM summary
+    )
+    SELECT
+        name AS "Name",
+        round(total * 100.0 / (SELECT total FROM totals), 1) AS "Time:ratio_%",
+        total AS "Total Time:dur_ns",
+        num AS "Instances"
+    FROM
+        summary
+    ORDER BY 2 DESC
+"""
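The join condition masks `globalTid` with `0x0000FFFFFF000000`, so a domain-creation event matches ranges from any thread of the same process. Here is a small sketch of what the mask keeps. It assumes the layout described in the Nsight Systems SQLite schema documentation (thread ID in the low 24 bits, process ID in the next 24 bits). `make_global_tid` and `same_process` are hypothetical helpers for illustration only.

```python
PID_MASK = 0x0000FFFFFF000000  # same mask as in the query's join condition


def make_global_tid(pid, tid):
    """Pack a PID and TID following the assumed layout (upper bits left at 0)."""
    return ((pid & 0xFFFFFF) << 24) | (tid & 0xFFFFFF)


def same_process(global_tid_a, global_tid_b):
    """True when both IDs agree on the bits the mask keeps (the PID field)."""
    return (global_tid_a & PID_MASK) == (global_tid_b & PID_MASK)


a = make_global_tid(pid=4242, tid=1)
b = make_global_tid(pid=4242, tid=7)
c = make_global_tid(pid=4243, tid=1)
print(same_process(a, b))  # same PID, different TID -> True
print(same_process(a, c))  # different PID -> False
```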
+
+QUERY_COMMUNICATION_STATS = """ 
+WITH
+    max_times AS (
+        SELECT MAX(start) AS max_start, MAX(end) AS max_end
+        FROM NVTX_EVENTS
+    ),
+    nvtx AS (
+        SELECT
+            COALESCE(ne.end, (SELECT max_end FROM max_times)) - ne.start AS duration,
+            CASE
+                WHEN d.name IS NOT NULL AND sid.value IS NOT NULL THEN d.name || ':' || sid.value
+                WHEN d.name IS NOT NULL AND sid.value IS NULL THEN d.name || ':' || ne.text
+                WHEN d.name IS NULL AND sid.value IS NOT NULL THEN sid.value
+                ELSE ne.text
+            END AS tag
+        FROM
+            NVTX_EVENTS AS ne
+        LEFT OUTER JOIN
+            (
+                SELECT
+                    MIN(start) AS min_start,
+                    domainId AS id,
+                    globalTid AS globalTid,
+                    text AS name
+                FROM
+                    NVTX_EVENTS
+                WHERE
+                    eventType = 75
+                GROUP BY
+                    domainId, globalTid, text
+            ) AS d
+        ON
+            ne.domainId = d.id
+            AND (ne.globalTid & 0x0000FFFFFF000000) = (d.globalTid & 0x0000FFFFFF000000)
+        LEFT OUTER JOIN
+            StringIds AS sid
+        ON
+            ne.textId = sid.id
+        WHERE
+            ne.eventType IN (59, 60, 70, 71)
+    )
+SELECT
+    tag AS "Name",
+    duration AS "Duration:dur_ns"
+FROM
+    nvtx
+WHERE
+    tag = ?
+"""
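The trailing `?` is a positional placeholder that is bound when the query is executed. Below is a minimal sketch with Python's built-in `sqlite3` module. It uses a simplified stand-in table (`events`, with made-up rows) rather than a real Nsight Systems export. It also shows how `COALESCE` closes a range whose `end` is NULL with the latest `end` in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; a real export uses NVTX_EVENTS and StringIds.
conn.execute('CREATE TABLE events (text TEXT, start INTEGER, "end" INTEGER)')
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("MPI_Allreduce", 0, 40),
        ("MPI_Allreduce", 50, None),  # range that was never closed
        ("MPI_Send", 10, 100),
    ],
)

query = """
SELECT text AS "Name",
       COALESCE("end", (SELECT MAX("end") FROM events)) - start AS "Duration:dur_ns"
FROM events
WHERE text = ?
ORDER BY start
"""
# The tuple supplies the value for the single "?" placeholder.
rows = conn.execute(query, ("MPI_Allreduce",)).fetchall()
print(rows)  # [('MPI_Allreduce', 40), ('MPI_Allreduce', 50)]
conn.close()
```

Binding the name as a parameter avoids quoting problems with range names that contain apostrophes or colons.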
+
+COMM_REQUIRED_TABLES = ['NVTX_EVENTS', 'StringIds']
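`COMM_REQUIRED_TABLES` names the tables both queries read. One way such a list could be used is sketched below, on the assumption that it gates extraction. The helper `missing_tables` is hypothetical and not taken from NAV. It looks the names up in `sqlite_master` before running the queries, so a trace captured without NVTX produces a clear message instead of an SQL error.

```python
import sqlite3

COMM_REQUIRED_TABLES = ['NVTX_EVENTS', 'StringIds']


def missing_tables(conn, required):
    """Return the names in `required` that do not exist as tables in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    present = {name for (name,) in rows}
    return [table for table in required if table not in present]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StringIds (id INTEGER PRIMARY KEY, value TEXT)")
print(missing_tables(conn, COMM_REQUIRED_TABLES))  # ['NVTX_EVENTS']
```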
+
+
+def generate_communicaiton_stats(comm):
+    # comm is a (label, rows) pair; index 1 of each row holds the duration.
+    label = comm[0]
+    durations = [row[1] for row in comm[1]]
+
+    # Nothing to summarize without a label and at least one duration.
+    if not (durations and label):
+        return label, None
+
+    stats = generate_statistics(durations, 'Execution Duration')
+    stats['Execution Duration']['Distribution'] = create_histogram(
+        durations, bins=10, powers_2=False, base=False,
+        convert_bytes=False, return_bins=False)
+
+    return label, stats
+
+
+def parallel_parse_communication_data(queries_res):
+    total_tasks = len(queries_res)
+    completed_tasks = 0
+
+    with ProcessPoolExecutor(max_workers=MAX_WORKERS) as executor:
+        futures = []
+        for data in queries_res:
+            future = executor.submit(generate_communicaiton_stats, data)
+            futures.append(future)
+
+        results = []
+        last_decile = 0
+        for future in as_completed(futures):
+            results.append(future.result())
+            completed_tasks += 1
+
+            # Log progress once each time a new 10% boundary is crossed
+            decile = completed_tasks * 10 // total_tasks
+            if decile > last_decile:
+                last_decile = decile
+                logging.info(f"Progress: {(completed_tasks / total_tasks) * 100:.1f}%")
+
+    return results
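A standalone sketch of decile-based progress reporting for a submit/`as_completed` loop like the one above. It swaps in `ThreadPoolExecutor` and a trivial `square` task so that it runs anywhere. Both are assumptions for illustration, not NAV code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(x):
    return x * x


def run_with_progress(items, max_workers=4):
    """Run `square` over items; return (results, percentages at which progress was reported)."""
    results = []
    reported = []
    last_decile = 0
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(square, item) for item in items]
        for done, future in enumerate(as_completed(futures), start=1):
            results.append(future.result())
            decile = done * 10 // len(futures)
            if decile > last_decile:  # crossed a new 10% boundary
                last_decile = decile
                reported.append(decile * 10)
    return results, reported


results, reported = run_with_progress(range(25))
print(sorted(results)[:3], reported)
```

Integer arithmetic on the completed count keeps the number of progress messages at ten or fewer, regardless of how many tasks are submitted.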
+
+
+def create_specific_communication_stats(comm_stats, handle_outliers=False):
+    stats = {}
+    cluster_data = []
+    combined_raw_data = []
+
+    for kernel_id, kernel_info in comm_stats.items():
+        duration = kernel_info["Execution Duration"]
+        if not duration:
+            continue
+        if duration["Raw Data"]:
+            combined_raw_data.extend(duration["Raw Data"])
+        # Each cluster point is [Mean, Median, Instance]; entries with a falsy value are skipped.
+        if duration['Mean'] and duration['Median'] and kernel_info["Instance"]:
+            cluster_data.append([duration['Mean'], duration['Median'], kernel_info["Instance"]])
+
+    if handle_outliers and cluster_data:
+        cluster_data = remove_outliers(cluster_data)
+    if combined_raw_data:
+        stats.update(generate_statistics(combined_raw_data, "Execution Duration", disable_raw=True))
+        stats["Execution Duration"]['Distribution'] = create_histogram(combined_raw_data)
+    if cluster_data:
+        stats["Execution Duration"]['k-mean'] = {'Raw Data': cluster_data}
+
+    return stats