Nsight Profiling to Phoenix Benchmark Cases #929
base: master
Pull Request Overview
Adds Nsight Systems profiling to the Phoenix benchmarking workflows to allow side-by-side comparison of master vs. PR performance.
- Wraps both GPU and CPU benchmark invocations in `nsys profile` within `bench.sh`.
- Processes and prints key profiling metrics (NVTX, CUDA API calls, GPU kernels) in CI via `bench.yml`.
- Uploads the generated `report.nsys-rep` as an artifact for later inspection.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `.github/workflows/phoenix/bench.sh` | Prepend `nsys profile` to the existing benchmark commands |
| `.github/workflows/bench.yml` | Add a "Process Nsight Profiling Report" step and include the `report.nsys-rep` artifact |
Comments suppressed due to low confidence (4)
.github/workflows/phoenix/bench.sh:19
- Using a fixed output name, `report`, may cause collisions or overwrite data when multiple jobs run; consider parameterizing the output file (e.g. `-o "$job_slug"`) to improve traceability.
`nsys profile -o report ./mfc.sh bench --mem 12 -j $(nproc) -o "$job_slug.yaml" -- -c phoenix-bench $device_opts -n $n_ranks`
.github/workflows/phoenix/bench.sh:21
- Same as above: the static `report` filename will be reused for CPU runs; consider including `$job_slug` or the device name in the output filename to avoid overwrites.
`nsys profile -o report ./mfc.sh bench --mem 1 -j $(nproc) -o "$job_slug.yaml" -- -c phoenix-bench $device_opts -n $n_ranks`
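Following the two comments above, a minimal sketch of per-job report naming. It assumes `$job_slug` already differs between the GPU and CPU jobs (the script excerpt does not show how it is set); if it does not, append the device name. `run_profiled_bench` is a hypothetical helper, not part of `bench.sh`. `nsys profile` adds the `.nsys-rep` extension to the `-o` name itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the Phoenix benchmark under nsys with a report
# name derived from the job slug instead of the fixed name "report".
# Assumes ./mfc.sh exists in the working directory, as in bench.sh.

# Usage: run_profiled_bench <job_slug> <mem> [case args...]
run_profiled_bench() {
    local slug="$1" mem="$2"
    shift 2
    # nsys appends ".nsys-rep", so this writes "<slug>.nsys-rep"
    # next to the "<slug>.yaml" results file written by mfc.sh.
    nsys profile -o "$slug" \
        ./mfc.sh bench --mem "$mem" -j "$(nproc)" -o "$slug.yaml" -- "$@"
}
```

In `bench.sh` this could replace both profiled lines, e.g. `run_profiled_bench "$job_slug" 12 -c phoenix-bench $device_opts -n $n_ranks` for the GPU case and the same call with `1` for the CPU case. `$device_opts` stays unquoted, as in the original, so it can expand to several options.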
.github/workflows/bench.yml:97
- The workflow checks for `pr/report.nsys-rep`, but `bench.sh` emits `report.nsys-rep` in the workspace root (or `TMPDIR`). Either the path should match where the file is actually written, or the report should be copied into `pr/` beforehand.
`if [ -f "pr/report.nsys-rep" ]; then`
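One way to act on this comment, sketched under assumptions: a hypothetical `stage_report` helper copies the report into `pr/` after the benchmark finishes, so the existing `pr/report.nsys-rep` check can find it. Where `nsys profile` actually writes the file is an assumption here; pass that path as the first argument.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy an nsys report into the directory the
# workflow later inspects (e.g. pr/). Returns non-zero if the source
# report is missing, so the CI log shows why profiling output is absent.

# Usage: stage_report <path/to/report.nsys-rep> <dest_dir>
stage_report() {
    local src="$1" dest_dir="$2"
    if [ ! -f "$src" ]; then
        echo "warning: nsys report not found at $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    cp "$src" "$dest_dir/"
}
```

If the report does land in the workspace root, as the comment says, the call would be `stage_report report.nsys-rep pr` from that directory.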
.github/workflows/bench.yml:104
- [nitpick] The profiling section hardcodes just three report types; consider looping over the full set of reports listed in the PR description to reduce duplication and ensure all metrics are covered.
`echo "=== CUDA API CALLS ==="`
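A sketch of the loop this nitpick suggests, assuming each name from the PR description's report list is passed to `nsys stats --report`. `print_nsys_summaries` is a hypothetical helper. Reports whose data was not collected on a given run (the OpenGL, Vulkan, or DirectX ones, for instance) may fail or print nothing, so a failure is logged and the loop moves on instead of aborting the step.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print several nsys summary reports for one
# .nsys-rep file instead of hardcoding one echo/nsys pair per report.

# Usage: print_nsys_summaries <report.nsys-rep> [report names...]
print_nsys_summaries() {
    local rep="$1"
    shift
    local reports=("$@")
    # Default to the three summaries the workflow currently prints.
    if [ "${#reports[@]}" -eq 0 ]; then
        reports=(nvtx_sum cuda_api_sum cuda_gpu_kern_sum)
    fi
    if [ ! -f "$rep" ]; then
        echo "No nsys report at $rep" >&2
        return 1
    fi
    local r
    for r in "${reports[@]}"; do
        echo "=== ${r} ==="
        # Keep going if one report fails, e.g. when its data was not captured.
        nsys stats --report "$r" "$rep" || echo "(failed to generate ${r})"
    done
}
```

Keeping the report names in one array means adding, say, `cuda_gpu_mem_time_sum` is a one-word change rather than another copied block.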
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files (coverage diff, master vs. #929):

| | master | #929 |
|---|---|---|
| Coverage | 43.71% | 43.71% |
| Files | 68 | 68 |
| Lines | 18360 | 18360 |
| Branches | 2292 | 2292 |
| Hits | 8026 | 8026 |
| Misses | 8945 | 8945 |
| Partials | 1389 | 1389 |
User description
Description
This adds a way to compare the nsys reports of master and the PR side by side. For now, I have left it as a visual comparison.
To compare reports, use `diff` or `csv-diff` after exporting readable files with `nsys analyze -f <format e.g. csv, txt, etc.> -o <output-file>`.
Nsight Docs: https://docs.nvidia.com/nsight-systems/UserGuide/index.html#report-scripts
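A minimal sketch of that comparison for one report type. It swaps in `nsys stats`, which runs the report scripts linked above, for the `nsys analyze` command mentioned in the description, and it assumes `nsys stats --format csv --output <prefix>` writes `<prefix>_<report>.csv`; check that naming against your nsys version. `compare_nsys_report` and its arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: export the same summary report from the master and
# PR profiles to CSV, then diff them. Returns diff's status (0 identical,
# 1 different) or 2 if an export fails.

# Usage: compare_nsys_report <report_name> <master.nsys-rep> <pr.nsys-rep> <out_dir>
compare_nsys_report() {
    local report="$1" master_rep="$2" pr_rep="$3" out_dir="$4"
    mkdir -p "$out_dir"
    # Assumed output naming: <prefix>_<report>.csv
    nsys stats --report "$report" --format csv --output "$out_dir/master" "$master_rep" || return 2
    nsys stats --report "$report" --format csv --output "$out_dir/pr" "$pr_rep" || return 2
    diff "$out_dir/master_${report}.csv" "$out_dir/pr_${report}.csv"
}
```

Because timings vary from run to run, a plain `diff` will likely flag most rows; a keyed comparison such as `csv-diff --key=Name` on the two exported files may be easier to read.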
Reports available for display:
nvtx_sum, osrt_sum, cuda_api_sum, cuda_gpu_kern_sum, cuda_gpu_mem_time_sum, cuda_gpu_mem_size_sum, openmp_sum, opengl_khr_range_sum, opengl_khr_gpu_range_sum, vulkan_marker_sum, vulkan_gpu_marker_sum, dx11_pix_sum, dx12_gpu_marker_sum, dx12_pix_sum, wddm_queue_sum, um_sum, um_total_sum, um_cpu_page_faults_sum, openacc_sum
PR Type
Enhancement
Description
- Add Nsight Systems profiling to Phoenix benchmark workflow
- Generate profiling reports for master and PR branches
- Display NVTX, CUDA API, and GPU kernel summaries
- Archive profiling reports as workflow artifacts
Changes walkthrough 📝

| File | Change |
|---|---|
| `bench.sh` (`.github/workflows/phoenix/bench.sh`) | Enable Nsight profiling for benchmark execution via the `nsys profile -o report` command |
| `bench.yml` (`.github/workflows/bench.yml`) | Add profiling report processing and archival |