Skip to content

[UR][Benchmarks] Support for Unitrace for all benchmarks #19320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: sycl
Choose a base branch
from

Conversation

mateuszpn
Copy link
Contributor

Run main.py script with --unitrace to get unitrace logs for every benchmark.
Use --unitrace-inclusive to get benchmark results and unitrace logs.

Signed-off-by: Mateusz P. Nowak <mateusz.p.nowak@intel.com>
Signed-off-by: Mateusz P. Nowak <mateusz.p.nowak@intel.com>
@mateuszpn mateuszpn requested a review from a team as a code owner July 7, 2025 09:03
@mateuszpn mateuszpn requested a review from Copilot July 7, 2025 09:17
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds support for Unitrace tracing in all benchmarks by introducing new CLI flags, integrating Unitrace setup/teardown logic, and extending benchmark run methods.

  • Introduce --unitrace and --unitrace-inclusive flags and related options
  • Add unitrace.py utilities (prepare, cleanup, handle, prune)
  • Update run_bench and individual benchmarks to accept unitrace_timestamp and skip exclusions

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
devops/scripts/benchmarks/utils/unitrace.py New module for Unitrace log handling
devops/scripts/benchmarks/options.py Added CLI options for Unitrace and save_name
devops/scripts/benchmarks/main.py Integrated Unitrace flags into workflow and CLI
devops/scripts/benchmarks/html/data.js Removed trailing whitespace
devops/scripts/benchmarks/history.py Updated save signature to include timestamp
devops/scripts/benchmarks/benches/velocity.py Extended run to forward unitrace_timestamp
devops/scripts/benchmarks/benches/umf.py Extended run to forward unitrace_timestamp
devops/scripts/benchmarks/benches/test.py Extended run to forward unitrace_timestamp
devops/scripts/benchmarks/benches/syclbench.py Extended run to forward unitrace_timestamp
devops/scripts/benchmarks/benches/llamacpp.py Extended run to forward unitrace_timestamp
devops/scripts/benchmarks/benches/gromacs.py Extended run to forward unitrace_timestamp
devops/scripts/benchmarks/benches/compute.py Added Unitrace exclusion list and timestamp logic
devops/scripts/benchmarks/benches/benchdnn_list.py Defined unitrace_exclusion_list for oneDNN bench
devops/scripts/benchmarks/benches/benchdnn.py Integrated Unitrace options for oneDNN bench
devops/scripts/benchmarks/benches/base.py Updated run_bench signature to add Unitrace hooks
Comments suppressed due to low confidence (2)

devops/scripts/benchmarks/utils/unitrace.py:26

  • [nitpick] Consider adding unit tests for prune_unitrace_dirs (and other helper functions) to verify that old directories are removed correctly and edge cases are handled.
def prune_unitrace_dirs(base_dir, FILECNT=10):

devops/scripts/benchmarks/benches/compute.py:229

  • [nitpick] This exclusion list is duplicated in benchdnn_list.py; consider centralizing it in one module to avoid inconsistencies.
    unitrace_exclusion_list = [

from utils.validate import Validate
from utils.detect_versions import DetectVersion
from presets import enabled_suites, presets
from utils.oneapi import get_oneapi
Copy link
Preview

Copilot AI Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import get_oneapi is not used in this file; consider removing it to avoid unused imports.

Suggested change
from utils.oneapi import get_oneapi

Copilot uses AI. Check for mistakes.

Comment on lines +71 to +74
bench_dir = f"{options.unitrace_res_dir}/{options.save_name}_{unitrace_timestamp}"
os.makedirs(bench_dir, exist_ok=True)

unitrace_output = f"{bench_dir}/{name}_{unitrace_timestamp}"
Copy link
Preview

Copilot AI Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Constructing file paths with hardcoded / can break on some platforms; consider using os.path.join(options.unitrace_res_dir, f"{options.save_name}_{unitrace_timestamp}") for portability.

Suggested change
bench_dir = f"{options.unitrace_res_dir}/{options.save_name}_{unitrace_timestamp}"
os.makedirs(bench_dir, exist_ok=True)
unitrace_output = f"{bench_dir}/{name}_{unitrace_timestamp}"
bench_dir = os.path.join(options.unitrace_res_dir, f"{options.save_name}_{unitrace_timestamp}")
os.makedirs(bench_dir, exist_ok=True)
unitrace_output = os.path.join(bench_dir, f"{name}_{unitrace_timestamp}")

Copilot uses AI. Check for mistakes.

parser.add_argument(
"--unitrace",
action="store_true",
help="Unitrace tracing for sigle iteration of benchmarks",
Copy link
Preview

Copilot AI Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in help text: "sigle" should be "single".

Suggested change
help="Unitrace tracing for sigle iteration of benchmarks",
help="Unitrace tracing for single iteration of benchmarks",

Copilot uses AI. Check for mistakes.

@@ -86,7 +87,14 @@ def get_adapter_full_path():
), f"could not find adapter file {adapter_path} (and in similar lib paths)"

def run_bench(
Copy link
Preview

Copilot AI Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Avoid mutable default arguments like extra_unitrace_opt=[]; use None and set to an empty list inside the function to prevent unintended sharing.

Copilot uses AI. Check for mistakes.

"-DBUILD_WITH_XPTI=1",
"-DBUILD_WITH_MPI=0",
],
ld_library=get_oneapi().ld_libraries() + [f"{options.sycl}/lib"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

@@ -0,0 +1,182 @@
# Copyright (C) 2024-2025 Intel Corporation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2025

print(f"Moved {pid_json_files[0]} to {dst}")

# Prune old unitrace directories
prune_unitrace_dirs(options.unitrace_res_dir, FILECNT=5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You use this only here. Why bother with defining a default value of an argument that you are going to override anyway?


shutil.move(os.path.join(options.benchmark_cwd, pid_json_files[0]), dst)
if options.verbose:
print(f"Moved {pid_json_files[0]} to {dst}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the logs will need to be changed after #19158 is merged.

return bench_dir, unitrace_output, unitrace_command


def handle_unitrace_output(bench_dir, unitrace_output, timestamp):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is all very complicated. Maybe we can use directory structure to help us?

Let's say we have two benchmarks: "SubmitKernel" and "Hashtable". We can organize the unitrace output such that each trace file is in its own directory:

.../results/traces/SubmitKernel/YYYYMMDD_HHMMSS_[name].PID.out
.../results/traces/Hashtable/YYYYMMDD_HHMMSS_[name].PID.out

Removing old results would become simply removing all but 5 newest files in a directory. We wouldn't have to find, rename and move files, since we could make unitrace output them to the destination place upfront.

Comment on lines +547 to +557
parser.add_argument(
"--unitrace",
action="store_true",
help="Unitrace tracing for sigle iteration of benchmarks",
)

parser.add_argument(
"--unitrace-inclusive",
action="store_true",
help="Regular run of benchmarks iterations and unitrace tracing in single additional run",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do --unitrace inclusive and --unitrace as a single option, not two separate ones. See how --output-html is done.

@@ -224,6 +224,71 @@ def parse_unit_type(compute_unit):


class ComputeBenchmark(Benchmark):

# list of benchmarks to exclude from unitrace due to SIGSEGV, SIGABRT or timeouts
unitrace_exclusion_list = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should just let things fail. If unitrace is failing, it's an urgent bug that needs to be fixed.

Signed-off-by: Mateusz P. Nowak <mateusz.p.nowak@intel.com>
from utils.oneapi import get_oneapi


def extract_save_name_and_timestamp(dirname):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used only in this module, please rename this method with an underscore prefix to make it clearer that it's for an internal use of this module's methods

return None, None


def prune_unitrace_dirs(base_dir, FILECNT=10):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used only in this module, please rename this method with an underscore prefix to make it clearer that it's for an internal use of this module's methods


# It's important we don't save the current results into history before
# we calculate historical averages or get latest results for compare.
# Otherwise we might be comparing the results to themselves.
if not options.dry_run:
history.save(saved_name, results, save_name is not None)
history.save(saved_name, timestamp, results, options.save_name is not None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now history.save can use options.save_name directly and the last argument can be removed

if options.timestamp_override is None
else options.timestamp_override
)
if timestamp is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the timestamp is unconditionally created in the main() method, so this one is not needed?

cwd=options.benchmark_cwd,
ld_library=ld_libraries,
)
except subprocess.CalledProcessError as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e is unused, can be omitted

cwd=options.benchmark_cwd,
ld_library=ld_libraries,
)
if unitrace_timestamp is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unintuitive that passing a unitrace timestamp triggers unitracing. Please, consider making a unitrace class and passing a Unitrace object or None. This would also allow encapsulating unitrace download and setup inside Unitrace's constructor.

@@ -129,7 +129,7 @@ def setup(self):
if not self.bench_bin.exists():
raise FileNotFoundError(f"Benchmark binary not found: {self.bench_bin}")

def run(self, env_vars):
def run(self, env_vars, unitrace_timestamp: str = None) -> list[Result]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a dict type as a type hint to env_vars

prune_unitrace_dirs(options.unitrace_res_dir, FILECNT=5)


def download_and_build_unitrace(workdir):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

options.workdir can be used directly, no need to pass an argument

add_sycl=True,
)
run(
["cmake", "--build", build_dir, "-j"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use options.build_jobs as much as we can. Also, there is a potential bug in this option's default value. os.process_cpu_count() shall be used as a default to respect the available cpu count for a given process, not multiprocessing.cpu_count() which returns ALL cpus in the system. This leads to builds using more parallel jobs than available cpus, ie. when limiting cpu count for a benchmark framework with taskset.

ld_library=get_oneapi().ld_libraries() + [f"{options.sycl}/lib"],
add_sycl=True,
)
print("Unitrace built successfully.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the build fails?

print(f"Renamed {src} to {dst}")
break

# Handle {name}.{pid}.json files in cwd: move and rename to {self.name()}_{timestamp}.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't unitrace already have an option for changing the default name of jsons and .pid logs?

raise FileNotFoundError(f"Unitrace binary not found: {unitrace_bin}. ")
os.makedirs(options.unitrace_res_dir, exist_ok=True)
if not options.save_name:
raise ValueError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is raised too late. It should be raised before we set up all the benchmarks and try to run them. We already know whether unitrace and save_name has been set right after the arguments parsing.

shutil.rmtree(full_path)


def unitrace_cleanup(bench_cwd, unitrace_output):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, use type hints in all the methods

Copy link
Contributor

@pbalcer pbalcer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One general comment - when under tracing, we should reduce the number of iterations etc, to shorten the duration of the benchmark.
If we really need to disable a benchmark (because it's difficult to make it sufficiently short for example), the Benchmark class should have something like traceable() to filter out which benchmarks should / shouldn't be traced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants