Commit 6c4f597

[CI][Benchmarks] Add contributing guide (#17995)

1 parent 8cd13a7

File tree: 5 files changed, +99 -4 lines

devops/scripts/benchmarks/CONTRIB.md (+95, -0)
# SYCL & UR Benchmark Suite Contribution Guide

## Architecture

The suite is structured around three main components: Suites, Benchmarks, and Results, plus the BenchmarkMetadata records that describe them.

1. **Suites:**
    * Collections of related benchmarks (e.g., `ComputeBench`, `LlamaCppBench`).
    * Must implement the `Suite` base class (`benches/base.py`).
    * Responsible for shared setup (`setup()`) if needed.
    * Provide a list of `Benchmark` instances via `benchmarks()`.
    * Define a unique `name()`.
    * Can provide additional group-level metadata via `additional_metadata()` (see the sketch below).
        * This method should return a dictionary mapping string keys to `BenchmarkMetadata` objects (with `type="group"`).
        * The keys in this dictionary are used by the dashboard as prefixes to associate group-level metadata with benchmark results (e.g., a "Submit" group key will match both "Submit In Order" and "Submit Out Of Order").
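A minimal sketch of such a suite, assuming `BenchmarkMetadata`'s remaining fields have defaults; `MySuite`, `SubmitBench` (sketched under Benchmarks below), and the description text are illustrative, not taken from the tree:

```python
from benches.base import Suite, Benchmark
from utils.result import BenchmarkMetadata


class MySuite(Suite):
    def name(self) -> str:
        return "MySuite"

    def setup(self):
        return  # shared build/download steps would go here

    def benchmarks(self) -> list[Benchmark]:
        # Two instances of a hypothetical benchmark, with distinct name()s.
        return [SubmitBench(in_order=True), SubmitBench(in_order=False)]

    def additional_metadata(self) -> dict[str, BenchmarkMetadata]:
        # "Submit" acts as a prefix key: the dashboard attaches this group
        # metadata to "Submit In Order" and "Submit Out Of Order" alike.
        return {
            "Submit": BenchmarkMetadata(
                type="group",
                description="Kernel submission latency variants.",
            )
        }
```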
2. **Benchmarks:**
    * Represent a single benchmark, usually mapping to a binary execution.
    * Must implement the `Benchmark` base class (`benches/base.py`); a sketch follows this item.
    * **Required Methods:**
        * `setup()`: Initializes the benchmark (e.g., build, download data). Use `self.download()` for data dependencies. **Do not** perform setup in `__init__`.
        * `run(env_vars)`: Executes the benchmark binary (use `self.run_bench()`) and returns a list of `Result` objects. May be called multiple times and must produce consistent results.
        * `teardown()`: Cleans up resources. Can be empty. There is no need to remove build artifacts or downloaded datasets.
        * `name()`: Returns an identifier string unique across *all* suites. If a benchmark class is instantiated multiple times with different parameters (e.g., "Submit In Order", "Submit Out Of Order"), `name()` must reflect this uniqueness.
    * **Optional Methods:**
        * `lower_is_better()`: Returns `True` if lower result values are better (default: `True`).
        * `description()`: Provides a short description of the benchmark.
        * `notes()`: Provides additional commentary about the benchmark results (string).
        * `unstable()`: If it returns a string reason, the benchmark is hidden by default and marked unstable.
        * `get_tags()`: Returns a list of string tags (e.g., "SYCL", "UR", "micro", "application"). See `benches/base.py` for predefined tags.
        * `stddev_threshold()`: Returns a custom standard deviation threshold (float) for stability checks, overriding the global default.
    * **Helper Methods (Base Class):**
        * `run_bench(command, env_vars, ld_library=[], add_sycl=True)`: Executes a command with the appropriate environment setup (UR adapter, SYCL paths, extra env vars/libs). Returns stdout.
        * `download(name, url, file, ...)`: Downloads and optionally extracts data dependencies into the working directory.
        * `create_data_path(name, ...)`: Creates a path for benchmark data dependencies.
    * **Metadata:** Benchmarks generate static metadata via `get_metadata()`, which bundles description, notes, tags, etc.
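A minimal sketch of a benchmark, assuming the base-class constructor needs no extra arguments and that `Result`'s remaining fields have defaults; the binary path and output parsing are illustrative:

```python
from benches.base import Benchmark
from utils.result import Result


class SubmitBench(Benchmark):
    def __init__(self, in_order: bool):
        super().__init__()  # assumption: no required base-class arguments
        self.in_order = in_order

    def name(self) -> str:
        return "Submit In Order" if self.in_order else "Submit Out Of Order"

    def setup(self):
        # Build or download here (self.download()), never in __init__.
        self.bin = "./submit_bench"  # illustrative binary path

    def run(self, env_vars) -> list[Result]:
        cmd = [self.bin, "--in-order" if self.in_order else "--out-of-order"]
        stdout = self.run_bench(cmd, env_vars)
        return [
            Result(
                label=f"{self.name()} Time",
                value=float(stdout.splitlines()[-1]),  # illustrative parsing
                unit="μs",
                command=cmd,
                env=env_vars,
                stdout=stdout,
            )
        ]

    def teardown(self):
        return  # build artifacts and downloads may stay
```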
3. **Results:**
    * Store information about a single benchmark execution instance. Defined by the `Result` dataclass (`utils/result.py`).
    * **Fields (set by Benchmark):**
        * `label`: Unique identifier for this *specific result type* within the benchmark instance (e.g., "Submit In Order Time"). Ideally contains `benchmark.name()`.
        * `value`: The measured numerical result (float).
        * `unit`: The unit of the value (string, e.g., "μs", "GB/s", "token/s").
        * `command`: The command list used to run the benchmark (`list[str]`).
        * `env`: Environment variables used (`dict[str, str]`).
        * `stdout`: Full standard output of the benchmark run (string).
        * `passed`: Boolean indicating whether verification passed (default: `True`).
        * `explicit_group`: Name for grouping results in visualization (string). Benchmarks in the same group are compared in tables/charts. Ensure consistent units and value ranges within a group (see the sketch below).
        * `stddev`: Standard deviation, if calculated by the benchmark itself (float, default: 0.0).
        * `git_url`, `git_hash`: Git info for the benchmark's source code (string).
    * **Fields (set by Framework):**
        * `name`: Set to `label` by the framework.
        * `lower_is_better`: Copied from `benchmark.lower_is_better()`.
        * `suite`: Name of the suite the benchmark belongs to.
        * `stddev`: Calculated by the framework across iterations if not set by the benchmark.
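For example, two results meant to be compared side by side share a group, a unit, and a similar value range (labels, commands, and values are illustrative):

```python
from utils.result import Result

# Both results carry explicit_group="Submit", so the dashboard compares
# them in the same table/chart; units and value ranges match.
results = [
    Result(
        label="Submit In Order Time",
        value=12.3,
        unit="μs",
        command=["./submit_bench", "--in-order"],
        env={},
        stdout="...",
        explicit_group="Submit",
    ),
    Result(
        label="Submit Out Of Order Time",
        value=14.1,
        unit="μs",
        command=["./submit_bench", "--out-of-order"],
        env={},
        stdout="...",
        explicit_group="Submit",
    ),
]
```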
4. **BenchmarkMetadata:**
    * Stores static properties applicable to all results of a benchmark or group. Defined by the `BenchmarkMetadata` dataclass (`utils/result.py`).
    * **Fields:**
        * `type`: `"benchmark"` (auto-generated) or `"group"` (manually specified in `Suite.additional_metadata()`).
        * `description`: Displayed description (string).
        * `notes`: Optional additional commentary (string).
        * `unstable`: Reason if unstable, otherwise `None` (string).
        * `tags`: List of associated tags (`list[str]`).
        * `range_min`, `range_max`: Optional minimum/maximum values for the Y-axis range in charts. Default to `None`, in which case the range is determined automatically (example below).
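As an example, a group entry that fills in every field and pins the chart's Y-axis; the key, tags, and values are illustrative:

```python
from utils.result import BenchmarkMetadata

# Hypothetical entry as returned from Suite.additional_metadata():
# results whose labels start with "Memcpy" pick up this metadata, and
# their charts are drawn with a fixed 0-100 GB/s Y-axis.
metadata = {
    "Memcpy": BenchmarkMetadata(
        type="group",
        description="Host-to-device copy bandwidth variants.",
        notes="Higher is better for bandwidth results.",
        unstable=None,
        tags=["micro", "memory"],
        range_min=0.0,
        range_max=100.0,
    )
}
```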
## Adding New Benchmarks

1. **Create Benchmark Class:** Implement a new class inheriting from `benches.base.Benchmark`. Implement the required methods (`setup`, `run`, `teardown`, `name`) and optional ones (`description`, `get_tags`, etc.) as needed.
2. **Add to Suite:**
    * If adding to an existing category, modify the corresponding `Suite` class (e.g., `benches/compute.py`) to instantiate and return your new benchmark in its `benchmarks()` method.
    * If creating a new category, create a new `Suite` class inheriting from `benches.base.Suite`. Implement `name()` and `benchmarks()`. Add a `setup()` if the suite requires shared setup, and group metadata via `additional_metadata()` if needed.
3. **Register Suite:** Import and add your new `Suite` instance to the `suites` list in `main.py`.
4. **Add to Presets:** If adding a new suite, add its `name()` to the relevant lists in `presets.py` (e.g., "Full", "Normal") so it runs with those presets (see the sketch below for steps 3 and 4).
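Steps 3 and 4 amount to edits like the following; the surrounding structure of `main.py` and `presets.py` is sketched from this guide, not copied from the tree, and `MySuite` is the hypothetical suite from above:

```python
# main.py (sketch): register the new suite instance
from benches.my_suite import MySuite  # hypothetical module

suites = [
    # ... existing suite instances ...
    MySuite(),
]

# presets.py (sketch): let the new suite run under the relevant presets
presets = {
    "Full": [
        # ... existing suite names ...
        "MySuite",
    ],
    "Normal": [
        "MySuite",
    ],
}
```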
## Recommendations

* **Keep benchmarks short:** Ideally under a minute per run. The framework runs benchmarks multiple times (`--iterations`, potentially repeated up to `--iterations-stddev` times).
* **Ensure determinism:** Minimize run-to-run variance. High standard deviation (`> stddev_threshold`) triggers reruns.
* **Handle configuration:** If a benchmark requires specific hardware or software, detect it in `setup()` and skip gracefully when the requirements aren't met, e.g., return an empty list from `run()` or don't add the benchmark in the Suite's `benchmarks()` method (see the sketch after this list).
* **Use unique names:** Ensure `benchmark.name()` and `result.label` are descriptive and unique.
* **Group related results:** Use `result.explicit_group` consistently for results you want to compare directly in outputs. Ensure units match within a group. If defining group-level metadata in the Suite, ensure the chosen `explicit_group` name starts with the corresponding key defined in `additional_metadata()`.
* **Test locally:** Before submitting changes, test with the relevant drivers/backends (e.g., using `--compute-runtime --build-igc` for L0). Check the visualization locally if possible (`--output-markdown --output-html`, then open the generated files).
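A sketch of suite-level skipping, where a benchmark is simply not offered when its requirements are unmet; `gpu_available()` and `GpuOnlyBench` are hypothetical stand-ins:

```python
from benches.base import Suite, Benchmark


class MyConditionalSuite(Suite):
    def name(self) -> str:
        return "MyConditionalSuite"

    def benchmarks(self) -> list[Benchmark]:
        benches: list[Benchmark] = [SubmitBench(in_order=True)]
        if gpu_available():  # hypothetical capability probe
            benches.append(GpuOnlyBench())  # hypothetical benchmark
        return benches
```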
## Utilities

* **`utils.utils`:** Provides common helper functions:
    * `run()`: Executes shell commands with environment setup (SYCL paths, `LD_LIBRARY_PATH`).
    * `git_clone()`: Clones/updates Git repositories.
    * `download()`: Downloads files via HTTP, checks checksums, and optionally extracts tar/gz archives.
    * `prepare_workdir()`: Sets up the main working directory.
    * `create_build_path()`: Creates a clean build directory.
* **`utils.oneapi`:** Provides the `OneAPI` singleton class (`get_oneapi()`). Downloads and installs the specified oneAPI components (oneDNN, oneMKL) into the working directory if needed, and provides access to their paths (libs, includes, CMake configs). Use this if your benchmark depends on these components, instead of requiring a system-wide install.
* **`options.py`:** Defines and holds global configuration options, populated by `argparse` in `main.py`. Use these options instead of defining your own global variables (see the sketch below).
* **`presets.py`:** Defines named sets of suites (`enabled_suites()`) used by the `--preset` argument.
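For instance, a benchmark can read the globally configured iteration count instead of inventing its own flag; the attribute name below is assumed for illustration, so verify the real fields in `options.py`:

```python
from options import options  # global options, populated by argparse in main.py


def iterations_budget() -> int:
    # Assumed attribute: the guide documents an --iterations flag,
    # so an `iterations` option presumably exists; check options.py.
    return options.iterations
```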

devops/scripts/benchmarks/benches/base.py (+1, -1)

```diff
@@ -163,5 +163,5 @@ def name(self) -> str:
     def setup(self):
         return

-    def additionalMetadata(self) -> dict[str, BenchmarkMetadata]:
+    def additional_metadata(self) -> dict[str, BenchmarkMetadata]:
         return {}
```

devops/scripts/benchmarks/benches/compute.py (+1, -1)

```diff
@@ -88,7 +88,7 @@ def setup(self):

         self.built = True

-    def additionalMetadata(self) -> dict[str, BenchmarkMetadata]:
+    def additional_metadata(self) -> dict[str, BenchmarkMetadata]:
         return {
             "SubmitKernel": BenchmarkMetadata(
                 type="group",
```

devops/scripts/benchmarks/benches/test.py (+1, -1)

```diff
@@ -45,7 +45,7 @@ def benchmarks(self) -> list[Benchmark]:

         return result

-    def additionalMetadata(self) -> dict[str, BenchmarkMetadata]:
+    def additional_metadata(self) -> dict[str, BenchmarkMetadata]:
         return {
             "Foo Group": BenchmarkMetadata(
                 type="group",
```

devops/scripts/benchmarks/main.py (+1, -1)

```diff
@@ -142,7 +142,7 @@ def collect_metadata(suites):
     metadata = {}

     for s in suites:
-        metadata.update(s.additionalMetadata())
+        metadata.update(s.additional_metadata())
         suite_benchmarks = s.benchmarks()
         for benchmark in suite_benchmarks:
             metadata[benchmark.name()] = benchmark.get_metadata()
```
