Commit fcb39a6

test(ab): do not print dimensions that are the same across all metrics

When an A/B-Test fails, it prints all dimensions associated with the
metric that changed. However, if some dimension is the same across
literally all metrics emitted (for example, instance name and host
kernel version will never change in the middle of a test run), then
that's arguably just noise, and makes it hard to pick out potentially
interesting dimensions. So avoid printing all dimensions that are
literally the same across all metrics.

Note that this does _not_ mean that if, for example, cpu_utilization
only changes for read throughput, the "read vs write" dimension won't
be printed anymore. We only drop dimensions if they are the same across
_all_ metrics, regardless of whether they had a statistically
significant change. In this scenario, the "mode: write" metric still
exists, it simply didn't change, and so the "mode: read" line won't be
dropped from the output.

Before:

    [Firecracker A/B-Test Runner] A/B-testing shows a change of -2.07μs, or -4.70%, (from 44.04μs to 41.98μs) for metric clat_read with p=0.0002. This means that observing a change of this magnitude or worse, assuming that performance characteristics did not change across the tested commits, has a probability of 0.02%. Tested Dimensions:
    {
      "cpu_model": "AMD EPYC 7R13 48-Core Processor",
      "fio_block_size": "4096",
      "fio_mode": "randrw",
      "guest_kernel": "linux-6.1",
      "guest_memory": "1024.0MB",
      "host_kernel": "linux-6.8",
      "instance": "m6a.metal",
      "io_engine": "Sync",
      "performance_test": "test_block_latency",
      "rootfs": "ubuntu-24.04.squashfs",
      "vcpus": "2"
    }

After:

    [Firecracker A/B-Test Runner] A/B-testing shows a change of -2.07μs, or -4.70%, (from 44.04μs to 41.98μs) for metric clat_read with p=0.0002. This means that observing a change of this magnitude or worse, assuming that performance characteristics did not change across the tested commits, has a probability of 0.02%. Tested Dimensions:
    {
      "guest_kernel": "linux-6.1",
      "io_engine": "Sync",
      "vcpus": "2"
    }

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>

1 parent be101d0 commit fcb39a6
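
The rule described in the commit message can be illustrated with a minimal, standalone sketch. It is not the code from `tools/ab_test.py`: the helper name `constant_dimensions` and the sample dimension sets are hypothetical and exist only to show which dimensions count as "the same across all metrics".

```python
from collections import defaultdict


def constant_dimensions(dimension_sets):
    """Return the names of dimensions that take exactly one value across all sets.

    Hypothetical helper for illustration; each element of `dimension_sets`
    is a frozenset of (dimension, value) pairs.
    """
    distinct_values = defaultdict(set)
    for dimension_set in dimension_sets:
        for name, value in dimension_set:
            distinct_values[name].add(value)
    return {name for name, values in distinct_values.items() if len(values) == 1}


# Hypothetical dimension sets, one per group of emitted metrics.
emitted = [
    frozenset({"instance": "m6a.metal", "host_kernel": "linux-6.8", "fio_mode": "read"}.items()),
    frozenset({"instance": "m6a.metal", "host_kernel": "linux-6.8", "fio_mode": "write"}.items()),
]

dropped = constant_dimensions(emitted)
# "fio_mode" differs between the two sets, so it is kept even if only the
# "read" metric showed a statistically significant change.
print(sorted(dropped))  # ['host_kernel', 'instance']
```

Because the check looks at every emitted metric rather than only the ones that changed, a dimension disappears from the output only when it carries no distinguishing information at all.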

1 file changed: +23 -1 lines changed

tools/ab_test.py

@@ -114,6 +114,8 @@ def load_data_series(report_path: Path, tag=None, *, reemit: bool = False):
     # Dictionary mapping EMF dimensions to A/B-testable metrics/properties
     processed_emf = {}
 
+    distinct_values_per_dimenson = defaultdict(set)
+
     report = json.loads(report_path.read_text("UTF-8"))
     for test in report["tests"]:
         for line in test["teardown"]["stdout"].splitlines():
@@ -133,6 +135,9 @@ def load_data_series(report_path: Path, tag=None, *, reemit: bool = False):
                 if not dimensions:
                     continue
 
+                for dimension, value in dimensions.items():
+                    distinct_values_per_dimenson[dimension].add(value)
+
                 dimension_set = frozenset(dimensions.items())
 
                 if dimension_set not in processed_emf:
@@ -149,7 +154,24 @@ def load_data_series(report_path: Path, tag=None, *, reemit: bool = False):
 
                         values.extend(result[metric][0])
 
-    return processed_emf
+    irrelevant_dimensions = set()
+
+    for dimension, distinct_values in distinct_values_per_dimenson.items():
+        if len(distinct_values) == 1:
+            irrelevant_dimensions.add(dimension)
+
+    post_processed_emf = {}
+
+    for dimension_set, metrics in processed_emf.items():
+        processed_key = frozenset(
+            (dim, value)
+            for (dim, value) in dimension_set
+            if dim not in irrelevant_dimensions
+        )
+
+        post_processed_emf[processed_key] = metrics
+
+    return post_processed_emf
 
 
 def collect_data(binary_dir: Path, tests: list[str]):
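
The re-keying step at the end of `load_data_series` can be tried in isolation with the sketch below. It is an illustration under stated assumptions, not the project's code: `drop_dimensions`, the sample series, and the `"Microseconds"` unit strings are hypothetical, and the `assert` is an extra check added here to make the no-collision property explicit.

```python
def drop_dimensions(series, irrelevant):
    """Re-key `series` without the dimensions named in `irrelevant`.

    Hypothetical helper: `series` maps a frozenset of (dimension, value)
    pairs to a dict of metric name -> (values, unit).
    """
    rekeyed = {}
    for dimension_set, metrics in series.items():
        key = frozenset(
            (dim, value) for dim, value in dimension_set if dim not in irrelevant
        )
        # Removing a dimension that has one value everywhere cannot make two
        # previously distinct keys equal, so this should never fire.
        assert key not in rekeyed, f"collision on {key}"
        rekeyed[key] = metrics
    return rekeyed


# Hypothetical data: "instance" is constant, "vcpus" varies.
series = {
    frozenset({("instance", "m6a.metal"), ("vcpus", "1")}): {"clat_read": ([44.0], "Microseconds")},
    frozenset({("instance", "m6a.metal"), ("vcpus", "2")}): {"clat_read": ([42.0], "Microseconds")},
}
rekeyed = drop_dimensions(series, {"instance"})
print(sorted(sorted(key) for key in rekeyed))  # [[('vcpus', '1')], [('vcpus', '2')]]

# Edge case: with a single dimension set, every dimension has exactly one
# value, so all of them are dropped and the key becomes an empty frozenset.
single = {frozenset({("instance", "m6a.metal")}): {"clat_read": ([44.0], "Microseconds")}}
print(list(drop_dimensions(single, {"instance"})))  # [frozenset()]
```

The edge case suggests that a report containing only one dimension set would be printed with no tested dimensions at all, which may or may not matter depending on how the runner is used.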
