test(ab): do not print dimensions that are the same across all metrics
When an A/B-Test fails, it prints all dimensions associated with the
metric that changed. However, if some dimension is the same across
literally all metrics emitted (for example, instance name and host
kernel version will never change in the middle of a test run), then
that's arguably just noise, and makes it hard to parse potentially
interesting dimensions. So avoid printing all dimensions that are
literally the same across all metrics.
Note that this does _not_ mean that if, for example, cpu_utilization only
changes for read throughput, the "read vs write" dimension won't be
printed anymore. We only drop dimensions if they are the same across
_all_ metrics, regardless of whether they had a statistically
significant change. In this scenario, the "mode: write" metric still
exists, it simply didn't change, and so the "mode: read" line won't be
dropped from the output.
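
As a rough illustration of the rule described above, here is a minimal
sketch in Python. It is not the code from this commit: the function names
and the representation of each metric's dimensions as a plain dict of
strings are assumptions made for the example.

```python
def shared_dimensions(all_metrics):
    """Return the (key, value) pairs present in every metric's dimensions.

    `all_metrics` is assumed to be a list of dicts mapping dimension
    names to string values (a hypothetical representation).
    """
    item_sets = [set(dims.items()) for dims in all_metrics]
    if not item_sets:
        return set()
    return set.intersection(*item_sets)


def dimensions_to_print(dims, all_metrics):
    """Drop the dimensions of one metric that are identical across all metrics."""
    shared = shared_dimensions(all_metrics)
    return {k: v for k, v in dims.items() if (k, v) not in shared}


# Every emitted metric is considered, not only the ones that changed.
all_metrics = [
    {"instance": "m6a.metal", "host_kernel": "linux-6.8", "mode": "read"},
    {"instance": "m6a.metal", "host_kernel": "linux-6.8", "mode": "write"},
]
print(dimensions_to_print(all_metrics[0], all_metrics))  # {'mode': 'read'}
```

Because the shared set is computed over all emitted metrics rather than
only the failing ones, "mode" survives here even if just the read metric
changed.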
Before:
[Firecracker A/B-Test Runner] A/B-testing shows a change of -2.07μs, or
-4.70%, (from 44.04μs to 41.98μs) for metric clat_read with p=0.0002.
This means that observing a change of this magnitude or worse, assuming
that performance characteristics did not change across the tested
commits, has a probability of 0.02%. Tested Dimensions:
{
"cpu_model": "AMD EPYC 7R13 48-Core Processor",
"fio_block_size": "4096",
"fio_mode": "randrw",
"guest_kernel": "linux-6.1",
"guest_memory": "1024.0MB",
"host_kernel": "linux-6.8",
"instance": "m6a.metal",
"io_engine": "Sync",
"performance_test": "test_block_latency",
"rootfs": "ubuntu-24.04.squashfs",
"vcpus": "2"
}
After:
[Firecracker A/B-Test Runner] A/B-testing shows a change of -2.07μs, or
-4.70%, (from 44.04μs to 41.98μs) for metric clat_read with p=0.0002.
This means that observing a change of this magnitude or worse, assuming
that performance characteristics did not change across the tested
commits, has a probability of 0.02%. Tested Dimensions:
{
"guest_kernel": "linux-6.1",
"io_engine": "Sync",
"vcpus": "2"
}
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>