Commit 90a92f6

measureme/counters: add module-level documentation.
1 parent 4068b2b commit 90a92f6

2 files changed: +114 -0 lines changed

measureme/src/counters.rs

Lines changed: 107 additions & 0 deletions
@@ -1,3 +1,97 @@
+//! Profiling counters and their implementation.
+//!
+//! # Available counters
+//!
+//! Name (for [`Counter::by_name()`]) | Counter | OSes | CPUs
+//! --------------------------------- | ------- | ---- | ----
+//! `wall-time` | [`WallTime`] | any | any
+//! `instructions:u` | [`Instructions`] | Linux | `x86_64`
+//! `instructions-minus-irqs:u` | [`InstructionsMinusIrqs`] | Linux | `x86_64`<br>- AMD (since K8)<br>- Intel (since Sandy Bridge)
+//! `instructions-minus-r0420:u` | [`InstructionsMinusRaw0420`] | Linux | `x86_64`<br>- AMD (Zen)
+//!
+//! *Note: `:u` suffixes for hardware performance counters come from the Linux `perf`
+//! tool, and indicate that the counter is only active while userspace code executes
+//! (i.e. it's paused while the kernel handles syscalls, interrupts, etc.).*
+//!
+//! # Limitations and caveats
+//!
+//! *Note: for more information, also see the GitHub PR which first implemented hardware
+//! performance counter support ([#143](https://github.com/rust-lang/measureme/pull/143)).*
+//!
+//! The hardware performance counters (i.e. all counters other than `wall-time`) are limited to:
+//! * nightly Rust (gated on `features = ["nightly"]`), for `asm!`
+//! * Linux, for out-of-the-box performance counter reads from userspace
+//!   * other OSes could work through custom kernel extensions/drivers, in the future
+//! * `x86_64` CPUs, mostly due to lack of other available test hardware
+//!   * new architectures would be easier to support (on Linux) than new OSes
+//!   * easiest to add would be 32-bit `x86` (aka `i686`), which would reuse
+//!     most of the `x86_64` CPU model detection logic
+//! * specific (newer) CPU models, for certain non-standard counters
+//!   * e.g. `instructions-minus-irqs:u` requires a "hardware interrupts" (aka "IRQs")
+//!     counter, which is implemented differently between vendors / models (if at all)
+//! * single-threaded programs (counters only work on the thread they were created on)
+//!   * for profiling `rustc`, this means only "check mode" (`--emit=metadata`)
+//!     is supported currently (`-Z no-llvm-threads` could also work)
+//!   * unclear what the best approach for handling multiple threads would be
+//!   * changing the API (e.g. to require per-thread profiler handles) could result
+//!     in a more efficient implementation, but would also be less ergonomic
+//!   * profiling data from multithreaded programs would be harder to use due to
+//!     noise from synchronization mechanisms, non-deterministic work-stealing, etc.
+//!
+//! For ergonomic reasons, the public API doesn't vary based on `features` or target.
+//! Instead, attempting to create any unsupported counter will return `Err`, just
+//! like it does for any issue detected at runtime (e.g. incompatible CPU model).
+//!
+//! When counting instructions specifically, these factors will impact the profiling quality:
+//! * high-level non-determinism (e.g. user interactions, networking)
+//!   * the ideal use-case is a mostly-deterministic program, e.g. a compiler like `rustc`
+//!   * if I/O can be isolated to separate profiling events, and doesn't impact
+//!     execution in a more subtle way (see below), the deterministic parts of
+//!     the program can still be profiled with high accuracy
+//! * low-level non-determinism (e.g. ASLR, randomized `HashMap`s, thread scheduling)
+//!   * ASLR ("Address Space Layout Randomization") may be provided by the OS for
+//!     security reasons, or accidentally caused through allocations that depend on
+//!     random data (even as low-entropy as e.g. the base 10 length of a process ID)
+//!     * on Linux ASLR can be disabled by running the process under `setarch -R`
+//!     * this impacts `rustc` and LLVM, which rely on keying `HashMap`s by addresses
+//!       (typically of interned data) as an optimization, and while non-deterministic
+//!       outputs are considered bugs, the instructions executed can still vary a lot,
+//!       even when the externally observable behavior is perfectly repeatable
+//!   * `HashMap`s are involved in more than one way:
+//!     * both the executed instructions, and the shape of the allocations depend
+//!       on both the hasher state and choice of keys (as the buckets are in
+//!       a flat array indexed by some of the lower bits of the key hashes)
+//!     * so every `HashMap` with keys being/containing addresses will amplify
+//!       ASLR and ASLR-like effects, making the entire program more sensitive
+//!     * the default hasher is randomized, and while `rustc` doesn't use it,
+//!       proc macros can (and will), and it's harder to disable than Linux ASLR
+//!   * `jemalloc` (the allocator used by `rustc`, at least in official releases)
+//!     has a 10 second "purge timer", which can introduce an ASLR-like effect,
+//!     unless disabled with `MALLOC_CONF=dirty_decay_ms:0,muzzy_decay_ms:0`
+//! * hardware flaws (whether in the design or implementation)
+//!   * hardware interrupts ("IRQs") and exceptions (like page faults) cause
+//!     overcounting (1 instruction per interrupt, possibly the `iret` from the
+//!     kernel handler back to the interrupted userspace program)
+//!     * this is the reason why `instructions-minus-irqs:u` should be preferred
+//!       to `instructions:u`, where the former is available
+//!     * there are system-wide options (e.g. `CONFIG_NO_HZ_FULL`) for removing
+//!       some interrupts from the cores used for profiling, but they're not as
+//!       complete a solution, nor easy to set up in the first place
+//!   * AMD Zen CPUs have a speculative execution feature (dubbed `SpecLockMap`),
+//!     which can cause non-deterministic overcounting for instructions following
+//!     an atomic instruction (such as found in heap allocators, or `measureme`)
+//!     * this is automatically detected, with a `log` message pointing the user
+//!       to <https://github.com/mozilla/rr/wiki/Zen> for guidance on how to
+//!       disable `SpecLockMap` on their system (sadly requires root access)
+//!
+//! Even if some of the above caveats apply for some profiling setup, as long as
+//! the counters function, they can still be used, and compared with `wall-time`.
+//! Chances are, they will still have less variance, as everything that impacts
+//! instruction counts will also impact any time measurements.
+//!
+//! Also keep in mind that instruction counts do not properly reflect all kinds
+//! of workloads, e.g. SIMD throughput and cache locality are unaccounted for.
+
 use std::error::Error;
 use std::time::Instant;
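
Reviewer note on the "randomized `HashMap`s" caveat in the module docs above: the following is a minimal, standalone sketch (not code from this commit) of how the default hasher differs from a fixed one. `hash_with` and `FixedState` are hypothetical names introduced here for illustration, and the fixed hasher is only assumed to be stable within a single build of the standard library, since its algorithm is unspecified across Rust releases.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// Hash `key` with a hasher produced by `build` (hypothetical helper).
fn hash_with<B: BuildHasher>(build: &B, key: &str) -> u64 {
    let mut hasher = build.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Hasher state without random keys (hypothetical alias, illustration only).
type FixedState = BuildHasherDefault<DefaultHasher>;

fn main() {
    // Every `RandomState::new()` carries its own keys, so hashes (and with
    // them the bucket layout of a map) normally differ between instances
    // and between runs of the program.
    let a = hash_with(&RandomState::new(), "key");
    let b = hash_with(&RandomState::new(), "key");
    println!("randomized: {a:#x} vs {b:#x}");

    // A fixed hasher state yields the same hash every time.
    let fixed = hash_with(&FixedState::default(), "key");
    assert_eq!(fixed, hash_with(&FixedState::default(), "key"));

    // The same state can be plugged into a `HashMap` as its third parameter.
    let mut map: HashMap<&str, u32, FixedState> = HashMap::default();
    map.insert("key", 1);
    assert_eq!(map.get("key"), Some(&1));
}
```

A fixed hasher only removes the hasher-state source of variation; maps keyed by addresses would still be sensitive to ASLR, as the docs point out.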

@@ -60,6 +154,9 @@ impl Counter {
     }
 }

+/// "Monotonic clock" with nanosecond precision (using [`std::time::Instant`]).
+///
+/// Can be obtained with `Counter::by_name("wall-time")`.
 pub struct WallTime {
     start: Instant,
 }
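
Reviewer note: the new doc comment describes `WallTime` as a monotonic clock built on `std::time::Instant`. As a rough illustration of that idea only, here is a standalone sketch; `WallTimeSketch` and its methods are hypothetical and are not the crate's actual implementation, which this hunk does not show.

```rust
use std::time::Instant;

/// Hypothetical stand-in for a wall-time counter (illustration only).
pub struct WallTimeSketch {
    start: Instant,
}

impl WallTimeSketch {
    /// Record the starting point of the clock.
    pub fn new() -> Self {
        WallTimeSketch { start: Instant::now() }
    }

    /// Nanoseconds elapsed since `new()`. `Instant` is monotonic, so
    /// successive calls never go backwards.
    pub fn since_start(&self) -> u64 {
        self.start.elapsed().as_nanos() as u64
    }
}

fn main() {
    let counter = WallTimeSketch::new();
    let first = counter.since_start();
    let second = counter.since_start();
    assert!(second >= first);
    println!("elapsed: {first} ns, then {second} ns");
}
```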
@@ -79,6 +176,9 @@ impl WallTime {
     }
 }

+/// "Instructions retired" hardware performance counter (userspace-only).
+///
+/// Can be obtained with `Counter::by_name("instructions:u")`.
 pub struct Instructions {
     instructions: hw::Counter,
     start: u64,
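
Reviewer note: the module docs state that the public API does not vary by target and that creating an unsupported counter returns `Err` instead. One way such a lookup could be shaped is sketched below, as a standalone example under stated assumptions: `CounterKind` and `counter_by_name` are hypothetical names, and the real `Counter::by_name()` additionally depends on the `nightly` feature and on CPU model detection, neither of which is modelled here.

```rust
use std::error::Error;

/// Hypothetical stand-in for the crate's counter type (illustration only).
#[derive(Debug, PartialEq)]
pub enum CounterKind {
    WallTime,
    Instructions,
}

/// Same signature on every target; unsupported requests fail at runtime.
pub fn counter_by_name(name: &str) -> Result<CounterKind, Box<dyn Error + Send + Sync>> {
    match name {
        "wall-time" => Ok(CounterKind::WallTime),
        "instructions:u" => {
            // `cfg!` expands to a compile-time boolean, so this function
            // exists on all targets but only succeeds on Linux/x86_64.
            if cfg!(all(target_os = "linux", target_arch = "x86_64")) {
                Ok(CounterKind::Instructions)
            } else {
                Err(format!("counter `{}` is not supported on this target", name).into())
            }
        }
        _ => Err(format!("unknown counter `{}`", name).into()),
    }
}

fn main() {
    for name in ["wall-time", "instructions:u", "cache-misses"] {
        match counter_by_name(name) {
            Ok(kind) => println!("{name}: {kind:?}"),
            Err(err) => println!("{name}: error: {err}"),
        }
    }
}
```

Callers then handle a single `Result` regardless of platform, which is the ergonomic trade-off the docs describe.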
@@ -103,6 +203,9 @@ impl Instructions {
     }
 }

+/// More accurate [`Instructions`] (subtracting hardware interrupt counts).
+///
+/// Can be obtained with `Counter::by_name("instructions-minus-irqs:u")`.
 pub struct InstructionsMinusIrqs {
     instructions: hw::Counter,
     irqs: hw::Counter,
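
Reviewer note: the docs explain that each hardware interrupt overcounts roughly one instruction, which is why this counter subtracts the IRQ count. The arithmetic can be sketched as follows; `Snapshot` and `adjusted_delta` are hypothetical names, the readings are made-up numbers, and the use of wrapping subtraction is an assumption made here for illustration, not something this diff confirms.

```rust
/// Hypothetical snapshot of two raw hardware counters (illustration only).
#[derive(Clone, Copy, Debug)]
pub struct Snapshot {
    pub instructions: u64,
    pub irqs: u64,
}

impl Snapshot {
    /// Instructions with the assumed one-per-interrupt overcount removed.
    pub fn adjusted(self) -> u64 {
        self.instructions.wrapping_sub(self.irqs)
    }
}

/// Adjusted instruction count accumulated between two snapshots.
pub fn adjusted_delta(start: Snapshot, now: Snapshot) -> u64 {
    now.adjusted().wrapping_sub(start.adjusted())
}

fn main() {
    let start = Snapshot { instructions: 1_000, irqs: 3 };
    let now = Snapshot { instructions: 5_000, irqs: 10 };
    // 4_000 raw instructions minus 7 interrupts in the interval.
    assert_eq!(adjusted_delta(start, now), 3_993);
    println!("adjusted delta: {}", adjusted_delta(start, now));
}
```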
@@ -132,6 +235,10 @@ impl InstructionsMinusIrqs {
     }
 }

+/// (Experimental) Like [`InstructionsMinusIrqs`] (but using an undocumented `r0420:u` counter).
+///
+/// Can be obtained with `Counter::by_name("instructions-minus-r0420:u")`.
+//
 // HACK(eddyb) this is a variant of `instructions-minus-irqs:u`, where `r0420`
 // is subtracted, instead of the usual "hardware interrupts" (aka IRQs).
 // `r0420` is an undocumented counter on AMD Zen CPUs which appears to count

measureme/src/lib.rs

Lines changed: 7 additions & 0 deletions
@@ -12,6 +12,10 @@
 //!
 //! To create a [`Profiler`], call the [`Profiler::new()`] function and provide a `Path` with
 //! the directory and file name for the trace files.
+//! Alternatively, call the [`Profiler::with_counter()`] function, to choose the [`Counter`]
+//! the profiler will use for events (whereas [`Profiler::new()`] defaults to `wall-time`).
+//!
+//! For more information on available counters, see the [`counters`] module documentation.
 //!
 //! To record an event, call the [`Profiler::record_instant_event()`] method, passing a few
 //! arguments:
@@ -28,12 +32,15 @@
 //! - [`Profiler::alloc_string()`]: allocates a string and returns the [`StringId`] that refers
 //!   to it
 //!
+//! [`counters`]: counters
+//! [`Counter`]: counters::Counter
 //! [`Profiler`]: Profiler
 //! [`Profiler::alloc_string()`]: Profiler::alloc_string
 //! [`Profiler::alloc_string_with_reserved_id()`]: Profiler::alloc_string_with_reserved_id
 //! [`Profiler::new()`]: Profiler::new
 //! [`Profiler::record_event()`]: Profiler::record_event
 //! [`Profiler::start_recording_interval_event()`]: Profiler::start_recording_interval_event
+//! [`Profiler::with_counter()`]: Profiler::with_counter
 //! [`StringId`]: StringId
 #![allow(renamed_and_removed_lints)] // intra_doc_link_resolution_failure is renamed on nightly
 #![deny(warnings, intra_doc_link_resolution_failure)]
