Commit 5f35aaf

add sanitizers, debugging process, many cleanups, examples
1 parent 54e4cdd commit 5f35aaf

2 files changed, +161 -41 lines changed

content/articles/optimal_debugging.smd

Lines changed: 66 additions & 19 deletions
@@ -17,7 +17,7 @@ practice having no ABI, but reality is in this text simplified for brevity and
 sanity.
 
 - 1.[Theory of debugging](#theory)
-- 2.[Practical methods with tradeoffs](#practice)
+- 2.[Practical methods with trade-offs](#practice)
 - 3.[Uniform execution representation](#uniform_execution_representation)
 - 4.[Abstraction problems during problem isolation](#abstraction_problems)
 - 5.[Possible implementations](#possible_implementations)
@@ -35,34 +35,41 @@ on a specific program run. If the execution witness shows a "bad state",
 then there must be a bug.
 Thus a **debugger** can be seen **as query engine over states and transitions of
 a buggy execution witness.**
+In simpler terms, **debugging is not making bugs or removing them**.
 Frequent operations are bug source isolation to deterministic components,
 where encapsulation of non-determinism usually simplifies the process.
 In contrast to that, concurrent code is tricky to debug, because one
 needs to trace multiple execution flows to estimate where the origin of the
 incorrect state is.
 
+The process of debugging means using static and dynamic program analysis,
+together with its automation and adaptation, to speed up the elimination of bugs
+(and bug classes) for (classes of) target systems.
+
 One can generally categorize methods into the following list (**asoul**)
-**a**utomate, **s**implify, **o**bserve, understand, learn)
+**a**utomate, **s**implify, **o**bserve, **u**nderstand, **l**earn)
 - **a**utomate the process to minimize errors/oversights during debugging,
 against probabilistic errors, document the process etc
 - **s**implify and isolate system components and changes over time
 - **o**bserve the system while running it to *trace state or state changes*
 - **u**nderstand the expected and actual code semantics to the degree necessary
 - **l**earn, extend and ensure how and which system invariants are satisfied
 necessary from *of the involved systems*,
-for example userspace processes, kernel, build system, compiler, source code, linker,
+for example user-space processes, kernel, build system, compiler, source code, linker,
 object code, assembly, hardware etc
 
 with the fundamental constrains being (**feel**)
 - **f**inding out correct system components semantics
 - **ee**nsuring deterministic reproducibility of the problem
 - **l**imited time and effort
 
-Common debugging methods to **feel a soul** with various tradeoffs from compile-time
-to runtime debugging and less to more run-time data collection are:
+Common static and dynamic program analysis methods to
+**run the system** to **feel a soul** for the purpose of eliminating bugs
+(and bug classes) are:
+- **Specification** meaning to "compare/get/write the details".
 - **Formal Verification** as ahead or compile-time invariant resolving.
-- **Validation** as runtime invariant checks.
-- **Testing** as sample based runtime invariant checks.
+- **Validation** as runtime invariant checks. Sanitizers, i.e. compiler-inserted runtime checks, are common tools.
+- **Testing** as sample-based runtime invariant checks. Coverage-based fuzzers are common tools.
 - **Stepping** via "classical debugger" to manipulate task execution
 context, manipulate memory optionally via source code location translation
 via REPL commands, graphically, scripting or (rarely) freely programmable.
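Validation and testing, as listed above, meet in coverage-guided fuzzing: the fuzzer generates samples and the target checks an invariant at runtime. A minimal C sketch, assuming LLVM libFuzzer's entry point `LLVMFuzzerTestOneInput`; the hex encoder and decoder are hypothetical stand-ins for the code under test, and the checked invariant is the round trip decode(encode(x)) == x.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical code under test: lower-case hex encoding, 2 chars per byte. */
static void hex_encode(const uint8_t *in, size_t n, char *out) {
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < n; i++) {
        out[2 * i] = digits[in[i] >> 4];
        out[2 * i + 1] = digits[in[i] & 0x0f];
    }
}

/* Maps one lower-case hex digit to its value, or -1 for any other char. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical decoder: reads 2 * n_bytes chars, returns 0 or -1 on bad input. */
static int hex_decode(const char *in, size_t n_bytes, uint8_t *out) {
    for (size_t i = 0; i < n_bytes; i++) {
        int hi = hex_value(in[2 * i]);
        int lo = hex_value(in[2 * i + 1]);
        if (hi < 0 || lo < 0) return -1;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    return 0;
}

/* libFuzzer entry point: called once per generated input. The round-trip
   invariant is checked at runtime; a violation aborts, so the fuzzer
   saves the offending input for later reproduction. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    assert(data != NULL || size == 0);
    char *hex = malloc(2 * size + 1);   /* +1 keeps size == 0 well-defined */
    uint8_t *back = malloc(size + 1);
    if (hex == NULL || back == NULL) {
        free(hex);
        free(back);
        return 0;
    }
    hex_encode(data, size, hex);
    if (hex_decode(hex, size, back) != 0 ||
        (size > 0 && memcmp(back, data, size) != 0)) {
        abort();
    }
    free(hex);
    free(back);
    return 0;
}
```

With Clang such a target is typically built as `clang -g -O1 -fsanitize=fuzzer,address fuzz_hex.c` (file name hypothetical), so memory errors reported by the address sanitizer show up next to violations of the explicit invariant.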
@@ -73,12 +80,20 @@ to runtime debugging and less to more run-time data collection are:
 - **Recording** Encoded dumping of runtime to replay runtime with
 before specified time and state determinism.
 
-Simplification and isolation means to apply the meaning of both words on
-all potential sub-components including, but not limited to
-hardware, code versioning including dependencies, source system,
-compiler framework and target system. Typical methods are
-- **Bisection** via git or the actual binaries
-- **Reduction** via rmeoval of system parts or trying to reproduce with
+The core ideas for **what software system to run** based on code with its
+semantics are then typically a mix of
+- **Machine code** execution on the actual hardware to get hardware and timing behavior.
+- **Simulation** as **partial or full execution** on a simplified, imitative
+representation of the target hardware to get information for the simplified model.
+- **Virtualisation** as **isolation or simplification** of a hardware- or software
+subsystem to reduce system complexity.
+
+Isolation and simplification are typically applied to all potential
+sub-components including, but not limited to hardware, code versioning
+including dependencies, source system, compiler framework and target system.
+Typical methods are:
+- **Bisection** via git or the actual binaries.
+- **Reduction** via removal of system parts or trying to reproduce with
 (a minimal) example.
 - **Statistical analysis** from collected data on how the problem
 manifests on given environment(s) etc.
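The bisection step above can be sketched as a binary search over an ordered list of revisions. A minimal C sketch under the same assumptions `git bisect` makes: revision 0 is known good, the last revision is known bad and the bug persists once introduced. `first_bad_revision` and the demo reproducer `bug_since` are hypothetical names; in practice the callback would check out, build and test one revision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reproducer callback: returns true if revision `index` shows the bug. */
typedef bool (*is_bad_fn)(size_t index, void *ctx);

/* Binary search for the first bad revision among revisions 0..count-1.
   Assumes revision 0 is good, revision count-1 is bad and the predicate is
   monotone (once bad, every later revision is bad). Needs about log2(count)
   reproducer runs instead of up to count. */
size_t first_bad_revision(size_t count, is_bad_fn is_bad, void *ctx) {
    assert(count >= 2);
    size_t good = 0;          /* invariant: `good` is a good revision */
    size_t bad = count - 1;   /* invariant: `bad` is a bad revision */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx)) {
            bad = mid;
        } else {
            good = mid;
        }
    }
    return bad;
}

/* Hypothetical demo reproducer: the bug appears at revision *(size_t *)ctx. */
static bool bug_since(size_t index, void *ctx) {
    return index >= *(const size_t *)ctx;
}
```

For 1000 revisions this needs about 10 reproducer runs. A flaky reproducer breaks the monotonicity assumption, since a single wrong answer sends the search into the wrong half, so each check should then be repeated, which ties bisection to the statistical analysis listed above.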
@@ -87,18 +102,21 @@ compiler framework and target system. Typical methods are
 of **the to be debugged system to provide necessary debug functionality**.
 For example, software based hardware debugging relies on interfaces to
 the hardware like JTAG, Kernel debugging on Kernel compilation or
-configuration and elevated (user), userspace debugging on process and
+configuration and elevated (user), user-space debugging on process and
 user permissions, system configuration or a child process to be debugged
-on Posix systems via ptrace.
+on Posix systems via `ptrace`.
+
+How far the process of debugging can and should be automated or optimized
+depends on many factors, for example bug classes and target systems.
 
 []($section.id("practice"))
 ### Practical methods with tradeoffs
 
 Usually semantics are not "set into stone" inclusive or do not offer
 sufficient tradeoffs, so formal verification is rarely an option aside of
-usage of models as design and planning tool.
+usage of models as design and planning tool or for fail-safe program functionality.
 Depending on the domain and environment, problematic behavior of hardware
-or software components must be to be more or less 1. avoided and 2. traceable
+or software components must be more or less 1. avoided and 2. traceable
 and there exist various (domain) metrics as decision helper.
 Very well designed systems explain users how to debug bugs regarding to
 **functional behavior**, **time behavior** with **internal and
@@ -107,12 +125,41 @@ task execution correctness is intended.
 Access restrictions limit or rule out stepping, whereas storage limitations
 limit or rule out logging, tracing and recording.
 
+**Sanitizers** are the most efficient and simplest debugging tools for C and C++,
+whereas Zig implements them, apart from the thread sanitizer, via its allocator and safety mode.
+Instrumented sanitizers have a 2x-4x slowdown, whereas dynamic ones have a 20x-50x slowdown.
+
+Nr | Clang usage | Zig usage | Memory | Runtime | Comments |
+-- | ---------------------------- | ----------------- | ---------------- | -------- | ----------------------------------- |
+1 | -fsanitize=address | alloc + safety | 1x (3x stack) | 2x | Clang 16+ TB of virt mem |
+2 | -fsanitize=leak | allocator | 1x | 1x | on exit ?x? more mem+time |
+3 | -fsanitize=memory | unimplemented | 2-3x | 3x | |
+4 | -fsanitize=thread | -fsanitize=thread | 5-10x+1MB/thread | 5-15x | Clang ?x? ("lots of") virt mem |
+5 | -fsanitize=type | unimplemented | ? | ? | not enough data |
+6 | -fsanitize=undefined | safety mode | 1x | ~1x | |
+7 | -fsanitize=dataflow | unimplemented | 1-2x? | 1-4x? | wip, get variable dependencies |
+8 | -fsanitize=memtag | unimplemented | ~1.0Yx? | ~1.0Yx? | wip, address cheri-like ptr tagging |
+9 | -fsanitize=cfi | unimplemented | 1x | ~1x | forward edge ctrl flow protection |
+10 | -fsanitize=safe-stack | unimplemented | 1x | ~1x | backward edge ctrl flow protection |
+11 | -fsanitize=shadow-call-stack | unimplemented | 1x | ~1x | backward edge ctrl flow protection |
+
+LLVM recommends sanitizers 1-6 for testing and 7-11 for production use.
+Memory and slowdown numbers are only reported for LLVM sanitizers. Zig does not
+report its own numbers yet (as of 2025-01-11). Slowdown for dynamic sanitizer versions
+increases by a factor of 10x in contrast to the listed static usage costs.
+The leak sanitizer only checks for memory leaks, not for other system resources.
+Besides various kernel-specific tools to track system resources,
+Valgrind can be used on Posix systems for non-memory resources and
+Application Verifier on Windows.
+Address and thread sanitizers cannot be combined in Clang, and combined usage
+of the Zig implementation is limited by virtual memory usage.
+In Zig, aliasing violations currently cannot be sanitized, whereas in Clang only
+type-based aliasing can be sanitized, with no overhead numbers reported by LLVM yet.
+
 [TODO: requirements on system design for formal verification vs debugging.]::
 [no surprise rule: core system enabling debugging (in any form) must be correct]::
 [to the degree necessary.]::
-
 [TODO: good argumentation on ignoring linker speak, language footguns etc.]::
-
 [1.Bugs related to functional behavior.]::
 [2.Bugs related to time behavior.]::
 [3.Internal and external system resources.]::
