---

[]($section.id("intro"))
- This article is intended as overview of debugging techniques and motivation for
+ This article is intended as an overview of software-based debugging techniques and motivation for
uniform execution representation and setup to efficiently mix and match the
appropriate technique for system level debugging with focus on statically
optimizing compiler languages to keep complexity and scope limited.
The reader may notice that there are several documented deficits
across platforms and tooling on documentation or functionality, which will be improved.
The author accepts the irony of such statements by "C having no ABI"/many systems in
- practice having no ABI, but reality is in this text simplified for brevity and
- sanity.
+ practice having no stable or formally specified ABI, but reality is simplified in this text
+ for brevity and sanity.

- Section 1 (theory) feels complete, but are planned to be more dense to
- become an appropriate definition for bug, debugging and debugging process.
+ Section 1 (theory) feels complete aside from simulation and hard-/software replacement
+ techniques, and gives good first drafts for bug, debugging and the debugging process.
Section 2 (practical) is tailored towards non micro Kernels, which are based
on process abstraction, but is currently missing content and scalability numbers
for tooling.
The idea is to provide understanding and numbers to estimate for system design,
1 if formal proof of correctness is feasible and on what parts,
- 2 problems and methods applicable for dynamic program analysis.
- Followup sections will be on speculative and more advanced ideas, which
- should be feasible based on numbers.
+ 2 problems and methods applicable for dynamic system analysis.
+ Section 3 (future) will be on speculative and more advanced ideas, which should
+ be feasible based on the numbers. It is planned to be about how to design
+ systems for rewriting and debugging using formal methods, compilers and
+ code synthesis.

- 1.[Theory of debugging](#theory)
- 2.[Practical methods with trade-offs](#practice)
@@ -38,7 +40,7 @@ should be feasible based on numbers.
[]($section.id("theory"))
### Theory of debugging

- A program [can be represented as (often non-deterministic) state machine](https://gu.outerproduct.net/debug.html),
+ A (software) system [can be represented as an (often non-deterministic) state machine](https://gu.outerproduct.net/debug.html),
such that a **bug** is a **bad transition rule** between those states.
It is usually assumed that the developer/user knows correct and incorrect
(bad) system states and the code represents a somewhat correct model of
@@ -55,7 +57,7 @@ In contrast to that, concurrent code is tricky to debug, because one
needs to trace multiple execution flows to estimate where the origin of the
incorrect state is.

- The process of debugging means to use static and dynamic program analysis
+ The process of debugging means to use static and dynamic (software) system analysis
and its automation and adaption to speed up bug (classes) elimination for the
(classes of) target systems.

@@ -75,11 +77,13 @@ with the fundamental constrains being [**f**inding, **ee**nsuring, **l**imited]
- **ee**nsuring deterministic reproducibility of the problem
- **l**imited time and effort

- Common static and dynamic program analysis methods to
+ Common static and dynamic (software) system analysis methods to
**run the system** to **feel a soul** for the purpose of eliminating the bug
(classes) are:
- - **Specification** meaning to "compare/get/write the details".
+ - **Specification** meaning to "compare/get/write the details", possibly formally, possibly for (software) system synthesis.
- **Formal Verification** as ahead or compile-time invariant resolving.
+ May be made superfluous by (software) system synthesis based on **Specification**, or infeasible
+ due to complexity or a non-formal specification.
- **Validation** as runtime invariant checks. Sanitizers as compiler runtime checks are common tools.
- **Testing** as sample based runtime invariant checks. Coverage based fuzzers are common tools.
- **Stepping** via "classical debugger" to manipulate task execution
@@ -88,9 +92,19 @@ Common static and dynamic program analysis methods to
- **Logging** as dumping (a simplification of) state with context
from bugs (usually timestamps in production systems).
- **Tracing** as dumping (a simplification of) runtime behavior
- via temporal relations (usually timestamps).
+ via temporal relations (usually timestamps). Can be immediate or sampled.
- **Recording** Encoded dumping of runtime to replay runtime with
before specified time and state determinism.
+ - **Scheduling** meaning to do logical or time-relation based scheduling of
+ processes or threads. Typical use cases are Undo's "thread fuzzing", rr's "chaos
+ mode", using the kernel scheduler API, or bounded model checking.
+ - **Reversal computing** meaning to reverse-execute some code to (partially) reset
+ the system to a previous state without **Recording** and replaying.
+ Typically used in simulations and for pure logic functionality of languages; it
+ corresponds to applying some bijective function.
+ - **Time-reversal computing** to do **Reversal computing** with tracked time.
+ Mostly used in simulations, because (if used) the source code to assembly
+ relation and (assembly) instruction timings must be fixed and known.

The core ideas for **what software system to run** based on code with its
semantics are then typically a mix of
@@ -120,7 +134,7 @@ on Posix systems via `ptrace`.

Without costly hardware devices to trace and physical access to the computing unit
for exact recording of the system behavior including time information,
- dynamic program analysis (to run the system) requires trade-offs on what
+ dynamic (software) system analysis (to run the system) requires trade-offs on what
program parts and aspects to inspect and collect data from.
Therefore, it depends on many factors, for example bug classes and target
systems, to what degree the process of debugging can and should be automated or
@@ -129,21 +143,56 @@ optimized.
[]($section.id("practice"))
### Practical methods with trade-offs

- Usually semantics are not "set into stone" inclusive or do not offer
- sufficient trade-offs, so formal verification is rarely an option aside of
- usage of models as design and planning tool or for fail-safe program functionality.
Depending on the domain and environment, problematic behavior of hardware
- or software components must be more or less 1 avoided and 2 traceable
+ or software components must be (more or less) 1 avoided or 2 traceable
and there exist various (domain) metrics as decision helper.
- Very well designed systems explain users how to debug bugs regarding to
+ Very well designed systems explain to users how to debug with regard to
**functional behavior**, **time behavior** with **internal and
external system resources** up to the degree the system usage and
task execution correctness is intended.
Access restrictions limit or rule out stepping, whereas storage limitations
limit or rule out logging, tracing and recording.

- **Sanitizers** are the most efficient and simplest debugging tools for C and C++,
- whereas Zig implements them, besides thread sanitizer, as allocator and safety mode.
+ Formal methods: **Specification**, (software) system synthesis and **Formal Verification**
+
+ (Highly) safety-critical systems or hardware are typically created from a formal **Specification**
+ by (software) system synthesis or, when (full) synthesis is infeasible, implementations are formally verified.
+ To my knowledge, no standards for (highly) security-critical systems exist
+ which require formal **Specification** and **Formal Verification** or synthesis (as of 2025-05-16).
+
+ For non safety- or security-critical (sub)systems, hardware or software, usually
+ semantics are not "set into stone", so **Formal Verification** or (software) system
+ synthesis is rarely an option.
+ Formal models and (semi-)formal specifications are, however, commonly used for
+ design, planning, testing, review and validation of fail-safe or core (software) system
+ functionality.
+
+ Typically used models for C, C++, Zig and compiler backends are
+ Integer Arithmetic, Modular Arithmetic and Saturation Arithmetic for integers, and
+ Floating-Point Arithmetic (with possible rough edge cases like signaling NaN propagation) and
+ Fixed-Point Arithmetic for real numbers.
+ (Simplified) instances of Separation Logic may be used to model and check
+ pointers and resources; for example, Safe Rust uses separation logic with
+ lifetime inference and user annotations, based on strict aliasing in Unsafe Rust.
+
+ Typical relevant unsolved or incomplete models for compilers are
+ 1. hardware semantics, specifically around timing behavior and (if used) weak memory,
+ 2. memory synchronization semantics for weak memory systems, with the ideas and
+ suggested model from "Relaxed Memory Concurrency Re-executed" looking promising,
+ 3. SIMD semantics, specifically floating-point NaN propagation,
+ 4. pointer semantics, specifically in object code (initialization), se- and deserialization,
+ construction, optimizations on pointers with arithmetic, tagging,
+ 5. constant-time code semantics, for example how to ensure data stays in L1/L2 cache
+ and operations take constant time,
+ 6. ABI semantics, since specifications are not formal,
+
+ and typical problems more related to platforms like Kernels are
+ 1. resource (tracking) semantics, for example how to track resources in a process group,
+ 2. security semantics, for example how to model process group permissions.
+
+ For **Validation**, **Sanitizers** are typically used as the most efficient and simplest
+ debugging tools for C and C++, whereas Zig implements them, besides thread
+ sanitizer, as allocator and safety mode.
Instrumented sanitizers have a 2x-4x slowdown vs dynamic ones with 20x-50x slowdown.

Nr | Clang usage | Zig usage | Memory | Runtime | Comments |
@@ -176,19 +225,62 @@ typed based aliasing can be sanitized without any numbers reported by LLVM yet.
Besides adjusting source code semantics via 1 sanitizers, one can do 2 own dynamic
source code adjustments or use 3 tooling that use kernel APIs to trace and optionally
3.1 run-time check information or 3.2 run-time check kernel APIs and with underlying state.
- Kernels further may simplify access to information, for example the `proc` file
+ Kernels further may simplify access to information, for example the `proc` file
system simplifies access to process information.

- TODO list standard Kernel tracing tooling, focus on dtrace
- and drawback of no "works for all kernels" "trace processes"
+ **Testing** is very context- and use-case-dependent, with
+ typical separations being between pure/impure, time-invariant/-variant,
+ accurate/approximate and hardware/software (sub)systems, from simple
+ unit tests up to integration and end-to-end tests, based on
+ statistical/probability analysis and system intuition on deterministic expected
+ behavior from explicit or implicit requirements.
+ TODO tools, hardware, software, mixed hw/sw examples
+
+ **Stepping**
+ * TODO time costs, sync options, etc
+
+ **Logging**
+ * TODO
+
+ **Tracing**
+ * TODO
+ - [ ] "Debugging And Profiling .NET Core Apps on Linux"
+ - [ ] https://github.com/goldshtn/linux-tracing-workshop
+ - [ ] CPU sampling: linux perf, bcc; win ETW; macos instruments, dtrace
+ - [ ] dynamic tracing: linux perf, systemtap, bcc; win nothing; macos dtrace
+ - [ ] static tracing: linux LTTng, win ETW, macos nothing
+ - [ ] dump gen: linux core_pattern, gcore; win procdump, WER; macos kern.corefile, gcore
+ - [ ] dump analysis: gdb, lldb; visual studio, windbg, gdb, lldb
+ - [ ] lwn.net Unifying kernel tracing
+ - [ ] babeltrace https://babeltrace.org/
+ - [ ] There are no "works for all kernels" and "trace specific (group of) processes" solutions,
+       so one has to do specific queries to constrain what data should be collected.
+ - [ ] For low latency overhead analysis, dtrace or inspired systems like bpftrace,
+       bcc and systemtap can be used.
+ - [ ] ETW allows complete user-space captures
+ - [ ] Most related solutions use dtrace or
+ - [ ] TODO
+       * list standard Kernel tracing tooling,
+       * focus on dtrace and drawback of no "works for all kernels" "trace processes"
+       * standard tooling for checking traced information
+       * Tracers: dtrace, bpftrace, bcc, systemtap, ETW, darwin/macos?, other posix tools?
+       - TODO memory/runtime/latency overhead etc
+
+ **Recording**
+ * TODO requirements: eliminate non-deterministic choices for replaying, others
+
+ **Scheduling**
+ * TODO requirements: simplification methods, practicality
+
+ **Reversal computing**
+ * TODO how and when to write bijective code to simplify debugging

- TODO list standard Kernel tooling for tracing
- TODO 3.1 list standard tooling for checking traced information
+ **Time-reversal computing**
+ * TODO use cases

The following is a list of typical problems with simple solution tactics.
- For simplicity no virtual machine/emulator approaches are listed, since they
- also affect performance and run-time behavior leading (likely) to more complex
- dynamic program analysis.
+ To keep the analysis simple, no virtual machine/emulator or simulation approaches are given.

[]($section.id("uniform_execution_representation"))
### Uniform execution representation
@@ -208,7 +300,7 @@ performance problems and logic problems.
provides resources.
Automatically tracking resource leaks requires Valgrind logic over all
memory operations, reduction requires (limited) kernel object tracing.
- Tracing platform solutions will always have trade-offs.
+ Tracing platform solutions will always have trade-offs.
Complete solution tracing user process and related kernel logic is only
available as dtrace with non-optimal performance.