@@ -16,18 +16,19 @@ conventional graphs, an edge may originate from either another node or a *root*.
Each *node* represents an object in the heap.

Each *edge* represents an object reference from an object or a root. A *root* is a reference held
- in a slot directly accessible from [mutators][mutator], including local variables, global variables,
+ in a slot directly accessible from [mutators], including local variables, global variables,
thread-local variables, and so on. An object can have many fields, and some fields may hold
references to objects, while others hold non-reference values.

An object is *reachable* if there is a path in the object graph from any root to the node of the
- object. Unreachable objects cannot be accessed by [mutators][mutator]. They are considered
+ object. Unreachable objects cannot be accessed by [mutators]. They are considered
garbage, and can be reclaimed by the garbage collector.
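
The definition of reachability above can be sketched as a graph traversal from the roots. This is a toy model, not how any particular collector is implemented: the heap is assumed to be a dictionary from hypothetical object names to the names they reference, and `reachable` is an illustrative helper.

```python
from collections import deque


def reachable(roots, edges):
    """Return the set of nodes reachable from any root (breadth-first)."""
    seen = set()
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        # Follow every reference field of this object.
        queue.extend(edges.get(node, ()))
    return seen


# Hypothetical heap: "d" points into the live graph but nothing points to it,
# and "e" only references itself, so neither is reachable from the roots.
heap = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"], "e": ["e"]}
roots = ["a"]

live = reachable(roots, heap)
garbage = set(heap) - live
```

Note that an edge from an unreachable object to a reachable one (here `d` to `a`) does not make the unreachable object live, and a self-reference (`e`) does not either.
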

- [mutator]: #mutator
-
## Mutator

+ [mutator]: #mutator
+ [mutators]: #mutator
+
TODO

## Emergency Collection
@@ -47,6 +48,153 @@ implementing memory-sensitive caches.

[java-soft-ref]: https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/ref/SoftReference.html

+ ## GC-safe Point
+
+ [GC-safe point]: #gc-safe-point
+ [GC-safe points]: #gc-safe-point
+
+ Also known as: *GC-point*
+
+ A *GC-safe point* is a place in the code executed by mutators where (stop-the-world) garbage
+ collection is allowed to happen. Concurrent GC can run concurrently with mutators, but still needs
+ to synchronize with mutators at GC-safe points. Regardless, the following statements must be true
+ when a mutator is at a GC-safe point.
+
+ - References held by a mutator can be identified. That includes references in local variables,
+   thread-local variables, and so on. For compiled code, that includes those in stack slots and
+   machine registers.
+ - The mutator cannot be in the middle of operations that must be *atomic with respect to GC*.
+   That includes [write barriers], [address-based hashing], etc.
68
+
69
+ ### Code With GC Semantics
70
+
71
+ Compilers (including ahead-of-time and just-in-time compilers) for programs with garbage collection
72
+ semantics (such as Java source code or bytecode) usually understand GC semantics, too, and can
73
+ generate [ yieldpoints] and [ stack maps] to assist GC.
74
+
75
+ In practice, such compilers only make certain places in a function GC-safe and only generate [ stack
76
+ maps] at those places, including but not limited to:
77
+
78
+ - [ yieldpoints]
79
+ - object allocation sites (may trigger GC)
80
+ - call sites to other functions where GC is allowed to happen inside
81
+
82
+ If we allow GC to happen at arbitrary PC, it will either force the compiler to generate [ stack maps]
83
+ at all PCs, or force the VM to use [ shadow stacks] or [ conservative stack scanning] , instead. It
84
+ will also break operations that must be * atomic with respect to GC* , such as [ write barrier] and
85
+ [ address-based hashing] .
86
+
87
+ ### Code Without GC Semantics
88
+
89
+ In contrast, for programs without GC semantics (e.g. programs written in C, C++, Rust, etc.), their
90
+ compilers (GCC, clang, rustc, ...) are agnostic to GC. But many VMs (such as OpenJDK, CRuby, Julia,
91
+ etc.) are implemented in such languages. We don't usually use the term "GC-safe point" for
92
+ functions written in C, C++, Rust, etc., but each VM has its own rules to determine whether GC can
93
+ happen within functions written in those languages.
94
+
95
+ Interpreters usually maintain local variables in dedicated stacks or frames data structures.
96
+ References in such structures are identified by traversing those stacks or frames, and GC is usually
97
+ allowed between bytecode instructions.
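
As a rough illustration of the interpreter case, the sketch below models frames as lists of local variables and only checks for a pending GC between instructions. All names here (`Ref`, `Frame`, `scan_roots`, `interpret`) are hypothetical and chosen for this example; a real interpreter would have its own frame layout and its own way to tell references from non-reference values.

```python
class Ref:
    """Stands in for a reference to a heap object (hypothetical)."""

    def __init__(self, name):
        self.name = name


class Frame:
    """An interpreter frame holding local variables in a plain list."""

    def __init__(self, local_vars):
        self.local_vars = list(local_vars)


def scan_roots(frames):
    """Traverse all frames and collect the locals that are references."""
    return [v for frame in frames for v in frame.local_vars if isinstance(v, Ref)]


def interpret(instructions, frames, gc_requested, on_gc):
    """Run instructions on the top frame, allowing GC only between them."""
    for instruction in instructions:
        if gc_requested():
            # No instruction is half-executed here, so every frame is in a
            # consistent state and its references can be identified.
            on_gc(scan_roots(frames))
        instruction(frames[-1])
```

Because the check sits between instructions, the body of each instruction never observes a GC in the middle of its own work.
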
98
+
99
+ Some runtime functions implement operations tightly related to GC, and must be * atomic w.r.t. GC* .
100
+ For example, if a function initializes the type information in the header of an object, GC cannot
101
+ happen in the middle. Otherwise the GC will read a corrupted header and crash. Other examples
102
+ include functions that implement the write barrier [ slow path] and [ address-based hashing] . Such
103
+ functions cannot allocate objects, and cannot call any function that may trigger GC.
104
+
105
+ Some functions do not access the GC heap, or only access the heap in controlled ways (e.g. utilizing
106
+ [ object pinning] , or via safe APIs such as [ JNI] ). Some of such functions (such as wrappers for
107
+ blocking system calls including ` read ` and ` write ` ) are long-running. GC is usually safe when some
108
+ mutators are executing such functions. Compilers for languages with GC semantics usually make * call
109
+ sites* to such functions [ GC-safe points] , and generate [ stack maps] at those call sites. The
110
+ runtime usually transitions the state of the current mutator thread so that the GC knows it is in
111
+ such a function when requesting all mutators to stop at their next GC-safe points.
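
The state transition described above can be sketched with a small single-threaded model. The state names, the `Runtime` class and its methods are all hypothetical. The rule that a thread returning from such a function during a GC must wait is an assumption of this sketch (it is common practice, but each VM defines its own protocol), and a real runtime would need atomic state updates and actual blocking.

```python
from enum import Enum, auto


class ThreadState(Enum):
    IN_MANAGED = auto()      # running code that may touch the GC heap freely
    IN_NATIVE = auto()       # inside a function that GC can safely ignore
    BLOCKED_FOR_GC = auto()  # waiting for the current GC to finish


class Mutator:
    def __init__(self, name):
        self.name = name
        self.state = ThreadState.IN_MANAGED


class Runtime:
    def __init__(self, mutators):
        self.mutators = list(mutators)
        self.gc_in_progress = False

    def enter_native(self, mutator):
        # Done at the call site, which is a GC-safe point.
        mutator.state = ThreadState.IN_NATIVE

    def leave_native(self, mutator):
        # Assumption: a returning thread may not resume heap access mid-GC.
        if self.gc_in_progress:
            mutator.state = ThreadState.BLOCKED_FOR_GC
        else:
            mutator.state = ThreadState.IN_MANAGED

    def request_gc(self):
        """Start a GC; return the mutators that still have to reach a GC-safe point."""
        self.gc_in_progress = True
        return [m.name for m in self.mutators if m.state is ThreadState.IN_MANAGED]

    def finish_gc(self):
        self.gc_in_progress = False
        for m in self.mutators:
            if m.state is ThreadState.BLOCKED_FOR_GC:
                m.state = ThreadState.IN_MANAGED
```

In this model, a mutator blocked in a long `read` call never delays the collection: `request_gc` only waits for threads that are still in the `IN_MANAGED` state.
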
112
+
113
+ [ JNI ] : https://docs.oracle.com/en/java/javase/21/docs/specs/jni/index.html
114
+
115
+ ## Stack Map
116
+
117
+ [ stack map ] : #stack-map
118
+ [ stack maps ] : #stack-map
119
+
120
+ A * stack map* is a data structure that identifies stack slots and registers that may contain
121
+ references. Stack maps are essential for supporting [ precise stack scanning] .
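
One possible shape for such a data structure is a table keyed by code address, as in the sketch below. The layout, the addresses, the register names and the `scan_frame` helper are assumptions made for illustration. Real compilers use their own, usually much more compact, encodings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackMap:
    ref_slots: tuple      # indices of frame slots that may hold references
    ref_registers: tuple  # names of registers that may hold references


# Hypothetical table: one stack map per GC-safe point, keyed by its PC.
STACK_MAPS = {
    0x401020: StackMap(ref_slots=(0, 2), ref_registers=("rbx",)),
    0x401088: StackMap(ref_slots=(1,), ref_registers=()),
}


def scan_frame(pc, slots, registers):
    """Return the non-null references in one frame, as told by its stack map."""
    stack_map = STACK_MAPS[pc]
    candidates = [slots[i] for i in stack_map.ref_slots]
    candidates += [registers[r] for r in stack_map.ref_registers]
    # A slot that "may contain" a reference can also be null.
    return [ref for ref in candidates if ref is not None]
```

Slots and registers not listed in the stack map (such as an integer in slot 1 at `0x401020`) are never inspected, which is what lets the scan tell references apart from non-reference values.
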
122
+
123
+ ## Yieldpoint
124
+
125
+ [ yieldpoint ] : #yieldpoint
126
+ [ yieldpoints ] : #yieldpoint
127
+
128
+ Also known as: * GC-check point*
129
+
130
+ A * yieldpoint* is a point in a program where a mutator thread checks if it should yield from normal
131
+ execution in order to handle certain events, such as garbage collection, profiling, biased lock
132
+ revocation, etc.
133
+
134
+ Compilers of programs with GC semantics (e.g. Java source code and byte code) insert yieldpoints in
135
+ various places, such as function epilogues and loop back-edges. In this way, when GC is triggered
136
+ asynchronously by other threads, the current mutator can reach the next yieldpoint quickly and yield
137
+ for GC promptly. Compilers also generate [ stack maps] at yieldpoints to make them [ GC-safe points] .
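
The placement described above can be illustrated with a hand-written stand-in for what a compiler might emit. This is a single-threaded sketch under stated assumptions: the `Mutator` class, its `yield_requested` flag and the `request_at` parameter (which simulates another thread triggering GC) are hypothetical, and the slow path merely records where the mutator yielded instead of blocking.

```python
class Mutator:
    def __init__(self):
        self.yield_requested = False  # would be set by another thread
        self.yield_log = []           # where this mutator yielded

    def yieldpoint(self, where):
        # Usual case: one cheap flag check, then continue.
        if self.yield_requested:
            # Slow path: a real mutator would block here until GC finishes.
            self.yield_requested = False
            self.yield_log.append(where)


def sum_below(mutator, n, request_at=None):
    """Sum 0..n-1 with yieldpoints at the loop back-edge and the epilogue."""
    total = 0
    i = 0
    while i < n:
        total += i
        if i == request_at:
            mutator.yield_requested = True  # simulate an asynchronous GC request
        i += 1
        mutator.yieldpoint(("loop back-edge", i))
    mutator.yieldpoint(("epilogue", i))
    return total
```

Because there is a yieldpoint on every loop back-edge, a request made during one iteration is noticed before the next iteration starts, however long the loop runs.
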
138
+
139
+ Because some operations (such as [ write barrier] ) must be * atomic w.r.t. GC* , [ yieldpoints] must not
140
+ be inserted in the middle of such operations.
141
+
142
+ Read the paper [ * Stop and go: Understanding yieldpoint behavior* ] [ LWB+15 ] by Lin et al. for more
143
+ details.
144
+
145
+ [ LWB+15 ] : https://dl.acm.org/doi/10.1145/2754169.2754187
146
+
147
+ ## Address-based Hashing
148
+
149
+ [ address-based hashing ] : #address-based-hashing
150
+
151
+ * Address-based hashing* is a GC-assisted space-efficient high-performance method for implementing
152
+ identity hash code in copying GC.
153
+
154
+ Read the [ Address-based Hashing] ( portingguide/concerns/address-based-hashing.md ) chapter for more
155
+ details.
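
For orientation only, the sketch below models a commonly described three-state variant of the technique: an object's address serves as its hash until the GC moves it, at which point the old address is saved alongside the object. The state names, the `Obj` class and both functions are assumptions of this example, and the linked chapter remains the authoritative description.

```python
from enum import Enum, auto


class HashState(Enum):
    UNHASHED = auto()          # identity hash never requested
    HASHED = auto()            # hash requested; object not moved since
    HASHED_AND_MOVED = auto()  # hash requested; object moved afterwards


class Obj:
    def __init__(self, address):
        self.address = address
        self.hash_state = HashState.UNHASHED
        self.saved_hash = None  # models an extra field added when copying


def identity_hash(obj):
    """Mutator side: return a hash that stays stable across moves."""
    if obj.hash_state is HashState.HASHED_AND_MOVED:
        return obj.saved_hash
    obj.hash_state = HashState.HASHED
    return obj.address


def gc_move(obj, new_address):
    """GC side: copy the object, preserving its hash if one was taken."""
    if obj.hash_state is HashState.HASHED:
        obj.saved_hash = obj.address
        obj.hash_state = HashState.HASHED_AND_MOVED
    obj.address = new_address
```

The space saving comes from objects that are never hashed: they carry no hash field at all, before or after a move. The sketch also hints at why the operation must be atomic with respect to GC: if the object moved between the state update and the address read in `identity_hash`, the returned hash would not match the saved one.
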
156
+
157
+ ## Precise Stack Scanning
158
+
159
+ [ precise stack scanning ] : #precise-stack-scanning
160
+
161
+ Also known as: * exact stack scanning*
162
+
163
+ TODO
164
+
165
+ ## Conservative Stack Scanning
166
+
167
+ [ conservative stack scanning ] : #conservative-stack-scanning
168
+
169
+ TODO
170
+
171
+ ## Shadow Stack
172
+
173
+ [ shadow stack ] : #shadow-stack
174
+ [ shadow stacks ] : #shadow-stack
175
+
176
+ TODO
177
+
178
+ ## Write Barrier
179
+
180
+ [ write barrier ] : #write-barrier
181
+ [ write barriers ] : #write-barrier
182
+
183
+ TODO
184
+
185
+ ## Fast Path and Slow Path
186
+
187
+ [ fast path ] : #fast-path-and-slow-path
188
+ [ slow path ] : #fast-path-and-slow-path
189
+
190
+ TODO
191
+
192
+ ## Object Pinning
193
+
194
+ [ object pinning ] : #object-pinning
195
+
196
+ TODO
197
+
50
198
<!--
51
199
vim: tw=100 ts=4 sw=4 sts=4 et
52
200
-->