
Commit 302f611

User Guide chapter for address-based hashing (#1294)
This PR adds a User Guide chapter for address-based hashing. We also extended the Glossary to introduce GC-safe points and related concepts.
1 parent 4d61b1b commit 302f611

4 files changed: +420 -5 lines changed


docs/userguide/src/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@
4040
- [Optimizing Allocation](portingguide/perf_tuning/alloc.md)
4141
- [VM-specific Concerns](portingguide/concerns/prefix.md)
4242
- [Finalizers and Weak References](portingguide/concerns/weakref.md)
43+
- [Address-based Hashing](portingguide/concerns/address-based-hashing.md)
4344
- [API Migration Guide](migration/prefix.md)
4445
- [Template (for mmtk-core developers)](migration/template.md)
4546

docs/userguide/src/glossary.md

Lines changed: 152 additions & 4 deletions
@@ -16,18 +16,19 @@ conventional graphs, an edge may originate from either another node or a *root*.
1616
Each *node* represents an object in the heap.
1717

1818
Each *edge* represents an object reference from an object or a root. A *root* is a reference held
19-
in a slot directly accessible from [mutators][mutator], including local variables, global variables,
19+
in a slot directly accessible from [mutators], including local variables, global variables,
2020
thread-local variables, and so on. An object can have many fields, and some fields may hold
2121
references to objects, while others hold non-reference values.
2222

2323
An object is *reachable* if there is a path in the object graph from any root to the node of the
24-
object. Unreachable objects cannot be accessed by [mutators][mutator]. They are considered
24+
object. Unreachable objects cannot be accessed by [mutators]. They are considered
2525
garbage, and can be reclaimed by the garbage collector.
2626

27-
[mutator]: #mutator
28-
2927
## Mutator
3028

29+
[mutator]: #mutator
30+
[mutators]: #mutator
31+
3132
TODO
3233

3334
## Emergency Collection
@@ -47,6 +48,153 @@ implementing memory-sensitive caches.
4748

4849
[java-soft-ref]: https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/ref/SoftReference.html
4950

51+
## GC-safe Point
52+
53+
[GC-safe point]: #gc-safe-point
54+
[GC-safe points]: #gc-safe-point
55+
56+
Also known as: *GC-point*
57+
58+
A *GC-safe point* is a place in the code executed by mutators where (stop-the-world) garbage
59+
collection is allowed to happen. Concurrent GC can run concurrently with mutators, but still needs
60+
to synchronize with mutators at GC-safe points. Regardless, the following statements must be true
61+
when a mutator is at a GC-safe point.
62+
63+
- References held by a mutator can be identified. That includes references in local variables,
64+
thread-local variables, and so on. For compiled code, that includes those in stack slots and
65+
machine registers.
66+
- The mutator cannot be in the middle of operations that must be *atomic with respect to GC*.
67+
That includes [write barriers], [address-based hashing], etc.
68+
69+
### Code With GC Semantics
70+
71+
Compilers (including ahead-of-time and just-in-time compilers) for programs with garbage collection
72+
semantics (such as Java source code or bytecode) usually understand GC semantics, too, and can
73+
generate [yieldpoints] and [stack maps] to assist GC.
74+
75+
In practice, such compilers only make certain places in a function GC-safe and only generate [stack
76+
maps] at those places, including but not limited to:
77+
78+
- [yieldpoints]
79+
- object allocation sites (may trigger GC)
80+
- call sites to other functions where GC is allowed to happen inside
81+
82+
If we allow GC to happen at an arbitrary PC, it will either force the compiler to generate [stack maps]
83+
at all PCs, or force the VM to use [shadow stacks] or [conservative stack scanning], instead. It
84+
will also break operations that must be *atomic with respect to GC*, such as [write barriers] and
85+
[address-based hashing].
86+
87+
### Code Without GC Semantics
88+
89+
In contrast, for programs without GC semantics (e.g. programs written in C, C++, Rust, etc.), their
90+
compilers (GCC, clang, rustc, ...) are agnostic to GC. But many VMs (such as OpenJDK, CRuby, Julia,
91+
etc.) are implemented in such languages. We don't usually use the term "GC-safe point" for
92+
functions written in C, C++, Rust, etc., but each VM has its own rules to determine whether GC can
93+
happen within functions written in those languages.
94+
95+
Interpreters usually maintain local variables in dedicated stack or frame data structures.
96+
References in such structures are identified by traversing those stacks or frames, and GC is usually
97+
allowed between bytecode instructions.
98+
99+
Some runtime functions implement operations tightly related to GC, and must be *atomic w.r.t. GC*.
100+
For example, if a function initializes the type information in the header of an object, GC cannot
101+
happen in the middle. Otherwise, the GC will read a partially initialized header and crash. Other examples
102+
include functions that implement the write barrier [slow path] and [address-based hashing]. Such
103+
functions cannot allocate objects, and cannot call any function that may trigger GC.
104+
105+
Some functions do not access the GC heap, or only access the heap in controlled ways (e.g. utilizing
106+
[object pinning], or via safe APIs such as [JNI]). Some such functions (such as wrappers for
107+
blocking system calls including `read` and `write`) are long-running. GC is usually safe when some
108+
mutators are executing such functions. Compilers for languages with GC semantics usually make *call
109+
sites* to such functions [GC-safe points], and generate [stack maps] at those call sites. The
110+
runtime usually transitions the state of the current mutator thread so that the GC knows it is in
111+
such a function when requesting all mutators to stop at their next GC-safe points.
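
The state transition described above can be sketched as follows. This is a minimal illustration and not MMTk's or any particular VM's API: `MutatorState`, `IN_MANAGED`, `IN_NATIVE`, and `in_native` are hypothetical names. A real runtime would also block the thread on return if a GC is still in progress, which this sketch omits.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical state encoding; real VMs usually have more states.
const IN_MANAGED: u8 = 0; // running code that may touch the GC heap freely
const IN_NATIVE: u8 = 1; // running a function that does not touch the heap

/// Hypothetical per-thread state word that the GC can inspect.
struct MutatorState(AtomicU8);

impl MutatorState {
    fn new() -> Self {
        MutatorState(AtomicU8::new(IN_MANAGED))
    }

    /// What the GC asks when stopping the world: a mutator in native state
    /// does not need to reach a yieldpoint before the GC proceeds.
    fn is_gc_safe(&self) -> bool {
        self.0.load(Ordering::Acquire) == IN_NATIVE
    }

    /// Wrap a long-running call (e.g. a blocking `read`) in a state transition.
    fn in_native<R>(&self, f: impl FnOnce() -> R) -> R {
        self.0.store(IN_NATIVE, Ordering::Release);
        let result = f();
        // Assumption: a real VM would wait here while a GC is in progress
        // before letting the thread access the heap again.
        self.0.store(IN_MANAGED, Ordering::Release);
        result
    }
}
```

With this shape, the call site of `in_native` is the GC-safe point: the compiler emits a stack map there, and the closure body must not touch unpinned heap objects.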
112+
113+
[JNI]: https://docs.oracle.com/en/java/javase/21/docs/specs/jni/index.html
114+
115+
## Stack Map
116+
117+
[stack map]: #stack-map
118+
[stack maps]: #stack-map
119+
120+
A *stack map* is a data structure that identifies stack slots and registers that may contain
121+
references. Stack maps are essential for supporting [precise stack scanning].
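
As a rough illustration, a stack map table can be modelled as a lookup from a program counter (PC) to the locations that may hold references at that PC. The types below (`Location`, `StackMaps`) are hypothetical and chosen for readability. Real compilers emit compact, compiler-specific encodings instead.

```rust
use std::collections::HashMap;

/// Where a reference may live at a given PC (hypothetical encoding).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Location {
    /// Byte offset from the frame pointer.
    StackSlot(i32),
    /// Machine register number.
    Register(u8),
}

/// Hypothetical table of stack maps, keyed by the PC of a GC-safe point.
struct StackMaps {
    by_pc: HashMap<usize, Vec<Location>>,
}

impl StackMaps {
    fn new() -> Self {
        StackMaps { by_pc: HashMap::new() }
    }

    /// Called by the compiler when it emits a GC-safe point at `pc`.
    fn record(&mut self, pc: usize, locations: Vec<Location>) {
        self.by_pc.insert(pc, locations);
    }

    /// Called by the stack scanner. `None` means `pc` is not a GC-safe
    /// point, so the frame cannot be scanned precisely there.
    fn lookup(&self, pc: usize) -> Option<&[Location]> {
        self.by_pc.get(&pc).map(|v| v.as_slice())
    }
}
```

Keying the table by PC reflects why compilers only generate stack maps at selected places: every additional GC-safe point costs one more entry.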
122+
123+
## Yieldpoint
124+
125+
[yieldpoint]: #yieldpoint
126+
[yieldpoints]: #yieldpoint
127+
128+
Also known as: *GC-check point*
129+
130+
A *yieldpoint* is a point in a program where a mutator thread checks if it should yield from normal
131+
execution in order to handle certain events, such as garbage collection, profiling, biased lock
132+
revocation, etc.
133+
134+
Compilers of programs with GC semantics (e.g. Java source code and bytecode) insert yieldpoints in
135+
various places, such as function epilogues and loop back-edges. In this way, when GC is triggered
136+
asynchronously by other threads, the current mutator can reach the next yieldpoint quickly and yield
137+
for GC promptly. Compilers also generate [stack maps] at yieldpoints to make them [GC-safe points].
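
The following is a minimal sketch of a polling yieldpoint, written by hand in Rust for illustration. In practice the compiler inserts the equivalent check into generated code. The `Mutator` struct, its fields, and `sum_below` are hypothetical. A real slow path would block until the GC finishes instead of clearing the flag itself.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical per-mutator context; the names are illustrative only.
struct Mutator<'a> {
    /// Set by the GC coordinator when it wants mutators to stop.
    gc_requested: &'a AtomicBool,
    /// How many times this mutator yielded (for demonstration only).
    yields_taken: usize,
}

impl<'a> Mutator<'a> {
    /// Fast path: one load and a rarely taken branch.
    #[inline(always)]
    fn yieldpoint(&mut self) {
        if self.gc_requested.load(Ordering::Acquire) {
            self.yieldpoint_slow();
        }
    }

    /// Slow path. A real VM would block here until the GC finishes.
    /// This sketch only records the yield and clears the request.
    #[cold]
    fn yieldpoint_slow(&mut self) {
        self.yields_taken += 1;
        self.gc_requested.store(false, Ordering::Release);
    }
}

/// A function body as a compiler might instrument it.
fn sum_below(mutator: &mut Mutator<'_>, n: u64) -> u64 {
    let mut total = 0;
    let mut i = 0;
    while i < n {
        total += i;
        i += 1;
        mutator.yieldpoint(); // loop back-edge
    }
    mutator.yieldpoint(); // function epilogue
    total
}
```

Placing the check on loop back-edges bounds the amount of work a mutator can do before it notices a pending GC request.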
138+
139+
Because some operations (such as [write barriers]) must be *atomic w.r.t. GC*, [yieldpoints] must not
140+
be inserted in the middle of such operations.
141+
142+
Read the paper [*Stop and go: Understanding yieldpoint behavior*][LWB+15] by Lin et al. for more
143+
details.
144+
145+
[LWB+15]: https://dl.acm.org/doi/10.1145/2754169.2754187
146+
147+
## Address-based Hashing
148+
149+
[address-based hashing]: #address-based-hashing
150+
151+
*Address-based hashing* is a GC-assisted, space-efficient, high-performance method for implementing
152+
identity hash code in copying GC.
153+
154+
Read the [Address-based Hashing](portingguide/concerns/address-based-hashing.md) chapter for more
155+
details.
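
To give a flavour of the idea, here is a sketch of the commonly described three-state scheme. It is an assumption that the linked chapter follows this scheme, since this glossary entry does not spell it out. `HashState`, `Object`, `identity_hash`, and `gc_move` are hypothetical names, and the object is modelled as a plain struct instead of real heap memory.

```rust
/// Hypothetical hash state, typically stored in two header bits.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum HashState {
    Unhashed,
    Hashed,
    HashedAndMoved,
}

/// Simplified stand-in for a heap object.
struct Object {
    address: usize,
    state: HashState,
    /// Extra field that only exists once a hashed object has been moved.
    saved_hash: Option<usize>,
}

/// Mutator side. This must be atomic w.r.t. GC: if the object moved between
/// reading `state` and reading `address`, the returned hash would be wrong.
fn identity_hash(obj: &mut Object) -> usize {
    match obj.state {
        HashState::Unhashed => {
            obj.state = HashState::Hashed;
            obj.address
        }
        HashState::Hashed => obj.address,
        HashState::HashedAndMoved => obj.saved_hash.expect("saved on first move"),
    }
}

/// GC side: when copying a hashed object, preserve its old address as the hash.
fn gc_move(obj: &mut Object, new_address: usize) {
    if obj.state == HashState::Hashed {
        obj.saved_hash = Some(obj.address);
        obj.state = HashState::HashedAndMoved;
    }
    obj.address = new_address;
}
```

Objects that are never hashed pay no space cost, and objects that are hashed but never moved pay only the state bits, which is where the space efficiency comes from.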
156+
157+
## Precise Stack Scanning
158+
159+
[precise stack scanning]: #precise-stack-scanning
160+
161+
Also known as: *exact stack scanning*
162+
163+
TODO
164+
165+
## Conservative Stack Scanning
166+
167+
[conservative stack scanning]: #conservative-stack-scanning
168+
169+
TODO
170+
171+
## Shadow Stack
172+
173+
[shadow stack]: #shadow-stack
174+
[shadow stacks]: #shadow-stack
175+
176+
TODO
177+
178+
## Write Barrier
179+
180+
[write barrier]: #write-barrier
181+
[write barriers]: #write-barrier
182+
183+
TODO
184+
185+
## Fast Path and Slow Path
186+
187+
[fast path]: #fast-path-and-slow-path
188+
[slow path]: #fast-path-and-slow-path
189+
190+
TODO
191+
192+
## Object Pinning
193+
194+
[object pinning]: #object-pinning
195+
196+
TODO
197+
50198
<!--
51199
vim: tw=100 ts=4 sw=4 sts=4 et
52200
-->
