Commit 7f0663e

k-sareen and wks authored
Mention revoking TLABs of all mutators after a GC (#1018)
Co-authored-by: Kunshan Wang <wks1986@gmail.com>
1 parent d253d97 commit 7f0663e

2 files changed

+54
-36
lines changed

docs/userguide/src/portingguide/perf_tuning/alloc.md

Lines changed: 53 additions & 35 deletions
@@ -3,31 +3,31 @@
33
MMTk provides [`alloc()`](https://docs.mmtk.io/api/mmtk/memory_manager/fn.alloc.html)
44
and [`post_alloc()`](https://docs.mmtk.io/api/mmtk/memory_manager/fn.post_alloc.html), to allocate a piece of memory, and
55
finalize the memory as an object. Calling them is sufficient for a functional implementation, and we recommend doing
6-
so in the early development of an MMTk integration. However, as allocation is performance critical, runtimes generally would
7-
optimize to make allocation as fast as possible, in which invoking `alloc()` and `post_alloc()` becomes inadequent.
6+
so in the early development of an MMTk integration. However, as allocation is performance critical, runtimes generally would want to
7+
optimize allocation to make it as fast as possible, in which invoking `alloc()` and `post_alloc()` becomes inadequate.
88

99
The following discusses a few design decisions and optimizations related to allocation. The discussion mainly focuses on `alloc()`.
1010
`post_alloc()` works in a similar way, and the discussion can also be applied to `post_alloc()`.
11-
For conrete examples, you can refer to any of our supported bindings, and check the implementation in the bindings.
11+
For concrete examples, you can refer to any of our supported bindings, and check the implementation in the bindings.
1212

13-
Note that some of the optimizations need to make assumptions about the MMTk's internal implementation and may make the code less maintainable.
13+
> **Note:** Some of the optimizations need to make assumptions about MMTk's internal implementation and may make the code less maintainable.
1414
We recommend adding assertions in the binding code to make sure the assumptions are not broken across versions.
1515

1616
## Efficient access to MMTk mutators
1717

18-
An MMTk mutator context (created by [`bind_mutator()`](https://docs.mmtk.io/api/mmtk/memory_manager/fn.bind_mutator.html)) is a thread local data structure
18+
An MMTk mutator context (created by [`bind_mutator()`](https://docs.mmtk.io/api/mmtk/memory_manager/fn.bind_mutator.html)) is a thread-local data structure
1919
of type [`Mutator`](https://docs.mmtk.io/api/mmtk/plan/struct.Mutator.html).
20-
MMTk expects the binding to provide efficient access to the mutator structure in their thread local storage (TLS).
20+
MMTk expects the binding to provide efficient access to the mutator structure in their thread-local storage (TLS).
2121
Usually one of the following approaches is used to store MMTk mutators.
2222

2323
### Option 1: Storing the pointer
2424

2525
The `Box<Mutator<VM>>` returned from `mmtk::memory_manager::bind_mutator` is actually a pointer to
2626
a `Mutator<VM>` instance allocated in the Rust heap. It is simple to store it in the TLS.
27-
This approach does not make any assumption about the intenral of a MMTk `Mutator`. However, it requires an extra pointer dereference
28-
whene accessing a value in the mutator. This may sound not that bad. However, this degrades the performance of
29-
a carefully implemented inlined fastpath allocation sequence which is normally just a few instructions.
30-
This approach could be a simple start in the early development, but we do not recommend it for an efficient implementation.
27+
This approach does not make any assumption about the internals of an MMTk `Mutator`. However, it requires an extra pointer dereference
28+
when accessing a value in the mutator. This may sound not too bad, however, this degrades the performance of
29+
a carefully implemented inlined fast-path allocation sequence which is normally just a few (assembly) instructions.
30+
This approach could be a simple start in early development, but we do not recommend it for an efficient implementation.
3131

3232
If the VM is not implemented in Rust,
3333
the binding needs to turn the boxed pointer into a raw pointer before storing it.
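
A minimal sketch of that conversion follows. It is an illustration only: `Mutator` here is a local stand-in for `mmtk::Mutator<VM>`, and `VmThreadLocal`, `store_mutator` and `take_mutator` are hypothetical names for whatever per-thread storage a non-Rust VM exposes.

```rust
/// Local stand-in for `mmtk::Mutator<VM>`; the real type comes from mmtk-core.
struct Mutator {
    id: usize,
}

/// Hypothetical per-thread slot, laid out the way a C/C++ VM might declare it.
#[repr(C)]
struct VmThreadLocal {
    mutator: *mut Mutator,
}

/// Turn the box into a raw pointer so the VM can keep it in its TLS.
/// Ownership moves to the raw pointer; Rust will no longer free it automatically.
fn store_mutator(tls: &mut VmThreadLocal, boxed: Box<Mutator>) {
    tls.mutator = Box::into_raw(boxed);
}

/// Rebuild the box when the thread exits so the mutator can be dropped or
/// handed back to MMTk.
///
/// Safety: `tls.mutator` must hold a pointer produced by `store_mutator`
/// that has not been taken already.
unsafe fn take_mutator(tls: &mut VmThreadLocal) -> Box<Mutator> {
    let boxed = unsafe { Box::from_raw(tls.mutator) };
    tls.mutator = std::ptr::null_mut();
    boxed
}

fn main() {
    let mut tls = VmThreadLocal { mutator: std::ptr::null_mut() };
    store_mutator(&mut tls, Box::new(Mutator { id: 7 }));

    // Every access goes through the stored pointer: this is the extra
    // dereference that Option 1 pays on each allocation.
    let id = unsafe { (*tls.mutator).id };
    assert_eq!(id, 7);

    let boxed = unsafe { take_mutator(&mut tls) };
    assert_eq!(boxed.id, 7);
    assert!(tls.mutator.is_null());
}
```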
@@ -47,37 +47,55 @@ have an assertion to ensure that the native type has the exact same layout as th
4747
{{#include ../../../../../vmbindings/dummyvm/src/tests/doc_mutator_storage.rs:mutator_storage_embed_mutator_struct}}
4848
```
4949

50-
### Option 3: Embed the fastpath struct
50+
### Option 3: Embed the fast-path struct
5151

52-
The size of `Mutator` is a few hundreds of bytes, which could be considered as too large for TLS in some langauge implementations.
52+
The size of `Mutator` is a few hundreds of bytes, which could be considered too large to store in the TLS in some language implementations.
5353
Embedding `Mutator` also requires to duplicate a native type for the `Mutator` struct if the implementation language is not Rust.
54-
Sometimes it is undesirable to embed the `Mutator` type. One can choose only embed the fastpath struct that is in use.
54+
Sometimes it is undesirable to embed the `Mutator` type. One can choose to only embed the fast-path struct that is in use.
5555

56-
Unlike the `Mutator` type, the fastpath struct has a C-compatible layout, and it is simple and primitive enough
56+
Unlike the `Mutator` type, the fast-path struct has a C-compatible layout, and it is simple and primitive enough
5757
so it is unlikely to change. For example, MMTk provides [`BumpPointer`](https://docs.mmtk.io/api/mmtk/util/alloc/struct.BumpPointer.html),
5858
which simply includes a `cursor` and a `limit`.
5959

6060
In the following example, we embed one `BumpPointer` struct in the TLS.
61-
The `BumpPointer` is used in the fast path, and carefully synchronized with the allocator in the `Mutator` struct in the slow path.
61+
The `BumpPointer` is used in the fast-path, and carefully synchronized with the allocator in the `Mutator` struct in the slow-path. We also need to revoke (i.e. reset) all the cached `BumpPointer` values for *all* mutators if a GC occurs. Currently, we recommend implementing this in the [`resume_mutators`](https://docs.mmtk.io/api/mmtk/vm/trait.Collection.html#tymethod.resume_mutators) API call, however there is work in progress that would make it [an explicit API call instead](https://github.com/mmtk/mmtk-core/issues/1017).
62+
6263
Note that the `allocate_default` closure in the example below assumes the allocation semantics is `AllocationSemantics::Default`
6364
and its selected allocator uses bump-pointer allocation.
6465
Real-world fast-path implementations for high-performance VMs are usually JIT-compiled, inlined, and specialized for the current plan
65-
and allocation site, so that the allocation semantics of the concrete allocation site (and therefore the selected allocator) is known to the JIT compiler.
66+
and allocation site. Hence, the allocation semantics of the concrete allocation site (and therefore the selected allocator) is known to the JIT compiler.
6667

6768
For the sake of simplicity, we only store _one_ `BumpPointer` in the TLS in the example.
6869
In MMTk, each plan has multiple allocators, and the allocation semantics are mapped
69-
to those allocator by the GC plan you choose. So a plan use multiple allocators, and
70+
to those allocators by the GC plan you choose. So a plan uses multiple allocators, and
7071
depending on how many allocation semantics are used by a binding, the binding may use multiple allocators as well.
71-
In practice, a binding may embed multiple fastpath structs as the example for those allocators if they would like
72+
In practice, a binding may embed multiple fast-path structs for all the allocators they use if they would like
7273
more efficient allocation.
7374

74-
Also for simpliticy, the example assumes the default allocator for the plan in use is a bump pointer allocator.
75+
Also for simplicity, the example assumes the default allocator for the plan in use is a bump pointer allocator.
7576
Many plans in MMTk use bump pointer allocator for their default allocation semantics (`AllocationSemantics::Default`),
7677
which includes (but not limited to) `NoGC`, `SemiSpace`, `Immix`, generational plans, etc.
77-
If a plan does not do bump-pointer allocation, we may still implement fast paths, but we need to embed different data structures instead of `BumpPointer`.
78+
If a plan does not do bump-pointer allocation, we may still implement fast-paths, but we need to embed different data structures instead of `BumpPointer`.
79+
80+
```rust
81+
{{#include ../../../../../vmbindings/dummyvm/src/tests/doc_mutator_storage.rs:mutator_storage_embed_fastpath_struct}}
82+
```
7883

84+
And pseudo-code for how you would reset the `BumpPointer`s for all mutators in `resume_mutators`. Note that these mutators are the runtime's actual mutator threads (i.e. where the cached bump pointers are stored) and are different from MMTk's `Mutator` struct.
7985
```rust
80-
{{#include ../../../../../vmbindings/dummyvm/src/tests/doc_mutator_storage.rs:mutator_storage_embed_fastpath_struct}}
86+
impl Collection<RtName> for RtNameCollection {
87+
...
88+
fn resume_mutators(tls: VMWorkerThread) {
89+
// Reset the cached bump pointers of each mutator (setting both cursor and limit to 0) after a GC to
90+
// ensure that the VM sees a cohesive state
91+
for mutator in mutators {
92+
mutator.storage.default_bump_pointer = BumpPointer::default();
93+
}
94+
// Actually resume all the mutators now
95+
...
96+
}
97+
...
98+
}
8199
```
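
To see why the reset matters, here is a self-contained sketch that does not depend on mmtk-core. The `BumpPointer` below is a local stand-in with the same two fields (`cursor` and `limit`) but plain `usize` addresses, and `RuntimeThread`, `FakeSlowPath` and `reset_cached_bump_pointers` are hypothetical names. The point it illustrates is that a zeroed bump pointer makes the next fast-path check fail, so each thread is forced back through the slow path and picks up a fresh buffer after the GC.

```rust
/// Local stand-in for `mmtk::util::alloc::BumpPointer` (plain `usize` addresses).
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct BumpPointer {
    cursor: usize,
    limit: usize,
}

/// Hypothetical per-thread state owned by the runtime (not MMTk's `Mutator`).
#[derive(Default)]
struct RuntimeThread {
    default_bump_pointer: BumpPointer,
    slow_path_calls: usize,
}

/// Hypothetical slow path that hands out fixed-size buffers from a fake
/// address range. A real binding would call into MMTk here instead.
struct FakeSlowPath {
    next_buffer: usize,
}

const BUFFER_BYTES: usize = 4096;

impl RuntimeThread {
    /// Bump-pointer allocation. Assumes `size` is non-zero and already aligned.
    fn alloc(&mut self, size: usize, slow: &mut FakeSlowPath) -> usize {
        // Fast path: bump the cached cursor if the object fits.
        let bp = &mut self.default_bump_pointer;
        let new_cursor = bp.cursor + size;
        if new_cursor <= bp.limit {
            let result = bp.cursor;
            bp.cursor = new_cursor;
            return result;
        }
        // Slow path: fetch a new buffer and cache its bounds.
        self.slow_path_calls += 1;
        let start = slow.next_buffer;
        slow.next_buffer += BUFFER_BYTES;
        self.default_bump_pointer = BumpPointer {
            cursor: start + size,
            limit: start + BUFFER_BYTES,
        };
        start
    }
}

/// What a binding might do at the start of `resume_mutators`.
fn reset_cached_bump_pointers(threads: &mut [RuntimeThread]) {
    for t in threads.iter_mut() {
        t.default_bump_pointer = BumpPointer::default();
    }
}

fn main() {
    let mut slow = FakeSlowPath { next_buffer: 0x1000 };
    let mut threads = vec![RuntimeThread::default(), RuntimeThread::default()];

    let a = threads[0].alloc(8, &mut slow); // empty cache: slow path
    let b = threads[0].alloc(8, &mut slow); // served by the fast path
    assert_eq!((a, b, threads[0].slow_path_calls), (0x1000, 0x1008, 1));

    // A GC happened: the cached buffer may no longer be valid, so drop it.
    reset_cached_bump_pointers(&mut threads);
    let c = threads[0].alloc(8, &mut slow); // slow path again
    assert_eq!((c, threads[0].slow_path_calls), (0x2000, 2));
}
```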
82100

83101
## Avoid resolving the allocator at run time
@@ -88,17 +106,17 @@ The following is roughly what `alloc()` does internally.
88106
1. Resolving the allocator
89107
1. Find the `Allocator` for the required `AllocationSemantics`. It is defined by the plan in use.
90108
2. Dynamically dispatch the call to [`Allocator::alloc()`](https://docs.mmtk.io/api/mmtk/util/alloc/trait.Allocator.html#tymethod.alloc).
91-
2. `Allocator::alloc()` executes the allocation fast path.
92-
3. If the fastpath fails, it executes the allocation slow path [`Allocator::alloc_slow()`](https://docs.mmtk.io/api/mmtk/util/alloc/trait.Allocator.html#method.alloc_slow).
93-
4. The slow path will further attempt to allocate memory, and may trigger a GC.
109+
2. `Allocator::alloc()` executes the allocation fast-path.
110+
3. If the fast-path fails, it executes the allocation slow-path [`Allocator::alloc_slow()`](https://docs.mmtk.io/api/mmtk/util/alloc/trait.Allocator.html#method.alloc_slow).
111+
4. The slow-path will further attempt to allocate memory, and may trigger a GC.
94112

95113
Resolving to a specific allocator and doing dynamic dispatch is expensive for an allocation.
96-
With the build-time or JIT-time knowledge on the object that will be allocated, an MMTK binding can possibly skip the first step in the run time.
114+
With the build-time or JIT-time knowledge about the object that will be allocated, an MMTk binding can possibly skip the first step in the run time.
97115

98-
If you implement an efficient fastpath allocation in the binding side (like the Option 3 above, and generating allocation code in a JIT which will be discussed next),
99-
that naturally avoids this problem. If you do not want to implement the fastpath allocation, the following is another example of how to avoid resolving the allocator.
116+
If you implement an efficient fast-path allocation in the binding side (like the Option 3 above, and [generating allocation code in a JIT](#emitting-allocation-sequence-in-a-jit-compiler)),
117+
that naturally avoids this problem. If you do not want to implement the fast-path allocation, the following is another example of how to avoid resolving the allocator.
100118

101-
Once MMTK is initialized, a binding can get the memory offset for the default allocator, and save it somewhere. When we know an object should be allocated
119+
Once MMTk is initialized, a binding can get the memory offset for the default allocator, and save it somewhere. When we know an object should be allocated
102120
with the default allocation semantics, we can use the offset to get a reference to the actual allocator (with unsafe code), and allocate with the allocator.
103121

104122
```rust
@@ -107,14 +125,14 @@ with the default allocation semantics, we can use the offset to get a reference
107125

108126
## Emitting Allocation Sequence in a JIT Compiler
109127

110-
If the language has a JIT compiler, it is generally desirable to generate the code sequence for the allocation fast path, rather
111-
than simply emitting a call instruction to the allocation function. The optimizations we talked above are relevant as well: 1.
112-
the compiler needs to be able to access the mutator, and 2. the compiler needs to be able to resolve to a specific allocator at
128+
If the language has a JIT compiler, it is generally desirable to generate the code sequence for the allocation fast-path, rather
129+
than simply emitting a call instruction to the allocation function. The optimizations we talked above are relevant as well: (i)
130+
the compiler needs to be able to access the mutator, and (ii) the compiler needs to be able to resolve to a specific allocator at
113131
JIT time. The actual implementation highly depends on the compiler implementation.
114132

115133
The following are some examples from our bindings (at the time of writing):
116134
* OpenJDK:
117-
* <https://github.com/mmtk/mmtk-openjdk/blob/9ab13ae3ac9c68c5f694cdd527a63ca909e27b15/openjdk/mmtkBarrierSetAssembler_x86.cpp#L38>
118-
* <https://github.com/mmtk/mmtk-openjdk/blob/9ab13ae3ac9c68c5f694cdd527a63ca909e27b15/openjdk/mmtkBarrierSetC2.cpp#L45>
119-
* JikesRVM: <https://github.com/mmtk/mmtk-jikesrvm/blob/fbfb91adafd9e9b3f45bd6a4b32c845a5d48d20b/jikesrvm/rvm/src/org/jikesrvm/mm/mminterface/MMTkMutatorContext.java#L377>
120-
* Julia: <https://github.com/mmtk/julia/blob/5c406d9bb20d76e2298a6101f171cfac491f651c/src/llvm-final-gc-lowering.cpp#L267>
135+
* [Example 1 (C1 compiler)](https://github.com/mmtk/mmtk-openjdk/blob/9ab13ae3ac9c68c5f694cdd527a63ca909e27b15/openjdk/mmtkBarrierSetAssembler_x86.cpp#L38)
136+
* [Example 2 (C2 compiler)](https://github.com/mmtk/mmtk-openjdk/blob/9ab13ae3ac9c68c5f694cdd527a63ca909e27b15/openjdk/mmtkBarrierSetC2.cpp#L45)
137+
* [JikesRVM](https://github.com/mmtk/mmtk-jikesrvm/blob/fbfb91adafd9e9b3f45bd6a4b32c845a5d48d20b/jikesrvm/rvm/src/org/jikesrvm/mm/mminterface/MMTkMutatorContext.java#L377)
138+
* [Julia](https://github.com/mmtk/julia/blob/5c406d9bb20d76e2298a6101f171cfac491f651c/src/llvm-final-gc-lowering.cpp#L267)

vmbindings/dummyvm/src/tests/doc_mutator_storage.rs

Lines changed: 1 addition & 1 deletion
@@ -117,7 +117,7 @@ pub fn embed_fastpath_struct() {
117117

118118
// Allocate: this will fail in the fastpath, and will get an allocation buffer from the slowpath
119119
let addr1 = allocate_default(8);
120-
// Alloacte: this will allocate from the fastpath
120+
// Allocate: this will allocate from the fastpath
121121
let addr2 = allocate_default(8);
122122
// ANCHOR_END: mutator_storage_embed_fastpath_struct
123123
