Commit 0328b05

Expose alloc_slow. Add a section in user guide about allocation optimization (#967)
This PR exposes `alloc_slow()` to the bindings, adds a few public methods to allow bindings to implement allocation efficiently without duplicating mmtk-core code, and adds a section in the user guide to discuss allocation optimization. The changes in this PR include:

1. Expose `alloc_slow()` in `memory_manager`.
2. Add `Mutator::allocator()` to allow bindings to get a specific allocator from an allocator selector. Add `Mutator::allocator_impl()` to allow bindings to get a typed allocator from a selector.
3. Add `Mutator::get_allocator_base_offset()` to allow bindings to use a specific allocator without a selector (for performance).
4. Add a section in the user guide about allocation optimization. Remove some unused `SUMMARY.md` files in the user guide.
5. Add `Address::as_mut_ref()`.
6. Expose the field for the fastpath bump pointer in some allocators.

Related discussion on Zulip: https://mmtk.zulipchat.com/#narrow/stream/262679-General/topic/Refilling.20BumpPointer.20using.20AllocatorInfo/near/394142997
1 parent 9a676e6 commit 0328b05

File tree

15 files changed

+534
-46
lines changed


docs/userguide/src/SUMMARY.md

Lines changed: 3 additions & 0 deletions
@@ -31,6 +31,9 @@
 - [How to Undertake a Port](portingguide/howto/prefix.md)
 - [NoGC](portingguide/howto/nogc.md)
 - [Next Steps](portingguide/howto/next_steps.md)
+- [Performance Tuning](portingguide/perf_tuning/prefix.md)
+- [Link Time Optimization](portingguide/perf_tuning/lto.md)
+- [Optimizing Allocation](portingguide/perf_tuning/alloc.md)

 -----------

docs/userguide/src/portingguide/SUMMARY.md

Lines changed: 0 additions & 11 deletions
This file was deleted.
Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
# Optimizing Allocation

MMTk provides [`alloc()`](https://docs.mmtk.io/api/mmtk/memory_manager/fn.alloc.html)
and [`post_alloc()`](https://docs.mmtk.io/api/mmtk/memory_manager/fn.post_alloc.html) to allocate a piece of memory and
finalize the memory as an object. Calling them is sufficient for a functional implementation, and we recommend doing
so in the early development of an MMTk integration. However, as allocation is performance critical, runtimes generally
optimize to make allocation as fast as possible, in which case simply invoking `alloc()` and `post_alloc()` becomes inadequate.

The following discusses a few design decisions and optimizations related to allocation. The discussion mainly focuses on `alloc()`.
`post_alloc()` works in a similar way, and the discussion also applies to it.
For concrete examples, refer to the implementation in any of our supported bindings.

Note that some of the optimizations make assumptions about MMTk's internal implementation and may make the code less maintainable.
We recommend adding assertions in the binding code to make sure the assumptions are not broken across versions.

## Efficient access to MMTk mutators

An MMTk mutator context (created by [`bind_mutator()`](https://docs.mmtk.io/api/mmtk/memory_manager/fn.bind_mutator.html)) is a thread-local data structure
of type [`Mutator`](https://docs.mmtk.io/api/mmtk/plan/struct.Mutator.html).
MMTk expects the binding to provide efficient access to the mutator structure in its thread-local storage (TLS).
Usually one of the following approaches is used to store MMTk mutators.

### Option 1: Storing the pointer

The `Box<Mutator<VM>>` returned from `mmtk::memory_manager::bind_mutator` is actually a pointer to
a `Mutator<VM>` instance allocated in the Rust heap. It is simple to store it in the TLS.
This approach does not make any assumption about the internals of an MMTk `Mutator`. However, it requires an extra pointer dereference
when accessing a value in the mutator. This may not sound that bad, but it degrades the performance of
a carefully implemented inlined fastpath allocation sequence, which is normally just a few instructions.
This approach could be a simple start in early development, but we do not recommend it for an efficient implementation.

If the VM is not implemented in Rust,
the binding needs to turn the boxed pointer into a raw pointer before storing it.

```rust
{{#include ../../../../../vmbindings/dummyvm/src/tests/doc_mutator_storage.rs:mutator_storage_boxed_pointer}}
```
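
As a rough illustration of Option 1, the sketch below stores a boxed mutator in a VM-side thread structure as a raw pointer. `Mutator`, `VmThread`, `bind_mutator_stub()`, `attach()`, `record_allocation()`, and `detach()` are hypothetical stand-ins invented for this sketch (the real type is `mmtk::Mutator<VM>`, created by `bind_mutator()`); only `Box::into_raw` and `Box::from_raw` are real standard-library calls.

```rust
/// Stand-in for `mmtk::Mutator<VM>`; hypothetical, for illustration only.
pub struct Mutator {
    pub allocated_bytes: usize,
}

/// Hypothetical per-thread structure owned by a non-Rust VM.
#[repr(C)]
pub struct VmThread {
    /// Raw pointer to the heap-allocated mutator (Option 1).
    pub mutator: *mut Mutator,
}

/// Stand-in for `bind_mutator()`, which returns a boxed mutator.
pub fn bind_mutator_stub() -> Box<Mutator> {
    Box::new(Mutator { allocated_bytes: 0 })
}

/// Store the mutator in the thread structure as a raw pointer.
pub fn attach(thread: &mut VmThread) {
    thread.mutator = Box::into_raw(bind_mutator_stub());
}

/// Every access pays one extra dereference: thread -> pointer -> field.
pub fn record_allocation(thread: &mut VmThread, size: usize) -> usize {
    // SAFETY: `attach` stored a valid, uniquely owned pointer.
    let mutator = unsafe { &mut *thread.mutator };
    mutator.allocated_bytes += size;
    mutator.allocated_bytes
}

/// Rebuild the box when the thread exits so the mutator is dropped.
pub fn detach(thread: &mut VmThread) {
    // SAFETY: the pointer came from `Box::into_raw` in `attach`.
    drop(unsafe { Box::from_raw(thread.mutator) });
    thread.mutator = std::ptr::null_mut();
}
```

The pointer load inside `record_allocation` is the extra dereference that the text above warns about; the other two options remove it.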

### Option 2: Embed the `Mutator` struct

To remove the extra pointer dereference, the binding can embed the `Mutator` type directly into its TLS type.

If the implementation language is not Rust, the developer needs to create a type that has the same layout as `Mutator`. It is recommended to
have an assertion to ensure that the native type has exactly the same layout as the Rust type `Mutator`.

```rust
{{#include ../../../../../vmbindings/dummyvm/src/tests/doc_mutator_storage.rs:mutator_storage_embed_mutator_struct}}
```
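
One way to express such a layout assertion, sketched here entirely in Rust, is a set of compile-time checks on size, alignment, and field offsets. `RustSide` and `NativeMirror` are hypothetical stand-in structs, not MMTk types, and `layouts_match()` is a helper invented for this sketch. A non-Rust binding would write the equivalent checks in its own language.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Stand-in for the Rust-side type (hypothetical; not the real `Mutator`).
#[repr(C)]
pub struct RustSide {
    pub cursor: usize,
    pub limit: usize,
    pub tag: u32,
}

/// Mirror of the same type as a binding might declare it (hypothetical).
#[repr(C)]
pub struct NativeMirror {
    pub cursor: usize,
    pub limit: usize,
    pub tag: u32,
}

// Compile-time checks: the build fails if the two layouts drift apart.
const _: () = assert!(size_of::<RustSide>() == size_of::<NativeMirror>());
const _: () = assert!(align_of::<RustSide>() == align_of::<NativeMirror>());
const _: () = assert!(offset_of!(RustSide, limit) == offset_of!(NativeMirror, limit));
const _: () = assert!(offset_of!(RustSide, tag) == offset_of!(NativeMirror, tag));

/// The same comparison as a run-time check, e.g. for a start-up assertion.
pub fn layouts_match() -> bool {
    size_of::<RustSide>() == size_of::<NativeMirror>()
        && align_of::<RustSide>() == align_of::<NativeMirror>()
        && offset_of!(RustSide, cursor) == offset_of!(NativeMirror, cursor)
        && offset_of!(RustSide, limit) == offset_of!(NativeMirror, limit)
        && offset_of!(RustSide, tag) == offset_of!(NativeMirror, tag)
}
```

Checking individual field offsets, and not only the total size, catches reordered fields that happen to leave the size unchanged.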

### Option 3: Embed the fastpath struct

The size of `Mutator` is a few hundred bytes, which could be considered too large for TLS in some language implementations.
Embedding `Mutator` also requires duplicating the `Mutator` struct as a native type if the implementation language is not Rust.
When embedding the `Mutator` type is undesirable, one can choose to embed only the fastpath struct that is in use.

Unlike the `Mutator` type, the fastpath struct has a C-compatible layout, and it is simple and primitive enough
that it is unlikely to change. For example, MMTk provides [`BumpPointer`](https://docs.mmtk.io/api/mmtk/util/alloc/struct.BumpPointer.html),
which simply includes a `cursor` and a `limit`.

In the following example, we embed one `BumpPointer` struct in the TLS.
The `BumpPointer` is used in the fast path, and carefully synchronized with the allocator in the `Mutator` struct in the slow path.
Note that the `allocate_default` closure in the example below assumes the allocation semantics is `AllocationSemantics::Default`
and its selected allocator uses bump-pointer allocation.
Real-world fast-path implementations for high-performance VMs are usually JIT-compiled, inlined, and specialized for the current plan
and allocation site, so that the allocation semantics of the concrete allocation site (and therefore the selected allocator) is known to the JIT compiler.

For the sake of simplicity, we only store _one_ `BumpPointer` in the TLS in the example.
In MMTk, each plan has multiple allocators, and the allocation semantics are mapped
to those allocators by the GC plan you choose. So a plan uses multiple allocators, and
depending on how many allocation semantics are used by a binding, the binding may use multiple allocators as well.
In practice, a binding may embed multiple fastpath structs, as in the example, for those allocators if it needs
more efficient allocation.

Also for simplicity, the example assumes the default allocator for the plan in use is a bump pointer allocator.
Many plans in MMTk use a bump pointer allocator for their default allocation semantics (`AllocationSemantics::Default`),
including (but not limited to) `NoGC`, `SemiSpace`, `Immix`, and the generational plans.
If a plan does not do bump-pointer allocation, we may still implement fast paths, but we need to embed different data structures instead of `BumpPointer`.

```rust
{{#include ../../../../../vmbindings/dummyvm/src/tests/doc_mutator_storage.rs:mutator_storage_embed_fastpath_struct}}
```
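
To make the fast-path and slow-path hand-off more concrete, here is a self-contained sketch that does not depend on MMTk. The `BumpPointer` below is a stand-in that uses plain `usize` fields where MMTk uses `Address`, and `SlowPathAllocator`, `CHUNK`, `align_up()`, and `alloc_fast()` are hypothetical names invented for this sketch. The slow path here simply hands out a fresh chunk; a real slow path may acquire memory from a space or trigger a GC.

```rust
/// Stand-in for the fastpath struct: a cursor and a limit.
#[repr(C)]
#[derive(Clone, Copy, Default)]
pub struct BumpPointer {
    pub cursor: usize,
    pub limit: usize,
}

/// Hypothetical chunk size handed out by the slow path.
pub const CHUNK: usize = 4096;

/// Hypothetical stand-in for the bump allocator that lives in `Mutator`.
pub struct SlowPathAllocator {
    pub bump: BumpPointer,
    next_chunk: usize,
}

/// Round `addr` up to `align`, which must be a power of two.
fn align_up(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}

impl SlowPathAllocator {
    pub fn new(heap_start: usize) -> Self {
        SlowPathAllocator { bump: BumpPointer::default(), next_chunk: heap_start }
    }

    /// Refill from a fresh chunk and allocate `size` bytes from it.
    pub fn alloc_slow(&mut self, size: usize, align: usize) -> usize {
        assert!(size <= CHUNK, "sketch only handles small objects");
        let start = align_up(self.next_chunk, align);
        self.next_chunk = start + CHUNK;
        self.bump = BumpPointer { cursor: start + size, limit: start + CHUNK };
        start
    }
}

/// Allocation entry point: fast path on the TLS copy, slow path otherwise.
pub fn alloc_fast(
    tls: &mut BumpPointer,
    slow: &mut SlowPathAllocator,
    size: usize,
    align: usize,
) -> usize {
    let start = align_up(tls.cursor, align);
    let end = start + size;
    if end <= tls.limit {
        // Fast path: bump the cursor, no call into the allocator.
        tls.cursor = end;
        return start;
    }
    // Slow path: publish the TLS state, refill, then copy the new state back.
    slow.bump = *tls;
    let result = slow.alloc_slow(size, align);
    *tls = slow.bump;
    result
}
```

The two copies around `alloc_slow` are the synchronization step mentioned above: the TLS copy is authoritative on the fast path, and the allocator's copy is authoritative while the slow path runs.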

## Avoid resolving the allocator at run time

To keep the `alloc()` API simple and general, MMTk requires `AllocationSemantics` as an argument in an allocation request, and resolves it at run time.
The following is roughly what `alloc()` does internally.

1. Resolving the allocator
   1. Find the `Allocator` for the required `AllocationSemantics`. It is defined by the plan in use.
   2. Dynamically dispatch the call to [`Allocator::alloc()`](https://docs.mmtk.io/api/mmtk/util/alloc/trait.Allocator.html#tymethod.alloc).
2. `Allocator::alloc()` executes the allocation fast path.
3. If the fastpath fails, it executes the allocation slow path [`Allocator::alloc_slow()`](https://docs.mmtk.io/api/mmtk/util/alloc/trait.Allocator.html#method.alloc_slow).
4. The slow path will further attempt to allocate memory, and may trigger a GC.

Resolving to a specific allocator and doing dynamic dispatch is expensive for an allocation.
With build-time or JIT-time knowledge of the object that will be allocated, an MMTk binding can often skip the first step at run time.
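
The steps listed above can be sketched with simplified stand-in types. Everything in the block below (`Semantics`, the two-method `Allocator` trait, `Region`, and this `Mutator`) is hypothetical and much smaller than the real MMTk types; it only illustrates where the table lookup and the dynamic dispatch happen on every call.

```rust
/// Simplified stand-in for `AllocationSemantics` (hypothetical subset).
#[derive(Clone, Copy)]
pub enum Semantics {
    Default = 0,
    Los = 1,
}

/// Simplified stand-in for the `Allocator` trait.
pub trait Allocator {
    fn alloc(&mut self, size: usize) -> usize;
    fn alloc_slow(&mut self, size: usize) -> usize;
}

/// Hypothetical allocator that bumps a cursor within a region.
pub struct Region {
    cursor: usize,
    limit: usize,
}

impl Allocator for Region {
    fn alloc(&mut self, size: usize) -> usize {
        // Step 2: fast path.
        let end = self.cursor + size;
        if end > self.limit {
            // Step 3: the fast path failed.
            return self.alloc_slow(size);
        }
        let start = self.cursor;
        self.cursor = end;
        start
    }

    fn alloc_slow(&mut self, size: usize) -> usize {
        // Step 4: a real slow path may acquire memory or trigger a GC.
        // This sketch just extends the region.
        self.limit = self.cursor + size.max(1024);
        let start = self.cursor;
        self.cursor += size;
        start
    }
}

/// Stand-in mutator holding a semantics-to-allocator mapping.
pub struct Mutator {
    mapping: [usize; 2],
    allocators: Vec<Box<dyn Allocator>>,
}

impl Mutator {
    pub fn new() -> Self {
        let default_space: Box<dyn Allocator> =
            Box::new(Region { cursor: 0x1000, limit: 0x1000 });
        let large_space: Box<dyn Allocator> =
            Box::new(Region { cursor: 0x10_0000, limit: 0x10_0000 });
        Mutator { mapping: [0, 1], allocators: vec![default_space, large_space] }
    }

    pub fn alloc(&mut self, size: usize, semantics: Semantics) -> usize {
        // Step 1.1: resolve the allocator for the requested semantics.
        let index = self.mapping[semantics as usize];
        // Step 1.2: dynamic dispatch through the trait object.
        self.allocators[index].alloc(size)
    }
}
```

When the semantics of an allocation site are known ahead of time, both lines in `Mutator::alloc` can be replaced by direct access to one concrete allocator.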

If you implement an efficient fastpath allocation on the binding side (as in Option 3 above, or by generating allocation code in a JIT, which is discussed next),
you naturally avoid this problem. If you do not want to implement the fastpath allocation, the following is another example of how to avoid resolving the allocator.

Once MMTk is initialized, a binding can get the memory offset for the default allocator and save it somewhere. When we know an object should be allocated
with the default allocation semantics, we can use the offset to get a reference to the actual allocator (with unsafe code), and allocate with that allocator.

```rust
{{#include ../../../../../vmbindings/dummyvm/src/tests/doc_avoid_resolving_allocator.rs:avoid_resolving_allocator}}
```
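
The offset arithmetic behind this technique can be sketched without MMTk. The `Mutator`, `Allocators`, and `BumpAllocator` types below are hypothetical stand-ins with made-up fields, `bump_allocator_offset()` and `allocator_at()` are names invented for this sketch, and the standard-library `std::mem::offset_of!` macro is used in place of the `memoffset` crate.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical stand-in for a bump allocator.
#[repr(C)]
pub struct BumpAllocator {
    pub cursor: usize,
    pub limit: usize,
}

/// Hypothetical stand-in for the per-mutator allocator table.
#[repr(C)]
pub struct Allocators {
    pub bump_pointer: [BumpAllocator; 2],
    pub other: [u64; 4],
}

/// Hypothetical stand-in for the mutator.
#[repr(C)]
pub struct Mutator {
    pub header: u64,
    pub allocators: Allocators,
}

/// Offset from the start of `Mutator` to `allocators.bump_pointer[index]`.
/// Computed once at start-up and saved by the binding.
pub fn bump_allocator_offset(index: usize) -> usize {
    offset_of!(Mutator, allocators)
        + offset_of!(Allocators, bump_pointer)
        + size_of::<BumpAllocator>() * index
}

/// Fetch the allocator through a saved offset, skipping any lookup.
///
/// # Safety
/// `offset` must come from `bump_allocator_offset` with an in-range index.
pub unsafe fn allocator_at(mutator: &mut Mutator, offset: usize) -> &mut BumpAllocator {
    let base = mutator as *mut Mutator as *mut u8;
    // SAFETY: the caller guarantees the offset addresses a `BumpAllocator`
    // inside `*mutator`.
    unsafe { &mut *(base.add(offset) as *mut BumpAllocator) }
}
```

Because the offset depends on the layout of the stand-in structs, a binding that relies on this pattern would reasonably assert at start-up that the reference it obtains matches the one returned by the safe accessor.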

## Emitting Allocation Sequence in a JIT Compiler

If the language has a JIT compiler, it is generally desirable to generate the code sequence for the allocation fast path, rather
than simply emitting a call instruction to the allocation function. The optimizations discussed above are relevant here as well:
(1) the compiler needs to be able to access the mutator, and (2) the compiler needs to be able to resolve to a specific allocator at
JIT time. The actual implementation depends heavily on the compiler implementation.

The following are some examples from our bindings (at the time of writing):
* OpenJDK:
  * <https://github.com/mmtk/mmtk-openjdk/blob/9ab13ae3ac9c68c5f694cdd527a63ca909e27b15/openjdk/mmtkBarrierSetAssembler_x86.cpp#L38>
  * <https://github.com/mmtk/mmtk-openjdk/blob/9ab13ae3ac9c68c5f694cdd527a63ca909e27b15/openjdk/mmtkBarrierSetC2.cpp#L45>
* JikesRVM: <https://github.com/mmtk/mmtk-jikesrvm/blob/fbfb91adafd9e9b3f45bd6a4b32c845a5d48d20b/jikesrvm/rvm/src/org/jikesrvm/mm/mminterface/MMTkMutatorContext.java#L377>
* Julia: <https://github.com/mmtk/julia/blob/5c406d9bb20d76e2298a6101f171cfac491f651c/src/llvm-final-gc-lowering.cpp#L267>
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
# Performance Tuning for Bindings

In this section, we discuss how to achieve the best performance with MMTk in a binding implementation.
MMTk is a high-performance GC library, but there are some key points that a binding needs to get right
to achieve optimal performance.

docs/userguide/src/tutorial/SUMMARY.md

Lines changed: 0 additions & 20 deletions
This file was deleted.

src/memory_manager.rs

Lines changed: 21 additions & 0 deletions
@@ -175,6 +175,27 @@ pub fn alloc<VM: VMBinding>(
     mutator.alloc(size, align, offset, semantics)
 }

+/// Invoke the allocation slow path. This is only intended for use when a binding implements the fastpath on
+/// the binding side. When the binding handles fast path allocation and the fast path fails, it can use this
+/// method for slow path allocation. Calling before exhausting fast path allocation buffer will lead to bad
+/// performance.
+///
+/// Arguments:
+/// * `mutator`: The mutator to perform this allocation request.
+/// * `size`: The number of bytes required for the object.
+/// * `align`: Required alignment for the object.
+/// * `offset`: Offset associated with the alignment.
+/// * `semantics`: The allocation semantic required for the allocation.
+pub fn alloc_slow<VM: VMBinding>(
+    mutator: &mut Mutator<VM>,
+    size: usize,
+    align: usize,
+    offset: usize,
+    semantics: AllocationSemantics,
+) -> Address {
+    mutator.alloc_slow(size, align, offset, semantics)
+}
+
 /// Perform post-allocation actions, usually initializing object metadata. For many allocators none are
 /// required. For performance reasons, a VM should implement the post alloc fast-path on their side
 /// rather than just calling this function.

src/plan/mutator_context.rs

Lines changed: 96 additions & 0 deletions
@@ -5,6 +5,7 @@ use crate::plan::global::Plan;
 use crate::plan::AllocationSemantics;
 use crate::policy::space::Space;
 use crate::util::alloc::allocators::{AllocatorSelector, Allocators};
+use crate::util::alloc::Allocator;
 use crate::util::{Address, ObjectReference};
 use crate::util::{VMMutatorThread, VMWorkerThread};
 use crate::vm::VMBinding;
@@ -118,6 +119,20 @@ impl<VM: VMBinding> MutatorContext<VM> for Mutator<VM> {
         .alloc(size, align, offset)
     }

+    fn alloc_slow(
+        &mut self,
+        size: usize,
+        align: usize,
+        offset: usize,
+        allocator: AllocationSemantics,
+    ) -> Address {
+        unsafe {
+            self.allocators
+                .get_allocator_mut(self.config.allocator_mapping[allocator])
+        }
+        .alloc_slow(size, align, offset)
+    }
+
     // Note that this method is slow, and we expect VM bindings that care about performance to implement allocation fastpath sequence in their bindings.
     fn post_alloc(
         &mut self,
@@ -169,6 +184,80 @@ impl<VM: VMBinding> Mutator<VM> {
             unsafe { self.allocators.get_allocator_mut(selector) }.on_mutator_destroy();
         }
     }
+
+    /// Get the allocator for the selector.
+    ///
+    /// # Safety
+    /// The selector needs to be valid, and point to an allocator that has been initialized.
+    /// [`crate::memory_manager::get_allocator_mapping`] can be used to get a selector.
+    pub unsafe fn allocator(&self, selector: AllocatorSelector) -> &dyn Allocator<VM> {
+        self.allocators.get_allocator(selector)
+    }
+
+    /// Get the mutable allocator for the selector.
+    ///
+    /// # Safety
+    /// The selector needs to be valid, and point to an allocator that has been initialized.
+    /// [`crate::memory_manager::get_allocator_mapping`] can be used to get a selector.
+    pub unsafe fn allocator_mut(&mut self, selector: AllocatorSelector) -> &mut dyn Allocator<VM> {
+        self.allocators.get_allocator_mut(selector)
+    }
+
+    /// Get the allocator of a concrete type for the selector.
+    ///
+    /// # Safety
+    /// The selector needs to be valid, and point to an allocator that has been initialized.
+    /// [`crate::memory_manager::get_allocator_mapping`] can be used to get a selector.
+    pub unsafe fn allocator_impl<T: Allocator<VM>>(&self, selector: AllocatorSelector) -> &T {
+        self.allocators.get_typed_allocator(selector)
+    }
+
+    /// Get the mutable allocator of a concrete type for the selector.
+    ///
+    /// # Safety
+    /// The selector needs to be valid, and point to an allocator that has been initialized.
+    /// [`crate::memory_manager::get_allocator_mapping`] can be used to get a selector.
+    pub unsafe fn allocator_impl_mut<T: Allocator<VM>>(
+        &mut self,
+        selector: AllocatorSelector,
+    ) -> &mut T {
+        self.allocators.get_typed_allocator_mut(selector)
+    }
+
+    /// Return the base offset from a mutator pointer to the allocator specified by the selector.
+    pub fn get_allocator_base_offset(selector: AllocatorSelector) -> usize {
+        use crate::util::alloc::*;
+        use memoffset::offset_of;
+        use std::mem::size_of;
+        offset_of!(Mutator<VM>, allocators)
+            + match selector {
+                AllocatorSelector::BumpPointer(index) => {
+                    offset_of!(Allocators<VM>, bump_pointer)
+                        + size_of::<BumpAllocator<VM>>() * index as usize
+                }
+                AllocatorSelector::FreeList(index) => {
+                    offset_of!(Allocators<VM>, free_list)
+                        + size_of::<FreeListAllocator<VM>>() * index as usize
+                }
+                AllocatorSelector::Immix(index) => {
+                    offset_of!(Allocators<VM>, immix)
+                        + size_of::<ImmixAllocator<VM>>() * index as usize
+                }
+                AllocatorSelector::LargeObject(index) => {
+                    offset_of!(Allocators<VM>, large_object)
+                        + size_of::<LargeObjectAllocator<VM>>() * index as usize
+                }
+                AllocatorSelector::Malloc(index) => {
+                    offset_of!(Allocators<VM>, malloc)
+                        + size_of::<MallocAllocator<VM>>() * index as usize
+                }
+                AllocatorSelector::MarkCompact(index) => {
+                    offset_of!(Allocators<VM>, markcompact)
+                        + size_of::<MarkCompactAllocator<VM>>() * index as usize
+                }
+                AllocatorSelector::None => panic!("Expect a valid AllocatorSelector, found None"),
+            }
+    }
 }

 /// Each GC plan should provide their implementation of a MutatorContext. *Note that this trait is no longer needed as we removed
@@ -186,6 +275,13 @@ pub trait MutatorContext<VM: VMBinding>: Send + 'static {
         offset: usize,
         allocator: AllocationSemantics,
     ) -> Address;
+    fn alloc_slow(
+        &mut self,
+        size: usize,
+        align: usize,
+        offset: usize,
+        allocator: AllocationSemantics,
+    ) -> Address;
     fn post_alloc(&mut self, refer: ObjectReference, bytes: usize, allocator: AllocationSemantics);
     fn flush_remembered_sets(&mut self) {
         self.barrier().flush();

src/util/address.rs

Lines changed: 8 additions & 0 deletions
@@ -300,6 +300,14 @@ impl Address {
         &*self.to_mut_ptr()
     }

+    /// converts the Address to a mutable Rust reference
+    ///
+    /// # Safety
+    /// The caller must guarantee the address actually points to a Rust object.
+    pub unsafe fn as_mut_ref<'a, T>(self) -> &'a mut T {
+        &mut *self.to_mut_ptr()
+    }
+
     /// converts the Address to a pointer-sized integer
     pub const fn as_usize(self) -> usize {
         self.0
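
To show how a method like `as_mut_ref()` might be used, the sketch below defines a minimal stand-in `Address` that wraps a `usize`, as the `as_usize()` context lines above suggest. The stand-in, its `from_mut_ptr()` and `to_mut_ptr()` helpers, and `bump_counter()` are assumptions of this sketch and not the mmtk-core definitions.

```rust
/// Minimal stand-in for `Address`: a wrapper around a pointer-sized integer.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Address(usize);

impl Address {
    /// Build an address from a raw pointer (helper for this sketch).
    pub fn from_mut_ptr<T>(ptr: *mut T) -> Address {
        Address(ptr as usize)
    }

    /// Convert the address back to a raw pointer.
    pub fn to_mut_ptr<T>(self) -> *mut T {
        self.0 as *mut T
    }

    /// Convert the address to a mutable Rust reference.
    ///
    /// # Safety
    /// The address must point to a valid, properly aligned `T`, and no other
    /// reference to that `T` may be in use for the lifetime `'a`.
    pub unsafe fn as_mut_ref<'a, T>(self) -> &'a mut T {
        unsafe { &mut *self.to_mut_ptr() }
    }
}

/// Increment a `u64` stored at `addr` and return the new value.
pub fn bump_counter(addr: Address) -> u64 {
    // SAFETY: callers in this sketch pass the address of a live `u64`.
    let counter: &mut u64 = unsafe { addr.as_mut_ref() };
    *counter += 1;
    *counter
}
```

The returned reference has a caller-chosen lifetime, so the caller is responsible for making sure it does not outlive the object or alias another live reference.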
