|
| 1 | +# Optimizing Allocation |
| 2 | + |
| 3 | +MMTk provides [`alloc()`](https://docs.mmtk.io/api/mmtk/memory_manager/fn.alloc.html) |
| 4 | +and [`post_alloc()`](https://docs.mmtk.io/api/mmtk/memory_manager/fn.post_alloc.html) to allocate a piece of memory and |
| 5 | +finalize that memory as an object. Calling them is sufficient for a functional implementation, and we recommend doing |
| 6 | +so in the early development of an MMTk integration. However, because allocation is performance-critical, runtimes generally |
| 7 | +optimize it to be as fast as possible, and for that purpose simply invoking `alloc()` and `post_alloc()` is inadequate. |
| 8 | + |
| 9 | +The following discusses a few design decisions and optimizations related to allocation. The discussion mainly focuses on `alloc()`. |
| 10 | +`post_alloc()` works in a similar way, and the discussion can also be applied to `post_alloc()`. |
| 11 | +For concrete examples, refer to the implementations in any of our supported bindings. |
| 12 | + |
| 13 | +Note that some of the optimizations rely on assumptions about MMTk's internal implementation and may make the code less maintainable. |
| 14 | +We recommend adding assertions in the binding code to make sure the assumptions are not broken across versions. |
| 15 | + |
| 16 | +## Efficient access to MMTk mutators |
| 17 | + |
| 18 | +An MMTk mutator context (created by [`bind_mutator()`](https://docs.mmtk.io/api/mmtk/memory_manager/fn.bind_mutator.html)) is a thread-local data structure |
| 19 | +of type [`Mutator`](https://docs.mmtk.io/api/mmtk/plan/struct.Mutator.html). |
| 20 | +MMTk expects the binding to provide efficient access to the mutator structure in its thread-local storage (TLS). |
| 21 | +Usually one of the following approaches is used to store MMTk mutators. |
| 22 | + |
| 23 | +### Option 1: Storing the pointer |
| 24 | + |
| 25 | +The `Box<Mutator<VM>>` returned from `mmtk::memory_manager::bind_mutator` is actually a pointer to |
| 26 | +a `Mutator<VM>` instance allocated in the Rust heap. It is simple to store it in the TLS. |
| 27 | +This approach does not make any assumptions about the internals of an MMTk `Mutator`. However, it requires an extra pointer dereference |
| 28 | +when accessing a value in the mutator. This may not sound costly, but it degrades the performance of |
| 29 | +a carefully implemented, inlined fast-path allocation sequence, which is normally just a few instructions. |
| 30 | +This approach can be a simple starting point in early development, but we do not recommend it for an efficient implementation. |
| 31 | + |
| 32 | +If the VM is not implemented in Rust, |
| 33 | +the binding needs to turn the boxed pointer into a raw pointer before storing it. |
| 34 | + |
| 35 | +```rust |
| 36 | +{{#include ../../../../../vmbindings/dummyvm/src/tests/doc_mutator_storage.rs:mutator_storage_boxed_pointer}} |
| 37 | +``` |
| 38 | + |
| 39 | +### Option 2: Embed the `Mutator` struct |
| 40 | + |
| 41 | +To remove the extra pointer dereference, the binding can embed the `Mutator` type directly in its TLS type. |
| 42 | + |
| 43 | +If the implementation language is not Rust, the developer needs to create a type that has the same layout as `Mutator`. It is recommended to |
| 44 | +have an assertion to ensure that the native type has the exact same layout as the Rust type `Mutator`. |
| 45 | + |
| 46 | +```rust |
| 47 | +{{#include ../../../../../vmbindings/dummyvm/src/tests/doc_mutator_storage.rs:mutator_storage_embed_mutator_struct}} |
| 48 | +``` |
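The layout assertion recommended above can take several forms. The following is a minimal sketch, not taken from any MMTk binding: `LibFastPath` and `BindingFastPath` are hypothetical stand-ins for a library-side type and the binding's hand-written mirror of it. It checks size and alignment at compile time and field offsets at start-up.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical stand-in for a `#[repr(C)]` type exported by the GC library.
#[repr(C)]
pub struct LibFastPath {
    pub cursor: usize,
    pub limit: usize,
}

/// Hypothetical mirror of that type, maintained by hand in the binding.
#[repr(C)]
pub struct BindingFastPath {
    pub cursor: usize,
    pub limit: usize,
}

// Compile-time checks: the build fails if size or alignment ever diverge.
const _: () = assert!(size_of::<LibFastPath>() == size_of::<BindingFastPath>());
const _: () = assert!(align_of::<LibFastPath>() == align_of::<BindingFastPath>());

/// Byte offset of `field` inside `base`, computed from the addresses of a live value.
fn offset_in<T, F>(base: &T, field: &F) -> usize {
    field as *const F as usize - base as *const T as usize
}

/// Start-up check: every mirrored field sits at the same offset in both types.
pub fn layouts_match() -> bool {
    let lib = LibFastPath { cursor: 0, limit: 0 };
    let mirror = BindingFastPath { cursor: 0, limit: 0 };
    offset_in(&lib, &lib.cursor) == offset_in(&mirror, &mirror.cursor)
        && offset_in(&lib, &lib.limit) == offset_in(&mirror, &mirror.limit)
}
```

A binding written in another language would typically export the native type's size and field offsets as constants and compare them against the Rust side in the same way.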
| 49 | + |
| 50 | +### Option 3: Embed the fastpath struct |
| 51 | + |
| 52 | +The size of `Mutator` is a few hundred bytes, which may be considered too large for TLS in some language implementations. |
| 53 | +Embedding `Mutator` also requires duplicating the `Mutator` struct as a native type if the implementation language is not Rust. |
| 54 | +When embedding the `Mutator` type is undesirable, one can choose to embed only the fastpath struct that is in use. |
| 55 | + |
| 56 | +Unlike the `Mutator` type, the fastpath struct has a C-compatible layout, and it is simple and primitive enough |
| 57 | +that it is unlikely to change. For example, MMTk provides [`BumpPointer`](https://docs.mmtk.io/api/mmtk/util/alloc/struct.BumpPointer.html), |
| 58 | +which simply consists of a `cursor` and a `limit`. |
| 59 | + |
| 60 | +In the following example, we embed one `BumpPointer` struct in the TLS. |
| 61 | +The `BumpPointer` is used in the fast path, and carefully synchronized with the allocator in the `Mutator` struct in the slow path. |
| 62 | +Note that the `allocate_default` closure in the example below assumes the allocation semantics is `AllocationSemantics::Default` |
| 63 | +and its selected allocator uses bump-pointer allocation. |
| 64 | +Real-world fast-path implementations for high-performance VMs are usually JIT-compiled, inlined, and specialized for the current plan |
| 65 | +and allocation site, so that the allocation semantics of the concrete allocation site (and therefore the selected allocator) is known to the JIT compiler. |
| 66 | + |
| 67 | +For the sake of simplicity, we only store _one_ `BumpPointer` in the TLS in the example. |
| 68 | +In MMTk, each plan has multiple allocators, and the allocation semantics are mapped |
| 69 | +to those allocators by the GC plan you choose. So a plan uses multiple allocators, and |
| 70 | +depending on how many allocation semantics a binding uses, the binding may use multiple allocators as well. |
| 71 | +In practice, a binding may embed multiple fastpath structs, one per allocator, in the same way as the example if it wants |
| 72 | +more efficient allocation. |
| 73 | + |
| 74 | +Also for simplicity, the example assumes the default allocator for the plan in use is a bump-pointer allocator. |
| 75 | +Many plans in MMTk use a bump-pointer allocator for their default allocation semantics (`AllocationSemantics::Default`), |
| 76 | +including (but not limited to) `NoGC`, `SemiSpace`, `Immix`, and the generational plans. |
| 77 | +If a plan does not do bump-pointer allocation, we may still implement fast paths, but we need to embed different data structures instead of `BumpPointer`. |
| 78 | + |
| 79 | +```rust |
| 80 | +{{#include ../../../../../vmbindings/dummyvm/src/tests/doc_mutator_storage.rs:mutator_storage_embed_fastpath_struct}} |
| 81 | +``` |
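To make the fast-path/slow-path split concrete without depending on MMTk types, here is a self-contained sketch. `ToyBumpPointer`, `try_alloc`, and `alloc_with_fallback` are hypothetical names introduced only for illustration; the struct is modelled on the `cursor`/`limit` pair described above, and the slow path is passed in as a closure that is expected to refill the buffer.

```rust
/// Hypothetical fast-path state, modelled on a `cursor`/`limit` pair.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ToyBumpPointer {
    pub cursor: usize,
    pub limit: usize,
}

/// Rounds `addr` up to a multiple of `align`, which must be a power of two.
fn align_up(addr: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (addr + align - 1) & !(align - 1)
}

impl ToyBumpPointer {
    /// Fast path: a few arithmetic operations and one comparison.
    /// Returns the start address and advances `cursor`, or `None`
    /// (leaving the state untouched) if the request does not fit.
    #[inline(always)]
    pub fn try_alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        let start = align_up(self.cursor, align);
        let end = start.checked_add(size)?;
        if end > self.limit {
            return None;
        }
        self.cursor = end;
        Some(start)
    }
}

/// Tries the fast path first and calls `slow_path` only on failure.
/// The slow path is expected to update `bp` with a fresh buffer,
/// which is the synchronization step the text above refers to.
pub fn alloc_with_fallback(
    bp: &mut ToyBumpPointer,
    size: usize,
    align: usize,
    slow_path: impl FnOnce(&mut ToyBumpPointer, usize, usize) -> usize,
) -> usize {
    match bp.try_alloc(size, align) {
        Some(addr) => addr,
        None => slow_path(bp, size, align),
    }
}
```

In a real binding the slow path would call into MMTk and then copy the allocator's updated cursor and limit back into the TLS copy; the closure here only stands in for that step.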
| 82 | + |
| 83 | +## Avoid resolving the allocator at run time |
| 84 | + |
| 85 | +For a simple and general API of `alloc()`, MMTk requires `AllocationSemantics` as an argument in an allocation request, and resolves it at run-time. |
| 86 | +The following is roughly what `alloc()` does internally. |
| 87 | + |
| 88 | +1. Resolving the allocator |
| 89 | + 1. Find the `Allocator` for the required `AllocationSemantics`. It is defined by the plan in use. |
| 90 | + 2. Dynamically dispatch the call to [`Allocator::alloc()`](https://docs.mmtk.io/api/mmtk/util/alloc/trait.Allocator.html#tymethod.alloc). |
| 91 | +2. `Allocator::alloc()` executes the allocation fast path. |
| 92 | +3. If the fastpath fails, it executes the allocation slow path [`Allocator::alloc_slow()`](https://docs.mmtk.io/api/mmtk/util/alloc/trait.Allocator.html#method.alloc_slow). |
| 93 | +4. The slow path will further attempt to allocate memory, and may trigger a GC. |
| 94 | + |
| 95 | +Resolving to a specific allocator and doing dynamic dispatch is expensive for an allocation. |
| 96 | +With build-time or JIT-time knowledge of the object that will be allocated, an MMTk binding can often skip the first step at run time. |
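The difference between the two call shapes can be illustrated with a toy model. This is only a sketch in safe Rust: `ToySemantics`, `ToyAllocator`, `ToyBump`, and `ToyMutator` are hypothetical and do not mirror MMTk's internal data structures. The general entry point maps the semantics to an allocator and dispatches dynamically, while the specialized entry point encodes the choice statically.

```rust
/// Hypothetical allocation semantics for the toy model.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ToySemantics {
    Default,
    Large,
}

/// Hypothetical allocator interface, called through dynamic dispatch below.
pub trait ToyAllocator {
    fn alloc(&mut self, size: usize) -> usize;
}

/// Unbounded bump allocator; enough to show the two call shapes.
pub struct ToyBump {
    cursor: usize,
}

impl ToyAllocator for ToyBump {
    fn alloc(&mut self, size: usize) -> usize {
        let start = self.cursor;
        self.cursor += size;
        start
    }
}

pub struct ToyMutator {
    default_alloc: ToyBump,
    large_alloc: ToyBump,
}

impl ToyMutator {
    pub fn new(default_base: usize, large_base: usize) -> Self {
        ToyMutator {
            default_alloc: ToyBump { cursor: default_base },
            large_alloc: ToyBump { cursor: large_base },
        }
    }

    /// General entry point: resolves the allocator from the semantics on
    /// every call, then goes through a trait object (dynamic dispatch).
    pub fn alloc(&mut self, size: usize, semantics: ToySemantics) -> usize {
        let allocator: &mut dyn ToyAllocator = match semantics {
            ToySemantics::Default => &mut self.default_alloc,
            ToySemantics::Large => &mut self.large_alloc,
        };
        allocator.alloc(size)
    }

    /// Specialized entry point: the caller already knows the semantics,
    /// so the allocator is selected statically and the call can be inlined.
    #[inline(always)]
    pub fn alloc_default(&mut self, size: usize) -> usize {
        self.default_alloc.alloc(size)
    }
}
```

Both entry points draw from the same underlying allocator, so a binding can mix them: specialized calls at sites where the semantics are known, and the general call everywhere else.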
| 97 | + |
| 98 | +If you implement an efficient allocation fast path on the binding side (as in Option 3 above, or by generating allocation code in a JIT, which is discussed next), |
| 99 | +you naturally avoid this problem. If you do not want to implement the allocation fast path yourself, the following is another way to avoid resolving the allocator. |
| 100 | + |
| 101 | +Once MMTk is initialized, a binding can get the memory offset of the default allocator and save it somewhere. When we know an object should be allocated |
| 102 | +with the default allocation semantics, we can use the offset to get a reference to the actual allocator (with unsafe code), and allocate with that allocator. |
| 103 | + |
| 104 | +```rust |
| 105 | +{{#include ../../../../../vmbindings/dummyvm/src/tests/doc_avoid_resolving_allocator.rs:avoid_resolving_allocator}} |
| 106 | +``` |
| 107 | + |
| 108 | +## Emitting Allocation Sequence in a JIT Compiler |
| 109 | + |
| 110 | +If the language has a JIT compiler, it is generally desirable to generate the code sequence for the allocation fast path, rather |
| 111 | +than simply emitting a call instruction to the allocation function. The optimizations discussed above are relevant here as well: |
| 112 | +(1) the compiler needs to be able to access the mutator, and (2) the compiler needs to be able to resolve to a specific allocator at |
| 113 | +JIT time. The actual implementation depends heavily on the compiler implementation. |
| 114 | + |
| 115 | +The following are some examples from our bindings (at the time of writing): |
| 116 | +* OpenJDK: |
| 117 | + * <https://github.com/mmtk/mmtk-openjdk/blob/9ab13ae3ac9c68c5f694cdd527a63ca909e27b15/openjdk/mmtkBarrierSetAssembler_x86.cpp#L38> |
| 118 | + * <https://github.com/mmtk/mmtk-openjdk/blob/9ab13ae3ac9c68c5f694cdd527a63ca909e27b15/openjdk/mmtkBarrierSetC2.cpp#L45> |
| 119 | +* JikesRVM: <https://github.com/mmtk/mmtk-jikesrvm/blob/fbfb91adafd9e9b3f45bd6a4b32c845a5d48d20b/jikesrvm/rvm/src/org/jikesrvm/mm/mminterface/MMTkMutatorContext.java#L377> |
| 120 | +* Julia: <https://github.com/mmtk/julia/blob/5c406d9bb20d76e2298a6101f171cfac491f651c/src/llvm-final-gc-lowering.cpp#L267> |