|
1 | 1 | # Monomorphization
|
2 | 2 |
|
3 |
| -TODO |
| 3 | +As you probably know, Rust has a very expressive type system that has extensive |
| 4 | +support for generic types. But of course, assembly is not generic, so we need |
| 5 | +to figure out the concrete types of all the generics before the code can |
| 6 | +execute. |
4 | 7 |
|
| 8 | +Different languages handle this problem differently. For example, in some |
| 9 | +languages, such as Java, we may not know the most precise type of a value until |
| 10 | +runtime. In the case of Java, this is ok because (almost) all variables are |
| 11 | +reference values anyway (i.e. pointers to a heap-allocated object). This |
| 12 | +flexibility comes at the cost of performance, since all accesses to an object |
| 13 | +must dereference a pointer. |
| 14 | + |
| 15 | +Rust takes a different approach: it _monomorphizes_ all generic types. This |
| 16 | +means that the compiler stamps out a different copy of the code of a generic |
| 17 | +function for each concrete type needed. For example, if I use a `Vec<u64>` and |
| 18 | +a `Vec<String>` in my code, then the generated binary will have two copies of |
| 19 | +the generated code for `Vec`: one for `Vec<u64>` and another for `Vec<String>`. |
| 20 | +The result is fast programs, but it comes at the cost of compile time (creating |
| 21 | +all those copies can take a while) and binary size (all those copies might take |
| 22 | +a lot of space). |
| 23 | + |
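As a minimal sketch of what "one copy per concrete type" means, the generic function below is instantiated twice, and each instantiation has its own type baked in at compile time. The names `size_in_bytes` and `describe` are hypothetical and used only for illustration; they do not come from the compiler.

```rust
use std::any::type_name;
use std::mem::size_of;

// Hypothetical helper: each concrete `T` gets its own compiled copy,
// in which `size_of::<T>()` is a compile-time constant.
fn size_in_bytes<T>() -> usize {
    size_of::<T>()
}

// Hypothetical helper that reports which instantiation is running.
fn describe<T>() -> String {
    format!("{} is {} bytes", type_name::<T>(), size_in_bytes::<T>())
}

fn main() {
    // Two instantiations are requested here: one for `u64`, one for `[u8; 3]`.
    println!("{}", describe::<u64>());
    println!("{}", describe::<[u8; 3]>());
}
```

Each additional concrete type used with `describe` would add another copy to the binary, which is the compile-time and size cost described above.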
| 24 | +Monomorphization is the first step in the backend of the Rust compiler. |
| 25 | + |
| 26 | +## Collection |
| 27 | + |
| 28 | +First, we need to figure out what concrete types we need for all the generic |
| 29 | +things in our program. This is called _collection_, and the code that does this |
| 30 | +is called the _monomorphization collector_. |
| 31 | + |
| 32 | +Take this example: |
| 33 | + |
| 34 | +```rust |
| 35 | +fn banana() { |
| 36 | + peach::<u64>(); |
| 37 | +} |
| 38 | + |
| 39 | +fn main() { |
| 40 | + banana(); |
| 41 | +} |
| 42 | +``` |
| 43 | + |
| 44 | +The monomorphization collector will give you a list of `[main, banana, |
| 45 | +peach::<u64>]`. These are the functions that will have machine code generated |
| 46 | +for them. The collector will also add things like statics to that list. |
| 47 | + |
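The idea of collection can be sketched as a reachability walk over a "uses" graph, starting from the roots. The toy model below is an assumption made for illustration only: items are plain strings, the graph is written by hand, and the `collect` function is hypothetical. It does not reflect the data structures or API of the real collector.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

// Toy collector (hypothetical): walk the graph from the roots and record
// every already-instantiated item that is reachable, each exactly once.
fn collect<'a>(roots: &[&'a str], uses: &BTreeMap<&'a str, Vec<&'a str>>) -> Vec<String> {
    let mut seen = BTreeSet::new();
    let mut order = Vec::new();
    let mut queue: VecDeque<&'a str> = roots.iter().copied().collect();

    while let Some(item) = queue.pop_front() {
        // Skip items that were already collected.
        if !seen.insert(item) {
            continue;
        }
        order.push(item.to_string());
        // Enqueue everything this item uses.
        if let Some(callees) = uses.get(item) {
            queue.extend(callees.iter().copied());
        }
    }
    order
}

fn main() {
    // Hand-written graph for the example above: main -> banana -> peach::<u64>.
    let mut uses = BTreeMap::new();
    uses.insert("main", vec!["banana"]);
    uses.insert("banana", vec!["peach::<u64>"]);

    let items = collect(&["main"], &uses);
    println!("{:?}", items); // prints ["main", "banana", "peach::<u64>"]
}
```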
| 48 | +See [the collector rustdocs][collect] for more info. |
| 49 | + |
| 50 | +[collect]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/monomorphize/collector/index.html |
5 | 51 |
|
6 | 52 | ## Polymorphization
|
7 | 53 |
|
8 |
| -TODO |
| 54 | +As mentioned above, monomorphization produces fast code, but it comes at the |
| 55 | +cost of compile time and binary size. [MIR |
| 56 | +optimizations](../mir/optimizations.md) can help a bit with this. Another |
| 57 | +optimization currently under development is called _polymorphization_. |
| 58 | + |
| 59 | +The general idea is that often we can share some code between monomorphized |
| 60 | +copies of code. More precisely, if a MIR block is not dependent on a type |
| 61 | +parameter, it may not need to be monomorphized into many copies. Consider the |
| 62 | +following example: |
| 63 | + |
| 64 | +```rust |
| 65 | +pub fn f() { |
| 66 | + g::<bool>(); |
| 67 | + g::<usize>(); |
| 68 | +} |
| 69 | + |
| 70 | +fn g<T>() -> usize { |
| 71 | + let n = 1; |
| 72 | + let closure = || n; |
| 73 | + closure() |
| 74 | +} |
| 75 | +``` |
| 76 | + |
| 77 | +In this case, we would currently collect `[f, g::<bool>, g::<usize>, |
| 78 | +g::<bool>::{{closure}}, g::<usize>::{{closure}}]`, but notice that the two |
| 79 | +closures would be identical -- they don't depend on the type parameter `T` of |
| 80 | +function `g`. So we only need to emit one copy of the closure. |
| 81 | + |
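The same code-sharing goal can be approximated by hand: move the part of a generic function that does not depend on the type parameter into a non-generic inner function, so it is compiled only once. The sketch below illustrates that goal and is not how the polymorphization pass works; the names `shout` and `inner` are hypothetical.

```rust
// Generic outer shell: one small copy per concrete `S`.
pub fn shout<S: AsRef<str>>(s: S) -> String {
    // Non-generic body: does not mention `S`, so a single copy is
    // shared by every instantiation of `shout`.
    fn inner(s: &str) -> String {
        s.to_uppercase()
    }
    inner(s.as_ref())
}

fn main() {
    // Two instantiations of `shout` (`&str` and `String`), one `inner`.
    assert_eq!(shout("hi"), "HI");
    assert_eq!(shout(String::from("yo")), "YO");
}
```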
| 82 | +For more information, see [this thread on GitHub][polymorph]. |
| 83 | + |
| 84 | +[polymorph]: https://github.com/rust-lang/rust/issues/46477 |