|
| 1 | +- Feature Name: `optimize_attr` |
| 2 | +- Start Date: 2018-03-26 |
| 3 | +- RFC PR: [rust-lang/rfcs#2412](https://github.com/rust-lang/rfcs/pull/2412) |
| 4 | +- Rust Issue: [rust-lang/rust#54882](https://github.com/rust-lang/rust/issues/54882) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +This RFC introduces the `#[optimize]` attribute for controlling optimization level on a per-item |
| 10 | +basis. |
| 11 | + |
| 12 | +# Motivation |
| 13 | +[motivation]: #motivation |
| 14 | + |
| 15 | +Currently, rustc has only a small number of optimization options that apply globally to the |
| 16 | +crate. With LTO and RLIB-only crates, these options apply to the whole program, which |
| 17 | +reduces the ability to control optimization even further. |
| 18 | + |
| 19 | +For applications such as embedded ones, it is critical that they satisfy their size constraints. This |
| 20 | +means that the code must consciously pick one optimization level or the other. The absence of a method to |
| 21 | +selectively optimize different parts of a program in different ways prevents users from utilising |
| 22 | +the hardware they have to the greatest degree. |
| 23 | + |
| 24 | +With a C toolchain, selective optimization is fairly easy to achieve by compiling the relevant |
| 25 | +codegen units (objects) with different options. In the Rust ecosystem, where the concept of such units |
| 26 | +does not exist, an alternative solution is necessary. |
| 27 | + |
| 28 | +With the `#[optimize]` attribute it is possible to annotate the optimization level of separate |
| 29 | +items, so that they are optimized differently from the global optimization option. |
| 30 | + |
| 31 | +# Guide-level explanation |
| 32 | +[guide-level-explanation]: #guide-level-explanation |
| 33 | + |
| 34 | +## `#[optimize(size)]` |
| 35 | + |
| 36 | +Sometimes, optimizations are a trade-off between execution time and code size. Some |
| 37 | +optimizations, such as loop unrolling, increase code size many times on average (compared to |
| 38 | +the original function size) for marginal performance benefits. In case such an optimization is not |
| 39 | +desirable… |
| 40 | + |
| 41 | +```rust |
| 42 | +#[optimize(size)] |
| 43 | +fn banana() { |
| 44 | + // code |
| 45 | +} |
| 46 | +``` |
| 47 | + |
| 48 | +…will instruct rustc to consider this trade-off more carefully and avoid optimising in a way that |
| 49 | +would result in larger code rather than smaller code. It may also affect which instructions |
| 50 | +are selected to appear in the final binary. |
| 51 | + |
| 52 | +Note that `#[optimize(size)]` is a hint rather than a hard requirement, and the compiler may still, |
| 53 | +while optimising, make decisions that increase function size compared to an entirely unoptimized |
| 54 | +result. |
| 55 | + |
| 56 | +Using this attribute is recommended when inspection of the generated code reveals an unnecessarily |
| 57 | +large function or functions, but using `-O` is still preferable to `-C opt-level=s` or |
| 58 | +`-C opt-level=z`. |
| 59 | + |
| 60 | +## `#[optimize(speed)]` |
| 61 | + |
| 62 | +Conversely, when one of the global optimization options for code size is used (`-Copt-level=s` or |
| 63 | +`-Copt-level=z`), profiling might reveal some functions that are unnecessarily “hot”. In that case, |
| 64 | +those functions may be annotated with `#[optimize(speed)]` so that the compiler makes its best |
| 65 | +effort to produce faster code. |
| 66 | + |
| 67 | +```rust |
| 68 | +#[optimize(speed)] |
| 69 | +fn banana() { |
| 70 | + // code |
| 71 | +} |
| 72 | +``` |
| 73 | + |
| 74 | +Much like with `#[optimize(size)]`, the `speed` counterpart is also a hint and will likely not |
| 75 | +yield the same results as using the global optimization option for speed. |
| 76 | + |
| 77 | +# Reference-level explanation |
| 78 | +[reference-level-explanation]: #reference-level-explanation |
| 79 | + |
| 80 | +The `#[optimize(size)]` attribute applied to an item or expression will instruct the optimization |
| 81 | +pipeline to avoid applying optimizations that could result in a size increase and the machine code |
| 82 | +generator to generate code that’s smaller rather than faster. |
| 83 | + |
| 84 | +The `#[optimize(speed)]` attribute applied to an item or expression will instruct the optimization |
| 85 | +pipeline to apply optimizations that are likely to yield performance wins and the machine code |
| 86 | +generator to generate code that’s faster rather than smaller. |
| 87 | + |
| 88 | +The `#[optimize]` attributes are just a hint to the compiler and are not guaranteed to result in |
| 89 | +any different code. |
| 90 | + |
| 91 | +If an `#[optimize]` attribute is applied to some grouping item (such as `mod` or a crate), it |
| 92 | +propagates transitively to all items defined within the grouping item. Note that a function is |
| 93 | +also a “grouping” item for the purposes of this RFC, and an `#[optimize]` attribute applied to a |
| 94 | +function will propagate to other functions or closures defined within the body of the function. |
| 95 | + |
| 96 | +The `#[optimize]` attribute may also be applied to a closure expression using the currently unstable |
| 97 | +`stmt_expr_attributes` feature. |
| 98 | + |
| 99 | +It is an error to specify multiple incompatible `#[optimize]` options on a single item or |
| 100 | +expression at once. A more explicit `#[optimize]` attribute overrides a propagated attribute. |
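The propagation and override rules above can be sketched as a small model. This is only an illustration under stated assumptions, not how rustc represents items: `Hint`, `Item` and `effective_hint` are hypothetical names, and items are assumed to live in a flat slice where each one records the index of its enclosing item.

```rust
/// Hint carried by an `#[optimize(...)]` attribute.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Hint {
    Size,
    Speed,
}

/// Hypothetical stand-in for a crate, `mod`, function or closure.
struct Item {
    /// Index of the enclosing ("grouping") item, or `None` for the crate root.
    parent: Option<usize>,
    /// Hint written directly on this item, if any.
    explicit: Option<Hint>,
}

/// Returns the hint in effect for `items[index]`.
///
/// An attribute written on the item itself wins; otherwise the nearest
/// attribute found on an enclosing item propagates down to it.
fn effective_hint(items: &[Item], index: usize) -> Option<Hint> {
    let mut current = Some(index);
    while let Some(i) = current {
        if let Some(hint) = items[i].explicit {
            return Some(hint);
        }
        current = items[i].parent;
    }
    None
}
```

With a crate marked `size`, a `mod` inside it resolves to `Size`, while a function in that `mod` marked `speed` (and any closure in its body) resolves to `Speed`, because the more explicit attribute overrides the propagated one.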
| 101 | + |
| 102 | +`#[optimize(speed)]` is a no-op when a global optimization-for-speed option is set (i.e. |
| 103 | +`-C opt-level=1-3`). Similarly, `#[optimize(size)]` is a no-op when a global optimization-for-size |
| 104 | +option is set (i.e. `-C opt-level=s/z`). `#[optimize]` attributes are a no-op when no optimizations |
| 105 | +are done globally (i.e. `-C opt-level=0`). In all other cases, the *exact* interaction of the |
| 106 | +`#[optimize]` attribute with the global optimization level is not specified and is left up to |
| 107 | +the implementation to decide. |
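The cases listed above can be written down as a lookup. This is a minimal sketch: the type and function names (`OptLevel`, `Hint`, `Effect`, `attribute_effect`) are made up for illustration and do not correspond to any compiler API.

```rust
/// Global `-C opt-level` values mentioned in this RFC.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OptLevel {
    O0,
    O1,
    O2,
    O3,
    S,
    Z,
}

/// Hint carried by an `#[optimize(...)]` attribute.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Hint {
    Size,
    Speed,
}

/// What the RFC text says about a given combination.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Effect {
    /// The attribute changes nothing.
    NoOp,
    /// The exact interaction is left up to the implementation.
    Unspecified,
}

fn attribute_effect(global: OptLevel, hint: Hint) -> Effect {
    match (global, hint) {
        // No optimizations are done globally.
        (OptLevel::O0, _) => Effect::NoOp,
        // The hint agrees with what is already requested globally.
        (OptLevel::O1 | OptLevel::O2 | OptLevel::O3, Hint::Speed) => Effect::NoOp,
        (OptLevel::S | OptLevel::Z, Hint::Size) => Effect::NoOp,
        // Everything else: size hint in a speed build, speed hint in a size build.
        _ => Effect::Unspecified,
    }
}
```

Only the two "cross" combinations fall into the unspecified case, which is where the attribute is expected to matter in practice.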
| 108 | + |
| 109 | +An `#[optimize]` attribute applied to non-function-like items (such as `struct`) or non-function-like |
| 110 | +expressions (i.e. not closures) is considered “unused” as of this RFC and should fire the |
| 111 | +`unused_attributes` lint (unless the same attribute was used for a function-like item or expression, |
| 112 | +via e.g. propagation). Some future RFC may assign some behaviour to this attribute with respect to |
| 113 | +such definitions. |
| 114 | + |
| 115 | +# Implementation approach |
| 116 | + |
| 117 | +For the LLVM backend, these attributes may be implemented in the following manner: |
| 118 | + |
| 119 | +`#[optimize(size)]` – explicit function attributes exist at the LLVM level. Items with |
| 120 | +`optimize(size)` would simply apply the LLVM attributes to the functions. |
| 121 | + |
| 122 | +`#[optimize(speed)]` in conjunction with `-C opt-level=s/z` – use a global optimization level of |
| 123 | +`-C opt-level=2/3` and apply the equivalent LLVM function attribute (`optsize`, `minsize`) to all |
| 124 | +items which do not have an `#[optimize(speed)]` attribute. |
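One possible reading of the approach above, sketched as a function that picks LLVM function attributes per item. The mapping is an assumption, not something this RFC fixes: it supposes that `-C opt-level=s` corresponds to `optsize`, that `-C opt-level=z` corresponds to `optsize` plus `minsize`, and that `#[optimize(size)]` in a speed build maps to `optsize`. `GlobalLevel`, `Hint` and `llvm_fn_attrs` are hypothetical names.

```rust
/// Coarse view of the global `-C opt-level` setting.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum GlobalLevel {
    /// `-C opt-level=0`
    Unoptimized,
    /// `-C opt-level=1-3`
    Speed,
    /// `-C opt-level=s`
    Size,
    /// `-C opt-level=z`
    MinSize,
}

/// Hint carried by an `#[optimize(...)]` attribute.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Hint {
    Size,
    Speed,
}

/// Names of the LLVM function attributes to attach to one function.
fn llvm_fn_attrs(global: GlobalLevel, hint: Option<Hint>) -> Vec<&'static str> {
    match (global, hint) {
        // Attributes are a no-op when nothing is optimized globally.
        (GlobalLevel::Unoptimized, _) => vec![],
        // Functions marked `speed` are left without size attributes, so the
        // speed-oriented pipeline applies to them even in a size build.
        (_, Some(Hint::Speed)) => vec![],
        // Assumption: a `size` hint in a speed build maps to `optsize`.
        (GlobalLevel::Speed, Some(Hint::Size)) => vec!["optsize"],
        (GlobalLevel::Speed, None) => vec![],
        // In size builds, every function not marked `speed` gets the
        // attribute equivalent to the global option.
        (GlobalLevel::Size, _) => vec!["optsize"],
        (GlobalLevel::MinSize, _) => vec!["optsize", "minsize"],
    }
}
```

The point of the sketch is the shape of the decision: in a size build the pipeline itself runs at a speed level, and size behaviour is restored per function through attributes.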
| 125 | + |
| 126 | +# Drawbacks |
| 127 | +[drawbacks]: #drawbacks |
| 128 | + |
| 129 | +* Not all of the alternative codegen backends may be able to express such a request, hence the |
| 130 | +“this is a hint” note on the `#[optimize]` attribute. |
| 131 | + * As a fallback, this attribute may be implemented in terms of more specific optimization hints |
| 132 | + (such as `inline(never)`, the future `unroll(never)` etc). |
| 133 | + |
| 134 | +# Rationale and alternatives |
| 135 | +[alternatives]: #alternatives |
| 136 | + |
| 137 | +The proposal is a very semantic solution (it describes the desired result instead of the behaviour) to the |
| 138 | +problem of sometimes needing to inhibit some of the trade-off optimizations, such as loop unrolling. |
| 139 | + |
| 140 | +An alternative, of course, would be to add attributes controlling such optimizations, such as |
| 141 | +`#[unroll(no)]` on top of a loop statement. There’s already precedent for this in the `#[inline]` |
| 142 | +annotations. |
| 143 | + |
| 144 | +The author would like to argue that we should eventually have *both*, the `#[optimize]` for |
| 145 | +people who look at generated code but are not willing to dig for exact reasons, and the targeted |
| 146 | +attributes for people who know *why* the code is not satisfactory. |
| 147 | + |
| 148 | +Furthermore, `optimize` is currently able to do more than any possible combination of targeted |
| 149 | +attributes would be able to, such as influencing instruction selection or the switch codegen |
| 150 | +strategy (jump table, if chain, etc.). This makes the attribute useful even in the presence of all the |
| 151 | +targeted optimization knobs we might have in the future. |
| 152 | + |
| 153 | +# Prior art |
| 154 | +[prior-art]: #prior-art |
| 155 | + |
| 156 | +* LLVM: `optsize`, `optnone`, `minsize` function attributes (exposed in Clang in some way); |
| 157 | +* GCC: `__attribute__((optimize))` function attribute which allows setting the optimization level |
| 158 | +and using certain(?) `-f` flags for each function; |
| 159 | +* IAR: Optimizations have a check box for “No size constraints”, which allows the compiler to go out of |
| 160 | +its way to optimize without considering the size trade-off. It can only be applied on a |
| 161 | +per-compilation-unit basis. It is enabled by default, as is appropriate for a compiler targeting |
| 162 | +embedded use-cases. |
| 163 | + |
| 164 | +# Unresolved questions |
| 165 | +[unresolved]: #unresolved-questions |
| 166 | + |
| 167 | +* Should we also implement `optimize(always)`? `optimize(level=x)`? |
| 168 | + * Left for future discussion, but should make sure such extension is possible. |
| 169 | +* Should there be any way to specify which global optimization-for-speed level is used in |
| 170 | + conjunction with the optimization-for-size option (e.g. `-Copt-level=s3` could be equivalent to |
| 171 | + `-Copt-level=3` and `#[optimize(size)]` on the crate item)? |
| 172 | + * This may matter for users of `#[optimize(speed)]`. |
| 173 | +* Are the propagation and `unused_attributes` approaches right? |