Commit 4baa3fc

Merge pull request #2412 from nagisa/optimise-size
The optimize attribute
2 parents 0125668 + ce58d27 commit 4baa3fc

1 file changed

+173
-0
lines changed

text/2412-optimize-attr.md

- Feature Name: `optimize_attr`
- Start Date: 2018-03-26
- RFC PR: [rust-lang/rfcs#2412](https://github.com/rust-lang/rfcs/pull/2412)
- Rust Issue: [rust-lang/rust#54882](https://github.com/rust-lang/rust/issues/54882)

# Summary
[summary]: #summary

This RFC introduces the `#[optimize]` attribute for controlling the optimization level on a
per-item basis.

# Motivation
[motivation]: #motivation

Currently, rustc has only a small number of optimization options, and they apply globally to the
crate. With LTO and RLIB-only crates, these options apply to the whole program, which reduces the
ability to control optimization even further.

For applications such as embedded software, satisfying size constraints is critical. This means
that such code must deliberately pick one optimization level or the other. The absence of a method
to selectively optimize different parts of a program in different ways prevents users from
utilising their hardware to the greatest degree.

With a C toolchain, selective optimization is fairly easy to achieve by compiling the relevant
codegen units (objects) with different options. In the Rust ecosystem, where the concept of such
units does not exist, an alternative solution is necessary.

With the `#[optimize]` attribute it is possible to annotate the optimization level of separate
items, so that they are optimized differently from the global optimization option.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

## `#[optimize(size)]`

Sometimes, optimizations are a trade-off between execution time and code size. Some
optimizations, such as loop unrolling, increase code size many times on average (compared to the
original function size) for marginal performance benefits. In case such optimization is not
desirable…

```rust
#[optimize(size)]
fn banana() {
    // code
}
```

…will instruct rustc to consider this trade-off more carefully and avoid optimising in a way that
would result in larger code rather than smaller code. It may also affect which instructions are
selected to appear in the final binary.

Note that `#[optimize(size)]` is a hint rather than a hard requirement, and the compiler may
still, while optimising, make decisions that increase function size compared to an entirely
unoptimized result.

Using this attribute is recommended when inspection of generated code reveals an unnecessarily
large function or functions, but building with `-O` is still preferable to `-C opt-level=s` or
`-C opt-level=z`.

## `#[optimize(speed)]`

Conversely, when one of the global optimization options for code size is used (`-Copt-level=s` or
`-Copt-level=z`), profiling might reveal some functions that are unnecessarily “hot”. In that case,
those functions may be annotated with `#[optimize(speed)]` so that the compiler makes its best
effort to produce faster code.

```rust
#[optimize(speed)]
fn banana() {
    // code
}
```

Much like `#[optimize(size)]`, the `speed` counterpart is also a hint and will likely not yield the
same results as using the global optimization option for speed.

77+
# Reference-level explanation
78+
[reference-level-explanation]: #reference-level-explanation
79+
80+
The `#[optimize(size)]` attribute applied to an item or expression will instruct the optimization
81+
pipeline to avoid applying optimizations that could result in a size increase and machine code
82+
generator to generate code that’s smaller rather than faster.
83+
84+
The `#[optimize(speed)]` attribute applied to an item or expression will instruct the optimization
85+
pipeline to apply optimizations that are likely to yield performance wins and machine code
86+
generator to generate code that’s faster rather than smaller.
87+
88+
The `#[optimize]` attributes are just a hint to the compiler and are not guaranteed to result in
89+
any different code.
90+
91+
If an `#[optimize]` attribute is applied to some grouping item (such as `mod` or a crate), it
92+
propagates transitively to all items defined within the grouping item. Note, that a function is
93+
also a “grouping” item for the purposes of this RFC, and `#[optimize]` attribute applied to a
94+
function will propagate to other functions or closures defined within the body of the function.
95+
96+
`#[optimize]` attribute may also be applied to a closure expression using the currently unstable
97+
`stmt_expr_attributes` feature.
98+
99+
It is an error to specify multiple incompatible `#[optimize]` options to a single item or
100+
expression at once. A more explicit `#[optimize]` attribute overrides a propagated attribute.
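The propagation and override rules above can be modelled in a few lines. This is a minimal sketch for illustration, not rustc's implementation: the `Optimize` enum and the `effective_hint` function are hypothetical names, and the sketch assumes that “more explicit” means the attribute on the nearest enclosing scope.

```rust
/// The two hints proposed by this RFC (hypothetical model type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Optimize {
    Size,
    Speed,
}

/// Resolves the hint that applies to an item.
///
/// `scopes` holds the explicit `#[optimize]` attribute (if any) of each
/// enclosing scope, outermost first, ending with the item itself.
/// Assumption: the innermost explicit attribute wins; `None` means that
/// no hint applies and the global optimization level is used unchanged.
pub fn effective_hint(scopes: &[Option<Optimize>]) -> Option<Optimize> {
    // Walk from the item outwards and take the first attribute found.
    scopes.iter().rev().find_map(|attr| *attr)
}
```

With this model, an `#[optimize(size)]` on a module propagates to an unannotated function inside it (`[Some(Size), None]` resolves to `Some(Size)`), while an `#[optimize(speed)]` placed directly on that function takes precedence (`[Some(Size), Some(Speed)]` resolves to `Some(Speed)`).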

`#[optimize(speed)]` is a no-op when a global optimization-for-speed option is set (i.e. `-C
opt-level=1` through `-C opt-level=3`). Similarly, `#[optimize(size)]` is a no-op when a global
optimization-for-size option is set (i.e. `-C opt-level=s` or `-C opt-level=z`). `#[optimize]`
attributes are no-ops when no optimizations are done globally (i.e. `-C opt-level=0`). In all other
cases the *exact* interaction of the `#[optimize]` attribute with the global optimization level is
not specified and is left up to the implementation to decide.
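The cases listed above form a small decision table. The sketch below encodes it under stated assumptions: `OptLevel`, `Optimize`, `HintEffect` and `hint_effect` are hypothetical names introduced for illustration and do not correspond to any rustc API.

```rust
/// The two hints proposed by this RFC (hypothetical model type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Optimize {
    Size,
    Speed,
}

/// Global `-C opt-level` values mentioned in this RFC.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OptLevel {
    O0,
    O1,
    O2,
    O3,
    S,
    Z,
}

/// What the RFC text says about a hint under a given global level.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HintEffect {
    /// The RFC states that the hint does nothing.
    NoOp,
    /// The RFC leaves the exact interaction to the implementation.
    ImplementationDefined,
}

pub fn hint_effect(global: OptLevel, hint: Optimize) -> HintEffect {
    use OptLevel::*;
    match (global, hint) {
        // No optimizations are done globally: every hint is a no-op.
        (O0, _) => HintEffect::NoOp,
        // Already optimizing for speed globally.
        (O1 | O2 | O3, Optimize::Speed) => HintEffect::NoOp,
        // Already optimizing for size globally.
        (S | Z, Optimize::Size) => HintEffect::NoOp,
        // Everything else is unspecified by the RFC.
        _ => HintEffect::ImplementationDefined,
    }
}
```

The two combinations that motivate the attribute, `#[optimize(size)]` under `-C opt-level=1` through `3` and `#[optimize(speed)]` under `-C opt-level=s` or `z`, are exactly the ones that fall into the implementation-defined arm.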

An `#[optimize]` attribute applied to non-function-like items (such as `struct`) or
non-function-like expressions (i.e. not closures) is considered “unused” as of this RFC and should
fire the `unused_attributes` lint (unless the same attribute was used for a function-like item or
expression, e.g. via propagation). Some future RFC may assign some behaviour to this attribute with
respect to such definitions.

# Implementation approach

For the LLVM backend, these attributes may be implemented in the following manner:

`#[optimize(size)]`: explicit function attributes exist at the LLVM level. Items with
`optimize(size)` would simply apply the LLVM attributes to the functions.

`#[optimize(speed)]` in conjunction with `-C opt-level=s/z`: use a global optimization level of
`-C opt-level=2/3` and apply the equivalent LLVM function attribute (`optsize`, `minsize`) to all
items which do not have an `#[optimize(speed)]` attribute.
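The LLVM strategy described above can be sketched as a pair of pure functions. This is an illustrative model only, with hypothetical names (`OptLevel`, `Optimize`, `llvm_size_attr`, `pipeline_level`). It makes three assumptions that the text does not settle: `#[optimize(size)]` in a speed build maps to `optsize`, `-C opt-level=s` maps to `optsize` while `-C opt-level=z` maps to `minsize`, and the pipeline level chosen for size builds is 2 (the text only says “2/3”).

```rust
/// The two hints proposed by this RFC (hypothetical model type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Optimize {
    Size,
    Speed,
}

/// Global `-C opt-level` values mentioned in this RFC.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OptLevel {
    O0,
    O1,
    O2,
    O3,
    S,
    Z,
}

/// Returns the LLVM size attribute this sketch would attach to a function,
/// given the global level and the function's effective hint (if any).
pub fn llvm_size_attr(global: OptLevel, hint: Option<Optimize>) -> Option<&'static str> {
    match (global, hint) {
        // Nothing is optimized globally, so hints are no-ops.
        (OptLevel::O0, _) => None,
        // Size build: speed-hinted functions are left unmarked so that the
        // speed-oriented pipeline applies to them in full.
        (OptLevel::S, Some(Optimize::Speed)) | (OptLevel::Z, Some(Optimize::Speed)) => None,
        // Size build: every other function carries the size attribute.
        (OptLevel::S, _) => Some("optsize"),
        (OptLevel::Z, _) => Some("minsize"),
        // Speed build: only size-hinted functions are marked.
        // Assumption: `optsize` is the attribute used here.
        (_, Some(Optimize::Size)) => Some("optsize"),
        _ => None,
    }
}

/// Returns the numeric level the optimization pipeline would run at.
pub fn pipeline_level(global: OptLevel) -> u8 {
    match global {
        OptLevel::O0 => 0,
        OptLevel::O1 => 1,
        OptLevel::O2 => 2,
        OptLevel::O3 => 3,
        // Assumption: the text says "2/3"; this sketch picks 2.
        OptLevel::S | OptLevel::Z => 2,
    }
}
```

The design point this illustrates is that a size build becomes a speed-level pipeline plus per-function size attributes, so that opting a single function out of size optimization only requires omitting the attribute on that function.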

# Drawbacks
[drawbacks]: #drawbacks

* Not all of the alternative codegen backends may be able to express such a request, hence the
  “this is a hint” note on the `#[optimize]` attribute.
  * As a fallback, this attribute may be implemented in terms of more specific optimization hints
    (such as `inline(never)`, the future `unroll(never)`, etc.).

# Rationale and alternatives
[alternatives]: #alternatives

Proposed is a very semantic solution (it describes the desired result, instead of behaviour) to the
problem of needing to sometimes inhibit some of the trade-off optimizations such as loop unrolling.

The alternative, of course, would be to add attributes controlling such optimizations, such as
`#[unroll(no)]` on top of a loop statement. There’s already precedent for this in the `#[inline]`
annotations.

The author would like to argue that we should eventually have *both*: the `#[optimize]` attribute
for people who look at generated code but are not willing to dig for exact reasons, and the
targeted attributes for people who know *why* the code is not satisfactory.

Furthermore, `optimize` is currently able to do more than any possible combination of targeted
attributes would be able to, such as influencing instruction selection or the switch codegen
strategy (jump table, if chain, etc.). This makes the attribute useful even in the presence of all
the targeted optimization knobs we might have in the future.

# Prior art
[prior-art]: #prior-art

* LLVM: `optsize`, `optnone`, `minsize` function attributes (exposed in Clang in some way);
* GCC: `__attribute__((optimize))` function attribute which allows setting the optimization level
  and using certain(?) `-f` flags for each function;
* IAR: Optimizations have a check box for “No size constraints”, which allows the compiler to go
  out of its way to optimize without considering the size trade-off. Can only be applied on a
  per-compilation-unit basis. Enabled by default, as is appropriate for a compiler targeting
  embedded use-cases.

# Unresolved questions
[unresolved]: #unresolved-questions

* Should we also implement `optimize(always)`? `optimize(level=x)`?
  * Left for future discussion, but we should make sure such an extension is possible.
* Should there be any way to specify what global optimization-for-speed level is used in
  conjunction with the optimization-for-size option (e.g. `-Copt-level=s3` could be equivalent to
  `-Copt-level=3` and `#[optimize(size)]` on the crate item)?
  * This may matter for users of `#[optimize(speed)]`.
* Are the propagation and `unused_attributes` approaches right?
