
Commit cef2ebc

committed
Adjust text to add optimize(speed)
1 parent b1b24aa commit cef2ebc

File tree

1 file changed

+83
-38
lines changed

text/0000-optimise-attr.md

Lines changed: 83 additions & 38 deletions
@@ -6,8 +6,8 @@
66
# Summary
77
[summary]: #summary
88

9-
This RFC introduces the `#[optimize]` attribute, specifically its `#[optimize(size)]` variant for
10-
controlling optimisation level on a per-item basis.
9+
This RFC introduces the `#[optimize]` attribute for controlling optimisation level on a per-item
10+
basis.
1111

1212
# Motivation
1313
[motivation]: #motivation
@@ -17,24 +17,26 @@ crate. With LTO and RLIB-only crates these options become applicable to a whole-
1717
reduces the ability to control optimisation even further.
1818

1919
For applications such as embedded, it is critical that they satisfy the size constraints. This
20-
means, that code must consciously pick one or the other optimisation level. However, since
21-
optimisation level is increasingly applied program-wide, options like `-Copt-level=3` or
22-
`-Copt-level=s` are less and less useful – it is no longer feasible (and never was feasible with
23-
cargo) to use the former one for code where performance matters and the latter everywhere else.
20+
means that code must consciously pick one or the other optimisation level. The absence of a method to
21+
selectively optimise different parts of a program in different ways precludes users from utilising
22+
the hardware they have to the greatest degree.
2423

25-
With a C toolchain this is fairly easy to achieve by compiling the relevant objects with different
26-
options. In Rust ecosystem, however, where this concept does not exist, an alternate solution is
27-
necessary.
24+
With a C toolchain selective optimisation is fairly easy to achieve by compiling the relevant
25+
codegen units (objects) with different options. In the Rust ecosystem, where the concept of such units
26+
does not exist, an alternative solution is necessary.
2827

29-
With `#[optimize(size)]` it is possible to annotate separate functions, so that they are optimized
30-
for size in a project otherwise optimized for speed (which is the default for `cargo --release`).
28+
With the `#[optimize]` attribute it is possible to annotate the optimisation level of separate
29+
items, so that they are optimized differently from the global optimisation option.
3130

3231
# Guide-level explanation
3332
[guide-level-explanation]: #guide-level-explanation
3433

35-
Sometimes, optimisations are a tradeoff between execution time and the code size. Some
34+
## `#[optimize(size)]`
35+
36+
Sometimes, optimisations are a trade-off between execution time and the code size. Some
3637
optimisations, such as loop unrolling, increase code size many times on average (compared to
37-
original function size).
38+
original function size) for marginal performance benefits. In case such optimisation is not
39+
desirable…
3840

3941
```rust
4042
#[optimize(size)]
@@ -43,7 +45,7 @@ fn banana() {
4345
}
4446
```
4547

46-
Will instruct rustc to consider this tradeoff more carefully and avoid optimising in a way that
48+
…will instruct rustc to consider this trade-off more carefully and avoid optimising in a way that
4749
would result in larger code rather than a smaller one. It may also have effect on what instructions
4850
are selected to appear in the final binary.
4951

@@ -55,26 +57,66 @@ Using this attribute is recommended when inspection of generated code reveals un
5557
function or functions, but use of `-O` is still preferable over `-C opt-level=s` or `-C
5658
opt-level=z`.
5759

60+
## `#[optimize(speed)]`
61+
62+
Conversely, when one of the global optimisation options for code size is used (`-Copt-level=s` or
63+
`-Copt-level=z`), profiling might reveal some functions that are unnecessarily “hot”. In that case,
64+
those functions may be annotated with `#[optimize(speed)]` so that the compiler makes its best
65+
effort to produce faster code.
66+
67+
```rust
68+
#[optimize(speed)]
69+
fn banana() {
70+
// code
71+
}
72+
```
73+
74+
Much like with `#[optimize(size)]`, the `speed` counterpart is also a hint and will likely not
75+
yield the same results as using the global optimisation option for speed.
76+
5877
# Reference-level explanation
5978
[reference-level-explanation]: #reference-level-explanation
6079

61-
The `#[optimize(size)]` attribute applied to a function definition will instruct the optimisation
62-
engine to avoid applying optimisations that could result in a size increase and machine code
63-
generator to generate code that’s smaller rather than larger.
80+
The `#[optimize(size)]` attribute applied to an item will instruct the optimisation pipeline to
81+
avoid applying optimisations that could result in a size increase and the machine code generator to
82+
generate code that’s smaller rather than faster.
83+
84+
The `#[optimize(speed)]` attribute applied to an item will instruct the optimisation pipeline to
85+
apply optimisations that are likely to yield performance wins and the machine code generator to
86+
generate code that’s faster rather than smaller.
87+
88+
The `#[optimize]` attributes are just a hint to the compiler and are not guaranteed to result in
89+
any different code.
90+
91+
If an `#[optimize]` attribute is applied to some grouping item (such as `mod` or a crate), it
92+
propagates transitively to all items defined within the grouping item.
93+
94+
It is an error to specify multiple incompatible `#[optimize]` options on a single item at once.
95+
A more explicit `#[optimize]` attribute overrides a propagated attribute.
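
The propagation and override rules above can be modelled in a few lines. This is only a sketch under stated assumptions, not how rustc implements the attribute: the names `Optimize` and `resolve` are hypothetical, and treating a repeated identical option as compatible is an assumption the text does not settle.

```rust
// Hypothetical model of the propagation rules; not rustc's implementation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Optimize {
    Size,
    Speed,
}

/// Resolve the effective hint for one item.
///
/// `inherited` is the hint propagated from the enclosing grouping item
/// (a `mod` or the crate), `explicit` holds the options written on the item.
fn resolve(
    inherited: Option<Optimize>,
    explicit: &[Optimize],
) -> Result<Option<Optimize>, String> {
    match explicit {
        // Nothing written on the item: the propagated hint (if any) applies.
        [] => Ok(inherited),
        [first, rest @ ..] => {
            if rest.iter().any(|other| other != first) {
                // Incompatible options on a single item are an error.
                Err(format!("incompatible #[optimize] options: {:?}", explicit))
            } else {
                // The more explicit attribute overrides the propagated one.
                // (Assumption: repeating the same option is not an error.)
                Ok(Some(*first))
            }
        }
    }
}

fn main() {
    // A function inside a module marked with `optimize(size)`.
    println!("{:?}", resolve(Some(Optimize::Size), &[]));
    // The same function with its own `optimize(speed)`.
    println!("{:?}", resolve(Some(Optimize::Size), &[Optimize::Speed]));
    // Conflicting options on one item.
    println!("{:?}", resolve(None, &[Optimize::Size, Optimize::Speed]));
}
```
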
96+
97+
`#[optimize(speed)]` is a no-op when a global optimisation for speed option is set (i.e. `-C
98+
opt-level=1-3`). Similarly `#[optimize(size)]` is a no-op when a global optimisation for size
99+
option is set (i.e. `-C opt-level=s/z`). `#[optimize]` attributes are a no-op when no optimizations
100+
are done globally (i.e. `-C opt-level=0`). In all other cases the *exact* interaction of the
101+
`#[optimize]` attribute with the global optimization level is not specified and is left up to
102+
the implementation to decide.
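
The no-op cases listed above can be written down as a small decision table. The sketch below is illustrative only: `OptLevel`, `Optimize`, `Effect` and `effect` are hypothetical names, and the `Unspecified` arm stands for everything the text deliberately leaves to the implementation.

```rust
// Hypothetical decision table for the paragraph above; names are illustrative.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OptLevel {
    O0,
    O1,
    O2,
    O3,
    S,
    Z,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Optimize {
    Size,
    Speed,
}

#[derive(Debug, PartialEq, Eq)]
enum Effect {
    /// The attribute is specified to do nothing.
    NoOp,
    /// The exact interaction is left to the implementation.
    Unspecified,
}

fn effect(global: OptLevel, attr: Optimize) -> Effect {
    use OptLevel::*;
    match (global, attr) {
        // No optimisations are run globally, so there is nothing to steer.
        (O0, _) => Effect::NoOp,
        // Already optimising for speed globally.
        (O1 | O2 | O3, Optimize::Speed) => Effect::NoOp,
        // Already optimising for size globally.
        (S | Z, Optimize::Size) => Effect::NoOp,
        // Every other combination is implementation-defined.
        _ => Effect::Unspecified,
    }
}

fn main() {
    let levels = [
        OptLevel::O0,
        OptLevel::O1,
        OptLevel::O2,
        OptLevel::O3,
        OptLevel::S,
        OptLevel::Z,
    ];
    for level in levels {
        for attr in [Optimize::Size, Optimize::Speed] {
            println!("{:?} + {:?} => {:?}", level, attr, effect(level, attr));
        }
    }
}
```
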
103+
104+
# Implementation approach
105+
106+
For the LLVM backend, these attributes may be implemented in the following manner:
64107

65-
Note that the `#[optimize(size)]` attribute is just a hint and is not guaranteed to result in any
66-
different or smaller code.
108+
`#[optimize(size)]` – explicit function attributes exist at the LLVM level. Items with
109+
`optimize(size)` would simply apply the LLVM attributes to the functions.
67110

68-
Since `#[optimize(size)]` instructs optimisations to behave in a certain way, this means that this
69-
attribute has no effect when no optimisations are run (such as is the case when `-Copt-level=0`).
70-
Interaction of this attribute with the `-Copt-level=s` and `-Copt-level=z` flags is not specified
71-
and is left up to implementation to decide.
111+
`#[optimize(speed)]` in conjunction with `-C opt-level=s/z` – use a global optimisation level of
112+
`-C opt-level=2/3` and apply the equivalent LLVM function attribute (`optsize`, `minsize`) to all
113+
items which do not have an `#[optimize(speed)]` attribute.
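
As a rough illustration of the approach described above, the function below picks the LLVM size attribute for a single function. It is a sketch, not backend code: `Global`, `Hint` and `llvm_size_attr` are hypothetical names, and pairing `-C opt-level=s` with `optsize` and `-C opt-level=z` with `minsize`, as well as using `optsize` for `#[optimize(size)]`, are assumptions of this sketch rather than something the text fixes.

```rust
// Hypothetical sketch of attribute selection for the LLVM backend.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Global {
    /// `-C opt-level=1-3`
    Speed,
    /// `-C opt-level=s`
    SizeS,
    /// `-C opt-level=z`
    SizeZ,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Hint {
    Size,
    Speed,
}

/// Returns the LLVM function attribute to attach, if any.
fn llvm_size_attr(global: Global, hint: Option<Hint>) -> Option<&'static str> {
    match (global, hint) {
        // Functions that ask for speed are left unmarked, so the speed-oriented
        // pipeline (run at level 2/3 under the described approach) applies.
        (_, Some(Hint::Speed)) => None,
        // Under a global size level every other function gets the size
        // attribute. (Assumption: s maps to optsize, z maps to minsize.)
        (Global::SizeS, _) => Some("optsize"),
        (Global::SizeZ, _) => Some("minsize"),
        // Under a global speed level only `optimize(size)` items are marked.
        // (Assumption: `optsize` is the attribute used here.)
        (Global::Speed, Some(Hint::Size)) => Some("optsize"),
        (Global::Speed, None) => None,
    }
}

fn main() {
    for global in [Global::Speed, Global::SizeS, Global::SizeZ] {
        for hint in [None, Some(Hint::Size), Some(Hint::Speed)] {
            println!("{:?} / {:?} => {:?}", global, hint, llvm_size_attr(global, hint));
        }
    }
}
```
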
72114

73115
# Drawbacks
74116
[drawbacks]: #drawbacks
75117

76118
* Not all of the alternative codegen backends may be able to express such a request, hence the
77-
“this is an optimisation hint” note on the `#[optimize(size)]` attribute.
119+
“this is a hint” note on the `#[optimize]` attribute.
78120
* As a fallback, this attribute may be implemented in terms of more specific optimisation hints
79121
(such as `inline(never)`, the future `unroll(never)` etc).
80122

@@ -85,32 +127,35 @@ Proposed is a very semantic solution (describes the desired result, instead of b
85127
problem of needing to sometimes inhibit some of the trade-off optimisations such as loop unrolling.
86128

87129
Alternative, of course, would be to add attributes controlling such optimisations, such as
88-
`#[unroll(no)]` on top of a a loop statement. There’s already precedent for this in the `#[inline]`
130+
`#[unroll(no)]` on top of a loop statement. There’s already precedent for this in the `#[inline]`
89131
annotations.
90132

91-
The author would like to argue that we should eventually have *both*, the `#[optimize(size)]` for
92-
people who look at generated code and decide that it is too large, and the targetted attributes for
93-
people who know *why* the code is too large.
133+
The author would like to argue that we should eventually have *both*, the `#[optimize]` for
134+
people who look at generated code but are not willing to dig for exact reasons, and the targeted
135+
attributes for people who know *why* the code is not satisfactory.
94136

95-
Furthermore, currently `optimize(size)` is able to do more than any possible combination of
96-
targetted attributes would be able to such as influencing the instruction selection or switch
97-
codegen strategy (jump table, if chain, etc.) This makes the attribute useful even in presence of
98-
all the targetted optimisation knobs we might have in the future.
137+
Furthermore, currently `optimize` is able to do more than any possible combination of targeted
138+
attributes would be able to, such as influencing the instruction selection or switch codegen
139+
strategy (jump table, if chain, etc.). This makes the attribute useful even in presence of all the
140+
targeted optimisation knobs we might have in the future.
99141

100142
# Prior art
101143
[prior-art]: #prior-art
102144

103145
* LLVM: `optsize`, `optnone`, `minsize` function attributes (exposed in Clang in some way);
104146
* GCC: `__attribute__((optimize))` function attribute which allows setting the optimisation level
105147
and using certain(?) `-f` flags for each function;
106-
* IAR: Optimisations have a checkbox for “No size constraints”, which allows compiler to go out of
107-
its way to optimize without considering the size tradeoff. Can only be applied on a
108-
per-compilation-unit basis. Enabled by default, as is appropriate for a compiler targetting
148+
* IAR: Optimisations have a check box for “No size constraints”, which allows the compiler to go out of
149+
its way to optimize without considering the size trade-off. Can only be applied on a
150+
per-compilation-unit basis. Enabled by default, as is appropriate for a compiler targeting
109151
embedded use-cases.
110152

111153
# Unresolved questions
112154
[unresolved]: #unresolved-questions
113155

114-
* Should we support such an attribute at module-level? Crate-level?
115-
* If yes, should we also implement `optimize(always)`? `optimize(level=x)`?
116-
* Left for future discussion, but should make sure such extension is possible.
156+
* Should we also implement `optimize(always)`? `optimize(level=x)`?
157+
* Left for future discussion, but should make sure such extension is possible.
158+
* Should there be any way to specify what global optimisation for speed level is used in
159+
conjunction with the optimisation for speed option (e.g. `-Copt-level=s3` could be equivalent to
160+
`-Copt-level=3` and `#[optimize(size)]` on the crate item)?
161+
* This may matter for users of `#[optimize(speed)]`.
