Inline assembly #2850
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,7 +10,9 @@ This RFC specifies a new syntax for inline assembly which is suitable for eventu | |
|
||
The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand. | ||
|
||
The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized. | ||
The transition from the existing `asm!` macro is described in RFC [2843][rfc-llvm-asm]. The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized. | ||
|
||
[rfc-llvm-asm]: https://github.com/rust-lang/rfcs/pull/2843 | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
@@ -247,7 +249,21 @@ This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache siz | |
|
||
However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded. | ||
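As a rough illustration of the `_` form described above, here is a minimal sketch. It assumes the stabilized `std::arch::asm!` syntax on x86-64 and swaps in `mul` rather than the instruction the RFC text discusses. The function name `mul_low` and the non-x86 fallback are hypothetical and exist only for this example. `mul` writes the high half of the product to `rdx`, which the caller does not want, so `rdx` is declared as a discarded output.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Returns the low 64 bits of `a * b` (hypothetical helper for illustration).
#[cfg(target_arch = "x86_64")]
pub fn mul_low(a: u64, b: u64) -> u64 {
    let lo: u64;
    unsafe {
        asm!(
            // `mul` multiplies rax by the operand and writes rdx:rax.
            "mul {b}",
            b = in(reg) b,
            // rax carries `a` in and the low half of the product out.
            inout("rax") a => lo,
            // rdx is overwritten with the high half; declare it and discard it.
            out("rdx") _,
        );
    }
    lo
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn mul_low(a: u64, b: u64) -> u64 {
    a.wrapping_mul(b)
}
```

Calling `mul_low(6, 7)` yields `42`. Without the `out("rdx") _` operand, the block would modify a register it never declared.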
I'd prefer if things like clobbering could be motivated and specified in terms of the constraints that are put on the assembly, not on the compiler. I gather people writing inline assembly are used to thinking very much in terms of compiler details such as register allocation, but I'd prefer if any such remarks would be non-normative notes.

For example, I'd say that the motivation for explicit declaration of clobbered registers is that assembly may not access (read nor write) any register not declared in its inputs and outputs. So, the

Phrasing things in terms of constraints on the compiler is akin to specifying a language feature by the set of allowed optimizations: it leaves us with no way of understanding the feature as a language construct on its own right, independent of compilation or any such thing. This is the same concern as trying to specify

From what I can tell, basically everything in here can be described in the abstract way I propose, which makes me very happy! As a formal methods person, inline assembly is something like my Kryptonite, but this RFC is very well-written and I felt I could mostly follow it. :D It's just a matter of wording, I think. The one point I am less sure about is the

This part is in the guide-level explanation, which tries to be a gentle (ish) introduction to inline assembly rather than a proper specification. In this particular case, it helps to explain to users why indicating proper clobbers is important. The reference-level explanation is more rigorous in specifying exactly what is/isn't UB. Now regarding the
I agree that it helps to motivate why something is UB. But even for the gentle introduction, I think it would help to frame this as "that code is UB (and here's why), and here's how we add these clobber annotations to fix that". It puts the reader into the right frame of mind. Otherwise, we risk educating people to primarily think about the compiler's choices when writing such code. The way things stand right now, the guide does not ever say anything about UB. That is quite unusual for Rust where we are usually very explicit, even in guide-level material, about what the rules are to avoid UB.
The term "UB"/"Undefined Behavior" does not even occur in the RFC, though. So, what you are saying is not entirely accurate.
If misusing lateout is not UB, we have to specify what happens in that case. So, what does that spec look like? "Non-deterministically, writing to a lateout register might overwrite an input register"?
It is far from clear that this is a deliberate choice. I am not sure what "should" means in a spec. Either behavior is UB or it is specified. The spec isn't really the right place for recommendations or advice.

For lateout, the specification is:
Everything else follows as a consequence of that: if you write to the output, you may overwrite one of your inputs.
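The consequence described here can be sketched with two variants of the same addition. This is a minimal sketch that assumes the stabilized `std::arch::asm!` syntax on x86-64. The names `add_late` and `add_early` and the non-x86 fallbacks are hypothetical. In `add_late` the single `lea` reads both inputs before writing its result, so `lateout` is appropriate. In `add_early` the output is written while `b` is still needed, so a plain `out` is used to keep the output register distinct from the inputs.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// `lateout` is fine: `lea` reads `a` and `b` before it writes `sum`,
/// so it does not matter if `sum` shares a register with an input.
#[cfg(target_arch = "x86_64")]
pub fn add_late(a: u64, b: u64) -> u64 {
    let sum: u64;
    unsafe {
        asm!(
            "lea {sum}, [{a} + {b}]",
            sum = lateout(reg) sum,
            a = in(reg) a,
            b = in(reg) b,
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    sum
}

/// Here `sum` is written by `mov` before `b` is read by `add`.
/// With `lateout`, `sum` could be given the same register as `b`,
/// and the `mov` would destroy `b`. Plain `out` rules that out.
#[cfg(target_arch = "x86_64")]
pub fn add_early(a: u64, b: u64) -> u64 {
    let sum: u64;
    unsafe {
        asm!(
            "mov {sum}, {a}",
            "add {sum}, {b}",
            sum = out(reg) sum,
            a = in(reg) a,
            b = in(reg) b,
            options(pure, nomem, nostack),
        );
    }
    sum
}

/// Portable fallbacks so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_late(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}

#[cfg(not(target_arch = "x86_64"))]
pub fn add_early(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

Both functions return `42` for the inputs `40` and `2`. The difference lies only in which register assignments the operand declarations permit.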
On some level it depends on whether we define this as undefined behavior. I'd rather do so, because breaking this invariant can trivially cause undefined behavior in otherwise well-defined (assembly) code.

    let xs = [1u32, 2];
    let mut sum;
    asm!("
    mov {sum41}, 41
    sum_loop:
    add {sum41}, [{ptr} + ecx*4 - 1]
    loop sum_loop
    ",
    sum41 = lateout(reg) sum,
    ptr = in(reg) xs.as_ptr(),
    in("ecx") xs.len(),
    );
    assert_eq!(sum, 44);

Whenever the
This is wrong though. For instance, the following piece of inline asm does not clobber eax, despite accessing it:

    asm!("push eax; mov eax, 4; pop eax");
No, this is not undefined behavior, e.g.,

    // allocate 8-bytes aligned to a 16-byte boundary:
    let ptr: *mut f32 = GlobalAlloc::alloc(Layout::new(8, 16)?) as *mut _;
    assert!(!ptr.is_null());
    let mut arr = [MaybeUninit<f32>; 4];
    // read 16-bytes from that pointer, the later 8-bytes are a read out-of-bounds:
    asm!("
    movntdqa xmm0, {0};
    movaps {1}, xmm0;
    ",
    in(ptr), out(&arr));

What's UB is an
I think this goes in the right direction. I'll reword this as "An assembly block shall only have specified [0] side-effects, i.e., if an assembly block has a side-effect that's not specified, the behavior is undefined".

[0] "specified" as in "stated" within the assembly block.

This subtly different definition means that an assembly block can actually read from almost any register, including those that are not specified as inputs (e.g. like flags registers), because doing so is not a side-effect. I say almost any, because reading from some registers can be a side-effect (i.e. can modify the value of that or other registers). The same is true for writes, e.g., on RISC-V, the

Then there are other side-effects that are part of the

What I think might be a bit confusing is the split between the declaration of "additive" and "negative" side-effects in the

In particular, this RFC provides rationale for assuming that
Thanks for adding that, I was not aware of this. @gnzlbg's definition makes sense to me.
It probably also goes without saying that there are no guarantees whatsoever for the value that is being read from such registers. ("Read" is a kind of effect though, in the sense that entirely pure code ["mathematical functions"] must not even read from mutable things. It's just a harmless effect in many cases. I am not sure how much bearing this fact has on the discussion at hand, but I get a bit worried every time someone claims that reads don't have side-effects.)
I think this is a software (compiler) vs hardware thing. If you read in source code, the compiler has to make sure there's a value there for you to read, which affects the code it generates, meaning there were side-effects to that read - and if the thing you were reading is mutable, this might visibly change what's written there (conversely, if you're a pure function then it might not make sure that you have something to read and then UB). At the hardware level, there's always something to read at that register, even if you can't guarantee what it is, so no extra effort needs to be made. A pure Rust function could have inline asm with the pure marker, and reading from a random register should still be no problem (although, of course, odd). Maybe otherwise put, in the "memory-model" of the cpu-registers, read really isn't an effect (except for the special registers where it is). For actual physical memory, that's probably more delicate. |
||
|
||
This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code. | ||
This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code: | ||
|
||
```rust | ||
// Multiply x by 6 using shifts and adds | ||
let mut x = 4; | ||
unsafe { | ||
asm!(" | ||
mov {tmp}, {x} | ||
shl {tmp}, 1 | ||
shl {x}, 2 | ||
add {x}, {tmp} | ||
", x = inout(reg) x, tmp = out(reg) _); | ||
} | ||
assert_eq!(x, 4 * 6); | ||
``` | ||
|
||
## Register template modifiers | ||
|
||
|
@@ -402,7 +418,7 @@ Here is the list of currently supported register classes: | |
|
||
> Notes on allowed types: | ||
> - Pointers and references are allowed where the equivalent integer type is allowed. | ||
> - `iLEN` refers to both sized and unsized integer types. It also implicitly includes `isize` and `usize` where the length matches. | ||
> - `iLEN` refers to both signed and unsigned integer types. It also implicitly includes `isize` and `usize` where the length matches. | ||
|
||
> - Fat pointers are not allowed. | ||
> - `vLEN` refers to a SIMD vector that is `LEN` bits wide. | ||
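To make the first note above concrete, the following minimal sketch passes a raw pointer as an `in(reg)` operand, which is allowed because a pointer is accepted wherever the pointer-sized integer is. It assumes the stabilized `std::arch::asm!` syntax on x86-64. The function name `load_u32` and the non-x86 fallback are hypothetical. The reference is cast to a raw pointer before being used as an operand, and the `:e` template modifier selects the 32-bit register name for the `u32` output.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Loads a `u32` through a pointer operand (hypothetical helper for illustration).
#[cfg(target_arch = "x86_64")]
pub fn load_u32(value: &u32) -> u32 {
    let result: u32;
    unsafe {
        asm!(
            // `{dst:e}` formats the register with its 32-bit name (e.g. `eax`).
            "mov {dst:e}, dword ptr [{ptr}]",
            // A raw pointer is passed in a general-purpose register.
            ptr = in(reg) value as *const u32,
            dst = out(reg) result,
            // The block only reads memory and does not touch the stack or flags.
            options(readonly, nostack, preserves_flags),
        );
    }
    result
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn load_u32(value: &u32) -> u32 {
    *value
}
```

A fat pointer such as `&[u32]` would not fit this pattern, which is consistent with the note that fat pointers are not allowed.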
|
||
|
||
|
@@ -421,6 +437,7 @@ Some registers have multiple names. These are all treated by the compiler as ide | |
| x86 | `bp` | `bpl`, `ebp`, `rbp` | | ||
| x86 | `sp` | `spl`, `esp`, `rsp` | | ||
| x86 | `ip` | `eip`, `rip` | | ||
| x86 | `st(0)` | `st` | | ||
| x86 | `r[8-15]` | `r[8-15]b`, `r[8-15]w`, `r[8-15]d` | | ||
| x86 | `xmm[0-31]` | `ymm[0-31]`, `zmm[0-31]` | | ||
| AArch64 | `x[0-30]` | `w[0-30]` | | ||
|
@@ -464,6 +481,8 @@ Some registers cannot be used for input or output operands: | |
| x86 | `ah`, `bh`, `ch`, `dh` | These are poorly supported by compiler backends. Use 16-bit register views (e.g. `ax`) instead. | | ||
| x86 | `k0` | This is a constant zero register which can't be modified. | | ||
| x86 | `ip` | This is the program counter, not a real register. | | ||
Why does

So that the compiler can provide better error messages ("unknown register" vs "disallowed register"). |
||
| x86 | `mm[0-7]` | MMX registers are not currently supported (but may be in the future). | | ||
| x86 | `st([0-7])` | x87 registers are not currently supported (but may be in the future). | | ||
| AArch64 | `xzr` | This is a constant zero register which can't be modified. | | ||
| ARM | `pc` | This is the program counter, not a real register. | | ||
| RISC-V | `x0` | This is a constant zero register which can't be modified. | | ||
|
A reasonable question to ask of this would be "is there anything in the design of this feature that precludes support for additional architectures, or is the feature sufficiently general that we do not (to the best of our ability) foresee any difficulty supporting additional architectures in a backwards-compatible way?" For example, elsewhere the document discusses how registers are highly architecture-specific; are registers the only place we would expect such different behavior, or are there other potential points of divergence? (I'm also not suggesting that we answer this question in the summary; perhaps in the Future Possibilities section.)
Yes, register definitions are basically the only thing needed to add support for a new architecture. This should be fairly straightforward once the basic infrastructure for inline asm is implemented.
Unless it's something alien, like e.g. Intel GPU ISA 😄
In practice the backend must have the inline assembly support for said target too. Not all of them do.