Inline assembly #2850

Closed
wants to merge 29 commits into from
Changes from 1 commit
29 commits
5559913
Add inline asm RFC
Amanieu Jan 13, 2020
4edab76
Minor corrections
Amanieu Jan 14, 2020
03f22fe
More minor corrections
Amanieu Jan 14, 2020
31cab5e
Oops
Amanieu Jan 14, 2020
add2123
Add a section on rules for inline asm
Amanieu Jan 21, 2020
fad338a
Rename flags() to options()
Amanieu Jan 21, 2020
806228f
Clarify support for non-LLVM backends
Amanieu Jan 21, 2020
fa90ad2
Clarify reg class under ARM Thumb1
Amanieu Jan 22, 2020
fc99a8c
Clarify rules
Amanieu Jan 24, 2020
2572476
Rename imm to const
Amanieu Jan 25, 2020
93b179c
Rework register classes and modifiers
Amanieu Jan 30, 2020
d82b70d
Fix minor typos
Amanieu Feb 9, 2020
2adfde8
Disallow specifying the same register as both an input and an output …
Amanieu Feb 11, 2020
89cbf28
Clarify implicit operands that come after named operands
Amanieu Feb 11, 2020
5884b7f
Expand on default modifier used for registers
Amanieu Feb 15, 2020
f427d17
Explicit register operands can't be used in the template
Amanieu Feb 15, 2020
feecfa1
Add rationale for not supporting AT&T syntax
Amanieu Feb 16, 2020
15c7aa8
Add rationale for not validating the generating asm in rustc.
Amanieu Feb 16, 2020
7d066ce
Remove restriction on sym needing to be from current crate.
Amanieu Feb 24, 2020
44853af
Fix typos
Amanieu Feb 24, 2020
b5d854d
Fix some details to match WIP implementation
Amanieu Feb 24, 2020
6b3a129
Clarify that the compiler must treat asm! as a black box
Amanieu Feb 25, 2020
d83a1e1
Add more examples of asm! usage to motivation
Amanieu Feb 25, 2020
904b71e
Add type whitelist for each register class
Amanieu Feb 28, 2020
a48259d
Fix typo
Amanieu Feb 28, 2020
843b6cf
x[16-31] don't exist on RV32E
Amanieu Feb 29, 2020
a1467e2
Clarify rules
Amanieu Mar 1, 2020
0f4ffff
Pointers are i16 on some targets
Amanieu Mar 1, 2020
8c54738
Update semantics of "pure" option
Amanieu Mar 1, 2020
25 changes: 22 additions & 3 deletions text/0000-inline-asm.md
@@ -10,7 +10,9 @@ This RFC specifies a new syntax for inline assembly which is suitable for eventu

The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.
Contributor

The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures.

A reasonable question to ask of this would be "is there anything in the design of this feature that precludes support for additional architectures, or is the feature sufficiently general that we do not (to the best of our ability) foresee any difficulty supporting additional architectures in a backwards-compatible way?" For example, elsewhere the document discusses how registers are highly architecture-specific; are registers the only place we would expect such different behavior, or are there other potential points of divergence? (I'm also not suggesting that we answer this question in the summary; perhaps in the Future Possibilities section.)

Member Author

Yes, register definitions are basically the only thing needed to add support for a new architecture. This should be fairly straightforward once the basic infrastructure for inline asm is implemented.

Contributor

register definitions are basically the only thing needed to add support for a new architecture. This should be fairly straightforward

Unless it's something alien, like e.g. Intel GPU ISA 😄

Member

In practice the backend must have the inline assembly support for said target too. Not all of them do.


-The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.
+The transition from the existing `asm!` macro is described in RFC [2843][rfc-llvm-asm]. The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.
+
+[rfc-llvm-asm]: https://github.com/rust-lang/rfcs/pull/2843

# Motivation
[motivation]: #motivation
@@ -247,7 +249,21 @@ This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache siz

However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.
Member

I'd prefer if things like clobbering could be motivated and specified in terms of the constraints that are put on the assembly, not on the compiler. I gather people writing inline assembly are used to thinking very much in terms of compiler details such as register allocation, but I'd prefer if any such remarks would be non-normative notes.

For example, I'd say that the motivation for explicit declaration of clobbered registers is that assembly may not access (read nor write) any register not declared in its inputs and outputs. So, the cpuid example without the lateout("eax") _, lateout("edx") _ is UB due to violating those constraints. That motivates clobbering without ever mentioning what the compiler ends up doing with this information. The fact that the compiler has to save and restore registers is a compiler implementation detail that follows from this description, but it should not be normative.

Phrasing things in terms of constraints on the compiler is akin to specifying a language feature by the set of allowed optimizations: it leaves us with no way of understanding the feature as a language construct in its own right, independent of compilation or any such thing. This is the same concern as trying to specify black_box as "inhibiting optimizations" -- neither optimizations nor things like register save/restore should occur in a language spec, IMO.

From what I can tell, basically everything in here can be described in the abstract way I propose, which makes me very happy! As a formal methods person, inline assembly is something like my Kryptonite, but this RFC is very well-written and I felt I could mostly follow it. :D It's just a matter of wording, I think. The one point I am less sure about is the lateout thing... is it enough to say "if any input register gets read after this output was written to, that's UB"?

Member Author

This part is in the guide-level explanation, which tries to be a gentle (ish) introduction to inline assembly rather than a proper specification. In this particular case, it helps to explain to users why indicating proper clobbers is important. The reference-level explanation is more rigorous in specifying exactly what is/isn't UB.

Now regarding the lateout thing... I wouldn't say that behavior is undefined if you misuse it, just that you may end up (depending on the register allocator's choices) accidentally overwriting one of your inputs. The problem is entirely confined to the asm code and does not cause any UB at the Rust abstract machine or LLVM level. Hence the looser wording ("should" instead of "must").

Member

This part is in the guide-level explanation, which tries to be a gentle (ish) introduction to inline assembly rather than a proper specification. In this particular case, it helps to explain to users why indicating proper clobbers is important.

I agree that it helps to motivate why something is UB. But even for the gentle introduction, I think it would help to frame this as "that code is UB (and here's why), and here's how we add these clobber annotations to fix that". It puts the reader into the right frame of mind. Otherwise, we risk educating people to primarily think about the compiler's choices when writing such code.

The way things stand right now, the guide does not ever say anything about UB. That is quite unusual for Rust where we are usually very explicit, even in guide-level material, about what the rules are to avoid UB.

The reference-level explanation is more rigorous in specifying exactly what is/isn't UB.

The term "UB"/"Undefined Behavior" does not even occur in the RFC, though. So, what you are saying is not entirely accurate.
I also looked for the clause that says that reading or writing any register not explicitly specified is UB, and could not find that either.

Now regarding the lateout thing... I wouldn't say that behavior is undefined if you misuse it, just that you may end up (depending on the register allocator's choices) accidentally overwriting one of your inputs.

If misusing lateout is not UB, we have to specify what happens in that case. So, what does that spec look like? "Non-deterministically, writing to a lateout register might overwrite an input register"?

Hence the looser wording ("should" instead of "must").

It is far from clear that this is a deliberate choice. I am not sure what "should" means in a spec. Either behavior is UB or it is specified. The spec isn't really the right place for recommendations or advice.

Member Author

For lateout, the specification is:

the register allocator can reuse a register allocated to an in.

Everything else follows as a consequence of that: if you write to the output, you may overwrite one of your inputs.
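
A minimal sketch of what this means in practice, assuming x86-64 and the `asm!` syntax as later stabilized in `std::arch` (the RFC syntax discussed in this thread may differ in details). `add_five` is a hypothetical helper, not taken from the RFC. The template writes the output before it reads the input, so the output is declared with `out`, which keeps it in a different register from `x`. Declaring it `lateout` instead would allow the allocator to reuse the register holding `x`, and the `mov` could then overwrite the input.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Hypothetical helper: computes `x + 5` with the output written first.
#[cfg(target_arch = "x86_64")]
fn add_five(x: u64) -> u64 {
    let r: u64;
    // SAFETY: the block only touches its declared operands and the flags.
    unsafe {
        asm!(
            "mov {r}, 5",   // writes the output before the input is read
            "add {r}, {x}", // reads the input afterwards
            r = out(reg) r, // `out`, not `lateout`: must not share a register with `x`
            x = in(reg) x,
            options(pure, nomem, nostack),
        );
    }
    r
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn add_five(x: u64) -> u64 {
    x + 5
}
```

With `out`, `add_five(37)` returns 42 regardless of the allocator's choices. With `lateout`, the result would depend on whether the two operands happened to be given the same register.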

Copy link

Now regarding the lateout thing... I wouldn't say that behavior is undefined if you misuse it, just that you may end up (depending on the register allocator's choices) accidentally overwriting one of your inputs. The problem is entirely confined to the asm code and does not cause any UB at the Rust abstract machine or LLVM level. Hence the looser wording ("should" instead of "must").

On some level it depends on whether we define this as undefined behavior. I'd rather do so, because breaking this invariant can trivially cause undefined behavior in otherwise well-defined (assembly) code.
Consider for example:

let xs = [1u32, 2];
let mut sum;
asm!("
    mov {sum41}, 41
    sum_loop:
        add {sum41}, [{ptr} + ecx*4 - 4]
    loop sum_loop
    ",
    sum41 = lateout(reg) sum,
    ptr = in(reg) xs.as_ptr(),
    in("ecx") xs.len(),
);
assert_eq!(sum, 44);

Whenever the lateout operand is allocated to the same register as either in operand, this causes out-of-bounds accesses.

@roblabla, Jan 18, 2020

For example, I'd say that the motivation for explicit declaration of clobbered registers is that assembly may not access (read nor write) any register not declared in its inputs and outputs.

This is wrong though. For instance, the following piece of inline asm does not clobber eax, despite accessing it:

asm!("push eax; mov eax, 4; pop eax");

Contributor

@gnzlbg, Jan 20, 2020

Is out-of-bounds access within an asm block even considered undefined behavior

No, this is not undefined behavior, e.g.,

// Allocate 8 bytes aligned to a 16-byte boundary:
let layout = Layout::from_size_align(8, 16).unwrap();
let ptr = std::alloc::alloc(layout) as *mut f32;
assert!(!ptr.is_null());
let mut arr = [MaybeUninit::<f32>::uninit(); 4];
// Read 16 bytes from that pointer; the last 8 bytes are an out-of-bounds read.
// The store uses movups because arr is not guaranteed to be 16-byte aligned.
asm!("
        movntdqa xmm0, [{0}]
        movups [{1}], xmm0
    ",
    in(reg) ptr, in(reg) arr.as_mut_ptr(), out("xmm0") _);

What's UB is an asm! block that has a side-effect but does not state it, e.g., an asm! block that modifies a register, but does not mark it as being an output or a clobber, since the compiler will generate code under the assumption that this register is not modified, yet that assumption would be incorrect.
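
As a sketch of declaring such a side effect, assuming x86-64 and the later stabilized `std::arch::asm!` syntax: `mul` writes its result to `rdx:rax`, so a block that only wants the low half still has to state that `rdx` is overwritten. `mul_low` is a hypothetical name used for illustration.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Hypothetical helper: low 64 bits of `a * b` using the x86 `mul` instruction.
#[cfg(target_arch = "x86_64")]
fn mul_low(a: u64, b: u64) -> u64 {
    let lo: u64;
    // SAFETY: every register the instruction writes is declared below.
    unsafe {
        asm!(
            "mul {b}",               // rdx:rax = rax * b
            b = in(reg) b,
            inlateout("rax") a => lo, // rax is both an input and the low result
            lateout("rdx") _,         // high half is discarded, but rdx is still overwritten
            options(pure, nomem, nostack),
        );
    }
    lo
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn mul_low(a: u64, b: u64) -> u64 {
    a.wrapping_mul(b)
}
```

Dropping the `lateout("rdx") _` line would leave the write to `rdx` undeclared, which is exactly the kind of unstated side effect described above.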

Contributor

@gnzlbg, Jan 20, 2020

@RalfJung

For example, I'd say that the motivation for explicit declaration of clobbered registers is that assembly may not access (read nor write) any register not declared in its inputs and outputs. So, the cpuid example without the lateout("eax") _, lateout("edx") _ is UB due to violating those constraints.

I think this goes in the right direction. I'll reword this as "An assembly block shall only have specified [0] side-effects, i.e., if an assembly block has a side-effect that's not specified, the behavior is undefined".

[0] "specified" as in "stated" within the assembly block.

This subtly different definition means that an assembly block can actually read from almost any register, including those that are not specified as inputs (e.g. like flags registers), because doing so is not a side-effect. I say almost any, because reading from some registers can be a side-effect (i.e. can modify the value of that or other registers).

The same is true for writes, e.g., on RISC-V, the x0 register is always zero, so you can write to it even if it is not declared as an output or a clobber, and doing so is ok because that is not a side-effect.

Then there are other side-effects that are part of the in and out parameters, e.g., my previous comment in this sub-thread contains an example of using an asm!("", in(ptr)); block to write into an array. Notice that the array memory is reachable inside the asm block through the ptr, but nowhere does the asm block explicitly state that it is going to write to that memory as a side-effect - it is implicit because we don't have the pure or nomem constraints.

What I think might be a bit confusing is the split between the declaration of "additive" and "negative" side-effects in the asm! block. For example, preserves_flags is a "negative" side-effect, stating that this assembly block does not modify (some) flag registers as a side-effect. However, lateout(foo) in the "clobbers" section states the additive side-effect that this asm! block might modify the content of the memory at foo.

In particular, this RFC provides rationale for assuming that asm! blocks can do anything by default (safe), and requiring users to opt-out of side-effects (unsafe), e.g., by writing pure, yet this RFC does assume that assembly blocks don't clobber anything (unsafe), requiring users to opt-in to the "safe" thing. This difference in "safe" vs "unsafe" defaults feels a bit inconsistent, but from a usability point-of-view, having to specify all state that an asm! block does not clobber feels like a nightmare, so it makes sense to me to have this split.

Member

@RalfJung, Feb 3, 2020

This is way too strong. In x86, the NOP instruction literally means "swap the contents of the eax register with the contents of eax", but it would be silly to require any assembly block with a nop in it to mark eax as a clobber. In general reading registers is always harmless and writing them only matters if their value isn't restored later, so it seems like it should be enough to say that it is undefined behavior to modify (and not subsequently restore) a register unless it is listed in an output/clobber.

Thanks for adding that, I was not aware of this.

@gnzlbg's definition makes sense to me.

This subtly different definition means that an assembly block can actually read from almost any register, including those that are not specified as inputs (e.g. like flags registers), because doing so is not a side-effect. I say almost any, because reading from some registers can be a side-effect (i.e. can modify the value of that or other registers).

It probably also goes without saying that there are no guarantees whatsoever for the value that is being read from such registers.

("Read" is a kind of effect though, in the sense that entirely pure code ["mathematical functions"] must not even read from mutable things. It's just a harmless effect in many cases. I am not sure how much bearing this fact has on the discussion at hand, but I get a bit worried every time someone claims that reads don't have side-effects.)

Copy link

("Read" is a kind of effect though, in the sense that entirely pure code ["mathematical functions"] must not even read from mutable things. It's just a harmless effect in many cases. I am not sure how much bearing this fact has on the discussion at hand, but I get a bit worried every time someone claims that reads don't have side-effects.)

I think this is a software (compiler) vs hardware thing. If you read in source code, the compiler has to make sure there's a value there for you to read, which affects the code it generates, meaning there were side-effects to that read - and if the thing you were reading is mutable, this might visibly change what's written there (conversely, if you're a pure function then it might not make sure that you have something to read and then UB). At the hardware level, there's always something to read at that register, even if you can't guarantee what it is, so no extra effort needs to be made. A pure Rust function could have inline asm with the pure marker, and reading from a random register should still be no problem (although, of course, odd). Maybe otherwise put, in the "memory-model" of the cpu-registers, read really isn't an effect (except for the special registers where it is). For actual physical memory, that's probably more delicate.


-This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code.
+This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code:
+
+```rust
+// Multiply x by 6 using shifts and adds
+let mut x = 4;
+unsafe {
+    asm!("
+        mov {tmp}, {x}
+        shl {tmp}, 1
+        shl {x}, 2
+        add {x}, {tmp}
+    ", x = inout(reg) x, tmp = out(reg) _);
+}
+assert_eq!(x, 4 * 6);
+```

## Register template modifiers

@@ -402,7 +418,7 @@ Here is the list of currently supported register classes:

> Notes on allowed types:
> - Pointers and references are allowed where the equivalent integer type is allowed.
-> - `iLEN` refers to both sized and unsized integer types. It also implicitly includes `isize` and `usize` where the length matches.
+> - `iLEN` refers to both sized and unsigned integer types. It also implicitly includes `isize` and `usize` where the length matches.
> - Fat pointers are not allowed.
> - `vLEN` refers to a SIMD vector that is `LEN` bits wide.

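To illustrate the first note above, a sketch assuming x86-64 and the later stabilized `std::arch::asm!` syntax, in which a thin pointer is passed as an `in(reg)` operand in the same way a `usize` would be. `load_u64` is a hypothetical helper. The sketch converts the reference to a raw pointer before passing it, which is the conservative choice; whether a reference is accepted directly is not assumed here.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Hypothetical helper: loads a `u64` through a pointer operand.
#[cfg(target_arch = "x86_64")]
fn load_u64(value: &u64) -> u64 {
    let ptr: *const u64 = value;
    let out: u64;
    // SAFETY: `ptr` comes from a live reference, so it is valid and aligned for an 8-byte read.
    unsafe {
        asm!(
            "mov {out}, qword ptr [{ptr}]",
            ptr = in(reg) ptr,      // pointer used where an integer of the same width is allowed
            out = lateout(reg) out, // written only after `ptr` has been read
            options(pure, readonly, nostack, preserves_flags),
        );
    }
    out
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn load_u64(value: &u64) -> u64 {
    *value
}
```

A fat pointer such as `*const [u64]` would not fit in a single register, which matches the note that fat pointers are not allowed.
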
@@ -421,6 +437,7 @@ Some registers have multiple names. These are all treated by the compiler as ide
| x86 | `bp` | `bpl`, `ebp`, `rbp` |
| x86 | `sp` | `spl`, `esp`, `rsp` |
| x86 | `ip` | `eip`, `rip` |
+| x86 | `st(0)` | `st` |
| x86 | `r[8-15]` | `r[8-15]b`, `r[8-15]w`, `r[8-15]d` |
| x86 | `xmm[0-31]` | `ymm[0-31]`, `zmm[0-31]` |
| AArch64 | `x[0-30]` | `w[0-30]` |
@@ -464,6 +481,8 @@ Some registers cannot be used for input or output operands:
| x86 | `ah`, `bh`, `ch`, `dh` | These are poorly supported by compiler backends. Use 16-bit register views (e.g. `ax`) instead. |
| x86 | `k0` | This is a constant zero register which can't be modified. |
| x86 | `ip` | This is the program counter, not a real register. |

Why does ip have aliases if it is unusable?

Member Author

So that the compiler can provide better error messages ("unknown register" vs "disallowed register").

+| x86 | `mm[0-7]` | MMX registers are not currently supported (but may be in the future). |
+| x86 | `st([0-7])` | x87 registers are not currently supported (but may be in the future). |
| AArch64 | `xzr` | This is a constant zero register which can't be modified. |
| ARM | `pc` | This is the program counter, not a real register. |
| RISC-V | `x0` | This is a constant zero register which can't be modified. |