-
Notifications
You must be signed in to change notification settings - Fork 1.6k
RFC for an operator to take a raw reference #2582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 20 commits
Commits
Show all changes
24 commits
Select commit
Hold shift + click to select a range
fd4b4cd
RFC for an operator to take a raw reference
RalfJung e250bed
there is some design space in terms of when and how we emit the new o…
RalfJung af78d46
more examples by ubsan
RalfJung 1a62d82
coercions to raw ptrs also trigger the new operation
RalfJung 879fa5c
make a lint part of this RFC, clarify when we use a raw ref
RalfJung 0b496ee
expand description: this new operation entirely replaces ref-to-raw c…
RalfJung 0b9df7a
dereferencability
RalfJung 61c2e82
more on dereferencability
RalfJung 5f7e9ea
RFC 2582: Fix typo (must not -> need not)
haslersn 96ddb7c
Merge pull request #1 from haslersn/raw-reference-operator
RalfJung 35b7810
update summary and be more explicit
RalfJung 9a853d5
make example more concrete
RalfJung 6cdac4d
rename to raw_reference_mir_operator
RalfJung 4b60fb1
interaction with auto-deref
RalfJung 315bb31
propose we might have dedicated syntax
RalfJung cb2dd0f
mention more drawbacks, unresolved questions and future possibilities
RalfJung 1f1e690
half-rewrite RFC to center around a proposed syntax for raw references
RalfJung 8eea0bf
clarify
RalfJung 24aad9e
mention copying around
RalfJung 9889619
adjust lint level open question
RalfJung 6a803f1
move SUGAR to its own subsection under future possibilities
RalfJung 92042f6
editing, and moving some other things to future possibilities
RalfJung e95df5c
no more ref-to-ptr casts in MIR
RalfJung 9942dad
RFC 2582
Centril File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,214 @@ | ||
- Feature Name: raw_reference_mir_operator | ||
- Start Date: 2018-11-01 | ||
- RFC PR: (leave this empty) | ||
- Rust Issue: (leave this empty) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
Introduce new variants of the `&` operator: `&raw mut <place>` to create a `*mut <T>`, and `&raw const <place>` to create a `*const <T>`. | ||
This creates a raw pointer directly, as opposed to the already existing `&mut <place> as *mut _`/`&<place> as *const _`, which create a temporary reference and then cast that to a raw pointer. | ||
As a consequence, the existing expressions `<term> as *mut <T>` and `<term> as *const <T>` where `<term>` has reference type are equivalent to `&raw mut *<term>` and `&raw const *<term>`, respectively. | ||
Moreover, add a lint to existing code that could use the new operator, and treat existing code that creates a reference and immediately casts or coerces it to a raw pointer as if it had been written with the new syntax. | ||
|
||
As an option (referred to as [SUGAR] below), we could treat `&mut <place> as *mut _`/`&<place> as *const _` as if they had been written with `&raw` to avoid creating temporary references when that was likely not the intention. | ||
RalfJung marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
Currently, if one wants to create a raw pointer pointing to something, one has no choice but to create a reference and immediately cast it to a raw pointer. | ||
The problem with this is that there are some invariants that we want to attach to references, that have to *always hold*. | ||
The details of this are not finally decided yet, but true in practice because of annotations we emit to LLVM. | ||
It is also the next topic of discussion in the [Unsafe Code Guidelines](https://github.com/rust-rfcs/unsafe-code-guidelines/). | ||
In particular, references must be aligned and dereferencable, even when they are created and never used. | ||
|
||
One consequence of these rules is that it becomes essentially impossible to create a raw pointer pointing to an unaligned struct field: | ||
`&packed.field as *const _` creates an intermediate unaligned reference, triggering undefined behavior because it is not aligned. | ||
Instead, code currently has to copy values around to aligned locations if pointers need to be passed, e.g., to FFI, as in: | ||
|
||
```rust | ||
#[derive(Default)] struct A(u8, i32); | ||
|
||
let mut a: A = Default::default(); | ||
let mut local = a.1; // copy struct field to stack | ||
unsafe { ffi_mod(&mut local as *mut _) }; // pass pointer to local to FFI | ||
a.1 = local; // copy local to struct back | ||
``` | ||
|
||
If one wants to avoid creating a reference to uninitialized data (which might or might not become part of the invariant that must be always upheld), it is also currently not possible to create a raw pointer to a field of an uninitialized struct: | ||
again, `&mut uninit.field as *mut _` would create an intermediate reference to uninitialized data. | ||
|
||
Another issue people sometimes run into is computing the address/offset of a field without asserting that there is any memory allocated there. | ||
This actually has two problems; first of all creating a reference asserts that the memory it points to is allocated, and secondly the offset computation is performed using `getelementptr inbounds`, meaning that the result of the computation is `poison` if it is not in-bounds of the allocation it started in. | ||
This RFC just solves the first problem, but it also provides an avenue for the second (see "Future possibilities"). | ||
|
||
To avoid making too many assumptions by creating a reference, this RFC proposes to introduce a new primitive operation that directly creates a raw pointer to a given place. | ||
No intermediate reference exists, so no invariants have to be adhered to: the pointer may be unaligned and/or dangling. | ||
We also add a lint for cases that seem like the programmer unnecessarily created an intermediate reference, suggesting they reduce the assumptions their code is making by creating a raw pointer instead. | ||
|
||
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
When working with unaligned or potentially dangling pointers, it is crucial that you always use raw pointers and not references: | ||
references come with guarantees that the compiler assumes are always upheld, and these guarantees include proper alignment and not being dangling. | ||
Importantly, these guarantees must be maintained even when the reference is created and never used! | ||
The following is UB: | ||
|
||
```rust | ||
#[repr(packed)] | ||
struct Packed { | ||
pad: u8, | ||
field: u16, | ||
} | ||
let packed = Packed { pad: 0, field: 0 }; | ||
let x = unsafe { &packed.field }; // `x` is not aligned -> undefined behavior | ||
``` | ||
|
||
There is no situation in which the above code is correct, and hence it is a hard error to write this (after a transition period). | ||
This applies to most ways of creating a reference, i.e., all of the following are UB if `X` is not properly aligned and dereferencable: | ||
|
||
```rust | ||
fn foo() -> &T { | ||
&X | ||
} | ||
|
||
fn bar(x: &T) {} | ||
bar(&X); // this is UB at the call site, not in `bar` | ||
|
||
let &x = &X; // this is actually dereferencing the pointer, certainly UB | ||
let _ = &X; // throwing away the value immediately changes nothing | ||
&X; // different syntax for the same thing | ||
|
||
let x = &X as *const T; // this is casting to raw but "too late", an intermediate reference has been created (only if we do no adapt [SUGAR]) | ||
let x = &X as &T as *const T; // this is casting to raw but "too late" even if we adapt [SUGAR] | ||
``` | ||
|
||
The only way to create a pointer to an unaligned or dangling location without triggering undefined behavior is to use `&raw`, which creates a raw pointer without an intermediate reference. | ||
The following is valid: | ||
|
||
```rust | ||
let packed_cast = &raw const packed.field; | ||
``` | ||
|
||
As an optional extension ([SUGAR]) to keep existing code working and to provide a way for projects to adjust to these rules before the syntax bikeshed is finished, and to do so in a way that they do not have to drop support for old Rust versions, we could also treat all of the following as if they had been written using `&raw const` instead of `&`: | ||
|
||
```rust | ||
let packed_cast = unsafe { &raw const packed.field as *const _ }; | ||
let packed_coercion: *const _ = unsafe { &packed.field }; | ||
let null_cast: *const _ = unsafe { &*ptr::null() } as *const _; | ||
let null_coercion: *const _ = unsafe { &*ptr::null() }; | ||
``` | ||
|
||
The intention is to cover all cases where a reference, just created, is immediately explicitly used as a value of raw pointer type. | ||
|
||
Notice that this only applies if no automatic call to `deref` or `deref_mut` got inserted: | ||
those are regular function calls taking a reference, so in that case a reference is created and it must satisfy the usual guarantees. | ||
|
||
|
||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
Rust contains two operators that perform place-to-rvalue conversion (matching `&` in C): one to create a reference (with some given mutability) and one to create a raw pointer (with some given mutability). | ||
In the MIR, this is reflected as either a distinct `Rvalue` or a flag on the existing `Ref` variant. | ||
The borrow checker should do the usual checks on the place used in `&raw`, but can just ignore the result of this operation and the newly created "reference" can have any lifetime. | ||
When translating MIR to LLVM, nothing special has to happen as references and raw pointers have the same LLVM type anyway; the new operation behaves like `Ref`. | ||
When interpreting MIR in the Miri engine, the engine will know not to enforce any invariants on the raw pointer created by `&raw`. | ||
|
||
For the [SUGAR] option, when translating HIR to MIR, we recognize `&[mut] <place> as *[mut|const] ?T` (where `?T` can be any type, also a partial one like `_`) as well as coercions from `&[mut] <place>` to a raw pointer type as a special pattern and treat them as if they would have been written `&raw [mut|const] <place>`. | ||
We do this *after* auto-deref, meaning this pattern does not apply when a call to `deref` or `deref_mut` got inserted. | ||
Redundant parentheses are ignored, but block expressions are not: | ||
`{ &[mut] <place> }` materializes a reference that must be valid, no matter which coercions or casts follow outside the block. | ||
|
||
When doing unsafety checking, we make references to packed fields that do *not* use this new "raw reference" operation a *hard error even in unsafe blocks* (after a transition period). | ||
There is no situation in which this code is okay; it creates a reference that violates basic invariants. | ||
Taking a raw reference to a packed field, on the other hand, is a safe operation as the raw pointer comes with no special promises. | ||
"Unsafety checking" is thus not even a good term for this, maybe it should be a special pass dedicated to packed fields traversing MIR, or this can happen when lowering HIR to MIR. | ||
This check has nothing to do with whether we are in an unsafe block or not. | ||
|
||
Moreover, to prevent programmers from accidentally creating a safe reference when they did not want to, we add a lint that identifies situations where the programmer likely wants a raw reference, and suggest an explicit cast in that case. | ||
One possible heuristic here would be: If a safe reference (shared or mutable) is only ever used to create raw pointers, then likely it could be a raw pointer to begin with. | ||
The details of this are best worked out in the implementation phase of this RFC. | ||
The lint should, at the very least, fire for the cases covered by [SUGAR] if we do *not* adopt that option, and it should fire when the factor that prevents this matching [SUGAR] is just a redundant block, such as `{ &mut <place> } as *mut ?T`. | ||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
If we adapt [SUGAR], it might be surprising that the following two pieces of code are not equivalent: | ||
|
||
```rust | ||
// Variant 1 | ||
let x = unsafe { &packed.field }; // Undefined behavior! | ||
let x = x as *const _; | ||
// Variant 2 | ||
let x = unsafe { &packed.field as *const _ }; | ||
``` | ||
|
||
Notice, however, that the lint should fire in variant 1. | ||
|
||
If `as` ever becomes an operation that can be overloaded, the behavior of `&packed.field as *const _` with [SUGAR] can *not* be obtained by dispatching to the overloaded `as` operator. | ||
Calling that method would assert validity of the reference. | ||
|
||
# Rationale and alternatives | ||
[rationale-and-alternatives]: #rationale-and-alternatives | ||
|
||
[SUGAR] is a compromise to keep the analysis that affects code generation simple. | ||
Detecting "Variant 1" (from the "Drawbacks" section) would need a much less local analysis. | ||
Hence the proposal to make them not equivalent. | ||
|
||
A drawback of not adapting [SUGAR] is that we will have to wait longer (namely, until stabilization of the new syntax) until people can finally write UB-free versions of code that handles dangling or unaligned raw pointers. | ||
|
||
One alternative to introducing a new primitive operation might be to somehow exempt "references immediately cast to a raw pointer" from the invariant. | ||
(Basically, a "dynamic" version of the static analysis performed by the lint.) | ||
However, I believe that the semantics of a MIR program, including whether it as undefined behavior, should be deducible by executing it one step at a time. | ||
Given that, it is unclear how a semantics that "lazily" checks references should work, and how it could be compatible with the annotations we emit for LLVM. | ||
|
||
# Prior art | ||
[prior-art]: #prior-art | ||
|
||
I am not aware of another language with both comparatively strong invariants for its reference types, and raw pointers. | ||
The need for taking a raw reference only arise because of Rust having both of these features. | ||
|
||
# Unresolved questions | ||
[unresolved-questions]: #unresolved-questions | ||
|
||
With [SUGAR], should the lint apply to cases that are covered by the special desugaring or not? | ||
Also, if not, should the lint become `deny` eventually (maybe only on some editions)? | ||
(Without [SUGAR], the lint clearly must apply to `&mut <place> as *mut _`/`&<place> as *const _`, and that pattern is common enough that the cost of `deny` is too high.) | ||
|
||
The interaction with auto-deref is a bit unfortunate. | ||
Maybe the lint should also cover cases that look like `&[mut] <place> as *[mut|const] ?T` in the surface syntax but had a method call inserted, thus manifesting a reference (with the associated guarantees). | ||
The lint as described would not fire because the reference actually gets used as such (being passed to `deref`). | ||
However, what would the lint suggest to do instead? | ||
There just is no way to write this code without creating a reference. | ||
|
||
# Future possibilities | ||
[future-possibilities]: #future-possibilities | ||
|
||
With [SUGAR], if Rust's type ascriptions end up performing coercions, those coercions should trigger the raw reference operator just like other coercions do. | ||
So `&packed.field: *const _` would be `&raw const packed.field`. | ||
If Rust ever gets type ascriptions with coercions for binders, likewise these coercions would be subject to these rules in cases like `match &packed.field { x: *const _ => x }`. | ||
|
||
It has been suggested to [remove `static mut`][static-mut] because it is too easy to accidentally create references with lifetime `'static`. | ||
With `&raw` we could instead restrict `static mut` to only allow taking raw pointers (`&raw [mut|const] STATIC`) and entirely disallow creating references (`&[mut] STATIC`) even in safe code (in a future edition, likely; with lints in older editions). | ||
|
||
As mentioned above, expressions such as `&raw mut x.field` still trigger more UB than might be expected---as witnessed by a [couple of attempts found in the wild of people implementing `offsetof`][offset-of] with something like: | ||
|
||
```rust | ||
let x: *mut Struct = NonNull::dangling().as_ptr(); | ||
let field: *mut Field = &mut x.field; | ||
``` | ||
|
||
The lint as described in this RFC would nudge people to instead write | ||
|
||
```rust | ||
let x: *mut Struct = NonNull::dangling().as_ptr(); | ||
let field: *mut Field = &raw mut x.field; | ||
``` | ||
|
||
which is better, but still UB: we emit a `getelementptr inbounds` for the `.field` offset computation. | ||
It might be a good idea to just not do that -- we know that references are fine, but we could decide that when raw pointers are involved that might be dangling, we do not want to assert anything from just the fact that an offset is being computed. | ||
A plain `getelementptr` still provides some information to the alias analysis, and if we derive the `inbounds` entirely from the fact that the pointer is created from or turned into a reference, we would be able to not have any clause in the language semantics that calls out the offset computation as potentially triggering UB | ||
If people just hear "`&raw` means my pointer can be dangling" they might think the second version above is actually okay, forgetting that the field access itself has its own subtle rule; getting rid of that rule would remove one foot-gun for unsafe code authors to worry about. | ||
|
||
[static-mut]: https://github.com/rust-lang/rust/issues/53639 | ||
[offset-of]: https://github.com/rust-lang/rfcs/pull/2582#issuecomment-467629986 |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.