From 4126461a38a45908bd239db4d67c7b2eba04083c Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Tue, 29 Dec 2015 06:10:14 -0800 Subject: [PATCH 01/14] RFC: native C-compatible unions via `untagged_union` --- text/0000-untagged_union.md | 325 ++++++++++++++++++++++++++++++++++++ 1 file changed, 325 insertions(+) create mode 100644 text/0000-untagged_union.md diff --git a/text/0000-untagged_union.md b/text/0000-untagged_union.md new file mode 100644 index 00000000000..0c67104b3cf --- /dev/null +++ b/text/0000-untagged_union.md @@ -0,0 +1,325 @@ +- Feature Name: `untagged_union` +- Start Date: 2015-12-29 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary +[summary]: #summary + +Provide native support for C-compatible unions, defined via a new keyword +`untagged_union`. + +# Motivation +[motivation]: #motivation + +Many FFI interfaces include unions. Rust does not currently have any native +representation for unions, so users of these FFI interfaces must define +multiple structs and transmute between them via `std::mem::transmute`. The +resulting FFI code must carefully understand platform-specific size and +alignment requirements for structure fields. Such code has little in common +with how a C client would invoke the same interfaces. + +Introducing native syntax for unions makes many FFI interfaces much simpler and +less error-prone to write, simplifying the creation of bindings to native +libraries, and enriching the Rust/Cargo ecosystem. + +A native union mechanism would also simplify Rust implementations of +space-efficient or cache-efficient structures relying on value representation, +such as machine-word-sized unions using the least-significant bits of aligned +pointers to distinguish cases. + +The syntax proposed here avoids reserving `union` as the new keyword, as +existing Rust code already uses `union` for other purposes, including [multiple +functions in the standard +library](https://doc.rust-lang.org/std/?search=union). 
+ +To preserve memory safety, accesses to union fields may only occur in `unsafe` +code. Commonly, code using unions will provide safe wrappers around unsafe +union field accesses. + +# Detailed design +[design]: #detailed-design + +## Declaring a union type + +A union declaration uses the same field declaration syntax as a `struct` +declaration, except with the keyword `untagged_union` in place of `struct`: + +```rust +untagged_union MyUnion { + f1: u32, + f2: f32, +} +``` + +`untagged_union` implies `#[repr(C)]` as the default representation, making +`#[repr(C)] untagged_union` permissible but redundant. + +## Instantiating a union + +A union instantiation uses the same syntax as a struct instantiation, except +that it must specify exactly one field: + +```rust +let u = MyUnion { f1: 1 }; +``` + +Specifying multiple fields in a union instantiation results in a compiler +error. + +Safe code may instantiate a union, as no unsafe behavior can occur until +accessing a field of the union. Code that wishes to maintain invariants about +the union fields should make the union fields private and provide public +functions that maintain the invariants. + +## Reading fields + +Unsafe code may read from union fields, using the same dotted syntax as a +struct: + +```rust +fn f(u: MyUnion) -> f32 { + unsafe { u.f2 } +} +``` + +## Writing fields + +Unsafe code may write to fields in a mutable union, using the same syntax as a +struct: + +```rust +fn f(u: &mut MyUnion) { + unsafe { + u.f1 = 2; + } +} +``` + +If a union contains multiple fields of different sizes, assigning to a field +smaller than the entire union must not change the memory of the union outside +that field. 
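The instantiation, read, and write rules above can be illustrated with a small sketch. It is written in the `union` keyword syntax that later patches in this series adopt (and that stable Rust accepts), rather than the `untagged_union` spelling of this revision; the type `Word` and the functions `overwrite_first_byte` and `expected_after_overwrite` are hypothetical names chosen for illustration. It shows that writing a field smaller than the union leaves the remaining bytes unchanged.

```rust
// Sketch only: uses the `union` keyword, not `untagged_union`.
// `Word`, `overwrite_first_byte`, and `expected_after_overwrite` are hypothetical.
#[repr(C)]
union Word {
    word: u32,
    bytes: [u8; 4],
}

/// Start from `initial`, overwrite only the first byte of the union's
/// storage, then read the whole `u32` back.
fn overwrite_first_byte(initial: u32, byte: u8) -> u32 {
    // Instantiation names exactly one field and needs no `unsafe`.
    let mut w = Word { word: initial };
    unsafe {
        // Writing one byte must not touch the other three bytes...
        w.bytes[0] = byte;
        // ...so this (unsafe) read of `word` still sees the original upper bytes.
        w.word
    }
}

/// The same result computed without a union; independent of endianness.
fn expected_after_overwrite(initial: u32, byte: u8) -> u32 {
    let mut b = initial.to_ne_bytes();
    b[0] = byte;
    u32::from_ne_bytes(b)
}
```

On a little-endian target the first byte in memory is the least-significant one, so `overwrite_first_byte(0xAABB_CCDD, 0x11)` yields `0xAABB_CC11`. Comparing against `to_ne_bytes`/`from_ne_bytes` keeps the check valid on big-endian targets as well.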
+ +## Pattern matching + +Unsafe code may pattern match on union fields, using the same syntax as a +struct, without the requirement to mention every field of the union in a match +or use `..`: + +```rust +fn f(u: MyUnion) { + unsafe { + match u { + MyUnion { f1: 10 } => { println!("ten"); } + MyUnion { f2 } => { println!("{}", f2); } + } + } +} +``` + +Matching a specific value from a union field makes a refutable pattern; naming +a union field without matching a specific value makes an irrefutable pattern. +Both require unsafe code. + +Pattern matching may match a union as a field of a larger structure. In +particular, when using an `untagged_union` to implement a C tagged union via +FFI, this allows matching on the tag and the corresponding field +simultaneously: + +```rust +#[repr(u32)] +enum Tag { I, F } + +untagged_union U { + i: i32, + f: f32, +} + +#[repr(C)] +struct Value { + tag: Tag, + u: U, +} + +fn is_zero(v: Value) -> bool { + unsafe { + match v { + Value { tag: Tag::I, u: U { i: 0 } } => true, + Value { tag: Tag::F, u: U { f: 0.0 } } => true, + _ => false, + } + } +} +``` + +Note that a pattern match on a union field that has a smaller size than the +entire union must not make any assumptions about the value of the union's +memory outside that field. + +## Borrowing union fields + +Unsafe code may borrow a reference to a field of a union; doing so borrows the +entire union, such that any borrow conflicting with a borrow of the union +(including a borrow of another union field or a borrow of a structure +containing the union) will produce an error.
+ +```rust +untagged_union U { + f1: u32, + f2: f32, +} + +#[test] +fn test() { + let mut u = U { f1: 1 }; + unsafe { + let b1 = &mut u.f1; + // let b2 = &mut u.f2; // This would produce an error + *b1 = 5; + } + unsafe { + assert_eq!(u.f1, 5); + } +} +``` + +Simultaneous borrows of multiple fields of a struct contained within a union do +not conflict: + +```rust +struct S { + x: u32, + y: u32, +} + +untagged_union U { + s: S, + both: u64, +} + +#[test] +fn test() { + let mut u = U { s: S { x: 1, y: 2 } }; + unsafe { + let bx = &mut u.s.x; + // let bboth = &mut u.both; // This would fail + let by = &mut u.s.y; + *bx = 5; + *by = 10; + } + unsafe { + assert_eq!(u.s.x, 5); + assert_eq!(u.s.y, 10); + } +} +``` + +## Union and field visibility + +The `pub` keyword works on the union and on its fields, as with a struct. The +union and its fields default to private. Using a private field in a union +instantiation, field access, or pattern match produces an error. + +## Uninitialized unions + +The compiler should consider a union uninitialized if declared without an +initializer. However, providing a field during instantiation, or assigning to +a field, should cause the compiler to treat the entire union as initialized. + +## Unions and traits + +A union may have trait implementations, using the same syntax as a struct. + +The compiler should warn if a union field has a type that implements the `Drop` +trait. + +## Unions and undefined behavior + +Rust code must not use unions to invoke [undefined +behavior](https://doc.rust-lang.org/nightly/reference.html#behavior-considered-undefined). +In particular, Rust code must not use unions to break the pointer aliasing +rules with raw pointers, or access a field containing a primitive type with an +invalid value. + +## Union size and alignment + +A union must have the same size and alignment as an equivalent C union +declaration for the target platform. 
+Typically, a union would have the maximum +size of any of its fields, and the maximum alignment of any of its fields. +Note that those maximums may come from different fields; for instance: + +```rust +untagged_union U { + f1: u16, + f2: [u8; 4], +} + +#[test] +fn test() { + assert_eq!(std::mem::size_of::<U>(), 4); + assert_eq!(std::mem::align_of::<U>(), 2); +} +``` + +# Drawbacks +[drawbacks]: #drawbacks + +Adding a new type of data structure would increase the complexity of the +language and the compiler implementation, albeit marginally. However, this +change seems likely to provide a net reduction in the quantity and complexity +of unsafe code. + +# Alternatives +[alternatives]: #alternatives + +- Don't do anything, and leave users of FFI interfaces with unions to continue + writing complex platform-specific transmute code. +- Create macros to define unions and access their fields. However, such macros + make field accesses and pattern matching look more cumbersome and less + structure-like. The implementation and use of such macros provides strong + motivation to seek a better solution, and indeed existing writers and users + of such macros have specifically requested native syntax in Rust. +- Define unions without a new keyword `untagged_union`, such as via + `#[repr(union)] struct`. This would avoid any possibility of breaking + existing code that uses the keyword, but would make declarations more + verbose, and introduce potential confusion with `struct` (or whatever + existing construct the `#[repr(union)]` attribute modifies). +- Use a compound keyword like `unsafe union`, while not reserving `union` on + its own as a keyword, to avoid breaking use of `union` as an identifier. + Potentially more appealing syntax, if the Rust parser can support it. +- Use a new operator to access union fields, rather than the same `.` operator + used for struct fields.
This would make union fields more obvious at the + time of access, rather than making them look syntactically identical to + struct fields despite the semantic difference in storage representation. +- The [unsafe enum](https://github.com/rust-lang/rfcs/pull/724) proposal: + introduce untagged enums, identified with `unsafe enum`. Pattern-matching + syntax would make field accesses significantly more verbose than structure + field syntax. +- The [unsafe enum](https://github.com/rust-lang/rfcs/pull/724) proposal with + the addition of struct-like field access syntax. The resulting field access + syntax would look much like this proposal; however, pairing an enum-style + definition with struct-style usage seems confusing for developers. An + enum-based declaration leads users to expect enum-like syntax; a new + construct distinct from both enum and struct does not lead to such + expectations, and developers used to C unions will expect struct-like field + access for unions. + +# Unresolved questions +[unresolved]: #unresolved-questions + +Can the borrow checker support the rule that "simultaneous borrows of multiple +fields of a struct contained within a union do not conflict"? If not, omitting +that rule would only marginally increase the verbosity of such code, by +requiring an explicit borrow of the entire struct first. + +Can a pattern match match multiple fields of a union at once? For rationale, +consider a union using the low bits of an aligned pointer as a tag; a pattern +match may match the tag using one field and a value identified by that tag +using another field. However, if this complicates the implementation, omitting +it would not significantly complicate code using unions. + +C APIs using unions often also make use of anonymous unions and anonymous +structs. For instance, a union may contain anonymous structs to define +non-overlapping fields, and a struct may contain an anonymous union to define +overlapping fields. 
This RFC does not define anonymous unions or structs, but +a subsequent RFC may wish to do so. From b2d8ca06aeac74643f298714f084c7d064c9a36b Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 4 Jan 2016 23:42:12 -0800 Subject: [PATCH 02/14] Make union fields that implement Drop an error, not a warning --- text/0000-untagged_union.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-untagged_union.md b/text/0000-untagged_union.md index 0c67104b3cf..eff2838e12f 100644 --- a/text/0000-untagged_union.md +++ b/text/0000-untagged_union.md @@ -230,8 +230,8 @@ a field, should cause the compiler to treat the entire union as initialized. A union may have trait implementations, using the same syntax as a struct. -The compiler should warn if a union field has a type that implements the `Drop` -trait. +The compiler should produce an error if a union field has a type that +implements the `Drop` trait. ## Unions and undefined behavior From c826e6713cc7e25a8f86cb3736b6905671ae0ff3 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 18 Jan 2016 13:14:38 -0800 Subject: [PATCH 03/14] Elaborate on the interaction between unions and Drop --- text/0000-untagged_union.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/text/0000-untagged_union.md b/text/0000-untagged_union.md index eff2838e12f..d68a95ea3ec 100644 --- a/text/0000-untagged_union.md +++ b/text/0000-untagged_union.md @@ -230,8 +230,11 @@ a field, should cause the compiler to treat the entire union as initialized. A union may have trait implementations, using the same syntax as a struct. -The compiler should produce an error if a union field has a type that -implements the `Drop` trait. +The compiler should provide a lint if a union field has a type that implements +the `Drop` trait. The compiler may optionally provide a pragma to disable that +lint, for code that intentionally stores a type with Drop in a union. 
The +compiler must never implicitly generate a Drop implementation for the union +itself, though Rust code may explicitly implement Drop for a union type. ## Unions and undefined behavior From 2e5ef18b235dcbddbb19f2b513cd11afed0b0f02 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 18 Jan 2016 16:44:52 -0800 Subject: [PATCH 04/14] Change syntax to "union! U { ... }" This provides the clean syntax of a keyword, without breaking any existing code, and without attaching expectations based on the semantics or syntax of some existing keyword such as "struct" or "enum". --- .../{0000-untagged_union.md => 0000-union.md} | 48 ++++++++++--------- 1 file changed, 25 insertions(+), 23 deletions(-) rename text/{0000-untagged_union.md => 0000-union.md} (87%) diff --git a/text/0000-untagged_union.md b/text/0000-union.md similarity index 87% rename from text/0000-untagged_union.md rename to text/0000-union.md index d68a95ea3ec..5769022f3ff 100644 --- a/text/0000-untagged_union.md +++ b/text/0000-union.md @@ -1,4 +1,4 @@ -- Feature Name: `untagged_union` +- Feature Name: `union` - Start Date: 2015-12-29 - RFC PR: (leave this empty) - Rust Issue: (leave this empty) @@ -6,8 +6,8 @@ # Summary [summary]: #summary -Provide native support for C-compatible unions, defined via a new keyword -`untagged_union`. +Provide native support for C-compatible unions, defined via a built-in syntax +macro `union!`. # Motivation [motivation]: #motivation @@ -28,10 +28,14 @@ space-efficient or cache-efficient structures relying on value representation, such as machine-word-sized unions using the least-significant bits of aligned pointers to distinguish cases. -The syntax proposed here avoids reserving `union` as the new keyword, as -existing Rust code already uses `union` for other purposes, including [multiple -functions in the standard -library](https://doc.rust-lang.org/std/?search=union). 
+The syntax proposed here avoids reserving a new keyword (such as `union`), and +thus will not break any existing code. This syntax also avoids adding a pragma +to some existing keyword that doesn't quite fit, such as `struct` or `enum`, +which avoids attaching any of the semantic significance of those keywords to +this new construct. Rust does not produce an error or warning about the +redefinition of a macro already defined in the standard library, so the +proposed syntax will not even break code that currently defines a macro named +`union!`. To preserve memory safety, accesses to union fields may only occur in `unsafe` code. Commonly, code using unions will provide safe wrappers around unsafe @@ -43,17 +47,16 @@ union field accesses. ## Declaring a union type A union declaration uses the same field declaration syntax as a `struct` -declaration, except with the keyword `untagged_union` in place of `struct`: +declaration, except with `union!` in place of `struct`. ```rust -untagged_union MyUnion { +union! MyUnion { f1: u32, f2: f32, } ``` -`untagged_union` implies `#[repr(C)]` as the default representation, making -`#[repr(C)] untagged_union` permissible but redundant. +`union!` implies `#[repr(C)]` as the default representation. ## Instantiating a union @@ -122,15 +125,14 @@ a union field without matching a specific value makes an irrefutable pattern. Both require unsafe code. Pattern matching may match a union as a field of a larger structure. In -particular, when using an `untagged_union` to implement a C tagged union via -FFI, this allows matching on the tag and the corresponding field -simultaneously: +particular, when using a Rust union to implement a C tagged union via FFI, this +allows matching on the tag and the corresponding field simultaneously: ```rust #[repr(u32)] enum Tag { I, F } -untagged_union U { +union! 
U { i: i32, f: f32, } @@ -164,7 +166,7 @@ entire union, such that any borrow conflicting with a borrow of the union containing the union) will produce an error. ```rust -untagged_union U { +union! U { f1: u32, f2: f32, } @@ -192,7 +194,7 @@ struct S { y: u32, } -untagged_union U { +union! U { s: S, both: u64, } @@ -252,7 +254,7 @@ size of any of its fields, and the maximum alignment of any of its fields. Note that those maximums may come from different fields; for instance: ```rust -untagged_union U { +union! U { f1: u16, f2: [u8; 4], } @@ -282,11 +284,11 @@ of unsafe code. structure-like. The implementation and use of such macros provides strong motivation to seek a better solution, and indeed existing writers and users of such macros have specifically requested native syntax in Rust. -- Define unions without a new keyword `untagged_union`, such as via - `#[repr(union)] struct`. This would avoid any possibility of breaking - existing code that uses the keyword, but would make declarations more - verbose, and introduce potential confusion with `struct` (or whatever - existing construct the `#[repr(union)]` attribute modifies). +- Define unions via a pragma modifying an existing keyword, such as via + `#[repr(union)] struct`. Like the macro approach, this avoids breaking + existing code via a new keyword. However, this would make declarations more + verbose and noisy, and would introduce potential confusion with `struct` (or + whatever existing construct the pragma modified). - Use a compound keyword like `unsafe union`, while not reserving `union` on its own as a keyword, to avoid breaking use of `union` as an identifier. Potentially more appealing syntax, if the Rust parser can support it. 
From 9b4b8af040156a5773e15240f36fbadc7400e396 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 18 Jan 2016 17:18:44 -0800 Subject: [PATCH 05/14] Rewrite alternatives as prose, and expand --- text/0000-union.md | 98 +++++++++++++++++++++++++++++++--------------- 1 file changed, 67 insertions(+), 31 deletions(-) diff --git a/text/0000-union.md b/text/0000-union.md index 5769022f3ff..c70beb1d38e 100644 --- a/text/0000-union.md +++ b/text/0000-union.md @@ -277,37 +277,73 @@ of unsafe code. # Alternatives [alternatives]: #alternatives -- Don't do anything, and leave users of FFI interfaces with unions to continue - writing complex platform-specific transmute code. -- Create macros to define unions and access their fields. However, such macros - make field accesses and pattern matching look more cumbersome and less - structure-like. The implementation and use of such macros provides strong - motivation to seek a better solution, and indeed existing writers and users - of such macros have specifically requested native syntax in Rust. -- Define unions via a pragma modifying an existing keyword, such as via - `#[repr(union)] struct`. Like the macro approach, this avoids breaking - existing code via a new keyword. However, this would make declarations more - verbose and noisy, and would introduce potential confusion with `struct` (or - whatever existing construct the pragma modified). -- Use a compound keyword like `unsafe union`, while not reserving `union` on - its own as a keyword, to avoid breaking use of `union` as an identifier. - Potentially more appealing syntax, if the Rust parser can support it. -- Use a new operator to access union fields, rather than the same `.` operator - used for struct fields. This would make union fields more obvious at the - time of access, rather than making them look syntactically identical to - struct fields despite the semantic difference in storage representation. 
-- The [unsafe enum](https://github.com/rust-lang/rfcs/pull/724) proposal: - introduce untagged enums, identified with `unsafe enum`. Pattern-matching - syntax would make field accesses significantly more verbose than structure - field syntax. -- The [unsafe enum](https://github.com/rust-lang/rfcs/pull/724) proposal with - the addition of struct-like field access syntax. The resulting field access - syntax would look much like this proposal; however, pairing an enum-style - definition with struct-style usage seems confusing for developers. An - enum-based declaration leads users to expect enum-like syntax; a new - construct distinct from both enum and struct does not lead to such - expectations, and developers used to C unions will expect struct-like field - access for unions. +This proposal has a substantial history, with many variants and alternatives +prior to the current macro-based syntax. Thanks to many people in the Rust +community for helping to refine this RFC. + +As an alternative to the macro syntax, Rust could support unions via a new +keyword instead. However, any introduction of a new keyword will necessarily +break some code that previously compiled, such as code using the keyword as an +identifier. Using `union` as the keyword would break the substantial volume of +existing Rust code using `union` for other purposes, including [multiple +functions in the standard +library](https://doc.rust-lang.org/std/?search=union). Another keyword such as +`untagged_union` would reduce the likelihood of breaking code in practice; +however, in the absence of an explicit policy for introducing new keywords, +this RFC opts to not propose a new keyword. + +To avoid breakage caused by a new reserved keyword, Rust could use a compound +keyword like `unsafe union` (currently not legal syntax in any context), while +not reserving `union` on its own as a keyword, to avoid breaking use of `union` +as an identifier. 
This provides equally reasonable syntax, but potentially +introduces more complexity in the Rust parser. + +In the absence of a new keyword, since unions represent unsafe, untagged sum +types, and enum represents safe, tagged sum types, Rust could base unions on +enum instead. The [unsafe enum](https://github.com/rust-lang/rfcs/pull/724) +proposal took this approach, introducing unsafe, untagged enums, identified +with `unsafe enum`; further discussion around that proposal led to the +suggestion of extending it with struct-like field access syntax. Such a +proposal would similarly eliminate explicit use of `std::mem::transmute`, and +avoid the need to handle platform-specific size and alignment requirements for +fields. + +The standard pattern-matching syntax of enums would make field accesses +significantly more verbose than struct-like syntax, and in particular would +typically require more code inside unsafe blocks. Adding struct-like field +access syntax would avoid that; however, pairing an enum-like definition with +struct-like usage seems confusing for developers. A declaration using `enum` +leads users to expect enum-like syntax; a new construct distinct from both +`enum` and `struct` avoids leading users to expect any particular syntax or +semantics. Furthermore, developers used to C unions will expect struct-like +field access for unions. + +Since this proposal uses struct-like syntax for declaration, initialization, +pattern matching, and field access, the original version of this RFC used a +pragma modifying the `struct` keyword: `#[repr(union)] struct`. However, while +the proposed unions match struct syntax, they do not share the semantics of +struct; most notably, unions represent a sum type, while structs represent a +product type. The new construct `union!` avoids the semantics attached to +existing keywords. 
+ +In the absence of any native support for unions, developers of existing Rust +code have resorted to either complex platform-specific transmute code, or +complex union-definition macros. In the latter case, such macros make field +accesses and pattern matching look more cumbersome and less structure-like, and +still require detailed platform-specific knowledge of structure layout and +field sizes. The implementation and use of such macros provides strong +motivation to seek a better solution, and indeed existing writers and users of +such macros have specifically requested native syntax in Rust. + +Finally, to call more attention to reads and writes of union fields, field +access could use a new access operator, rather than the same `.` operator used +for struct fields. This would make union fields more obvious at the time of +access, rather than making them look syntactically identical to struct fields +despite the semantic difference in storage representation. However, this does +not seem worth the additional syntactic complexity and divergence from other +languages. Union field accesses already require unsafe blocks, which calls +attention to them. Calls to unsafe functions use the same syntax as calls to +safe functions. # Unresolved questions [unresolved]: #unresolved-questions From 17feb14875594aa8add679b29e0045829686d937 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 18 Jan 2016 17:19:18 -0800 Subject: [PATCH 06/14] Remove unnecessary backquotes. --- text/0000-union.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-union.md b/text/0000-union.md index c70beb1d38e..b3244f78b82 100644 --- a/text/0000-union.md +++ b/text/0000-union.md @@ -37,7 +37,7 @@ redefinition of a macro already defined in the standard library, so the proposed syntax will not even break code that currently defines a macro named `union!`. 
-To preserve memory safety, accesses to union fields may only occur in `unsafe` +To preserve memory safety, accesses to union fields may only occur in unsafe code. Commonly, code using unions will provide safe wrappers around unsafe union field accesses. @@ -46,7 +46,7 @@ union field accesses. ## Declaring a union type -A union declaration uses the same field declaration syntax as a `struct` +A union declaration uses the same field declaration syntax as a struct declaration, except with `union!` in place of `struct`. ```rust From c123950b0251a24f0a543932fa190ec9e8e46a21 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 18 Jan 2016 17:21:11 -0800 Subject: [PATCH 07/14] Add example about pattern match on union field with smaller size than union. --- text/0000-union.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/text/0000-union.md b/text/0000-union.md index b3244f78b82..4c0858e4634 100644 --- a/text/0000-union.md +++ b/text/0000-union.md @@ -156,7 +156,9 @@ fn is_zero(v: Value) -> bool { Note that a pattern match on a union field that has a smaller size than the entire union must not make any assumptions about the value of the union's -memory outside that field. +memory outside that field. For example, if a union contains a `u8` and a +`u32`, matching on the `u8` may not perform a `u32`-sized comparison over the +entire union. 
## Borrowing union fields From 98a5809eec752ad48a042cddfa57d6dbd46e673c Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 18 Jan 2016 17:24:15 -0800 Subject: [PATCH 08/14] Reduce the size of unsafe blocks --- text/0000-union.md | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/text/0000-union.md b/text/0000-union.md index 4c0858e4634..00c7301db49 100644 --- a/text/0000-union.md +++ b/text/0000-union.md @@ -181,9 +181,7 @@ fn test() { // let b2 = &mut u.f2; // This would produce an error *b1 = 5; } - unsafe { - assert_eq!(u.f1, 5); - } + assert_eq!(unsafe { u.f1 }, 5); } ``` @@ -211,10 +209,8 @@ fn test() { *bx = 5; *by = 10; } - unsafe { - assert_eq!(u.s.x, 5); - assert_eq!(u.s.y, 10); - } + assert_eq!(unsafe { u.s.x }, 5); + assert_eq!(unsafe { u.s.y }, 10); } ``` From c60da1f9656294b09d06d6256ee2155bc738e44e Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Sat, 26 Mar 2016 14:00:10 -0700 Subject: [PATCH 09/14] Switch to recognizing `union` as a contextual keyword --- text/0000-union.md | 79 +++++++++++++++++++++++++--------------------- 1 file changed, 43 insertions(+), 36 deletions(-) diff --git a/text/0000-union.md b/text/0000-union.md index 00c7301db49..21b413b1b9a 100644 --- a/text/0000-union.md +++ b/text/0000-union.md @@ -6,8 +6,9 @@ # Summary [summary]: #summary -Provide native support for C-compatible unions, defined via a built-in syntax -macro `union!`. +Provide native support for C-compatible unions, defined via a new "contextual +keyword" `union`, without breaking any existing code that uses `union` as an +identifier. # Motivation [motivation]: #motivation @@ -28,14 +29,11 @@ space-efficient or cache-efficient structures relying on value representation, such as machine-word-sized unions using the least-significant bits of aligned pointers to distinguish cases. -The syntax proposed here avoids reserving a new keyword (such as `union`), and -thus will not break any existing code. 
This syntax also avoids adding a pragma -to some existing keyword that doesn't quite fit, such as `struct` or `enum`, -which avoids attaching any of the semantic significance of those keywords to -this new construct. Rust does not produce an error or warning about the -redefinition of a macro already defined in the standard library, so the -proposed syntax will not even break code that currently defines a macro named -`union!`. +The syntax proposed here recognizes `union` as though it were a keyword when +used to introduce a union declaration, *without* breaking any existing code +that uses `union` as an identifier. Experiments by Niko Matsakis demonstrate +that recognizing `union` in this manner works unambiguously with zero conflicts +in the Rust grammar. To preserve memory safety, accesses to union fields may only occur in unsafe code. Commonly, code using unions will provide safe wrappers around unsafe @@ -47,16 +45,25 @@ union field accesses. ## Declaring a union type A union declaration uses the same field declaration syntax as a struct -declaration, except with `union!` in place of `struct`. +declaration, except with `union` in place of `struct`. ```rust -union! MyUnion { +union MyUnion { f1: u32, f2: f32, } ``` -`union!` implies `#[repr(C)]` as the default representation. +`union` implies `#[repr(C)]` as the default representation. + +## Contextual keyword + +Rust normally prevents the use of a keyword as an identifier; for instance, a +declaration `fn struct() {}` will produce an error "expected identifier, found +keyword `struct`". However, to avoid breaking existing declarations that use +`union` as an identifier, Rust will only recognize `union` as a keyword when +used to introduce a union declaration. A declaration `fn union() {}` will not +produce such an error. ## Instantiating a union @@ -132,7 +139,7 @@ allows matching on the tag and the corresponding field simultaneously: #[repr(u32)] enum Tag { I, F } -union! 
U { +union U { i: i32, f: f32, } @@ -168,7 +175,7 @@ entire union, such that any borrow conflicting with a borrow of the union containing the union) will produce an error. ```rust -union! U { +union U { f1: u32, f2: f32, } @@ -194,7 +201,7 @@ struct S { y: u32, } -union! U { +union U { s: S, both: u64, } @@ -252,7 +259,7 @@ size of any of its fields, and the maximum alignment of any of its fields. Note that those maximums may come from different fields; for instance: ```rust -union! U { +union U { f1: u16, f2: [u8; 4], } @@ -275,26 +282,26 @@ of unsafe code. # Alternatives [alternatives]: #alternatives -This proposal has a substantial history, with many variants and alternatives -prior to the current macro-based syntax. Thanks to many people in the Rust -community for helping to refine this RFC. +Proposals for unions in Rust have a substantial history, with many variants and +alternatives prior to the syntax proposed here with a `union` pseudo-keyword. +Thanks to many people in the Rust community for helping to refine this RFC. -As an alternative to the macro syntax, Rust could support unions via a new -keyword instead. However, any introduction of a new keyword will necessarily +The most obvious path to introducing unions in Rust would introduce `union` as +a new keyword. However, any introduction of a new keyword will necessarily break some code that previously compiled, such as code using the keyword as an -identifier. Using `union` as the keyword would break the substantial volume of -existing Rust code using `union` for other purposes, including [multiple -functions in the standard -library](https://doc.rust-lang.org/std/?search=union). Another keyword such as -`untagged_union` would reduce the likelihood of breaking code in practice; -however, in the absence of an explicit policy for introducing new keywords, -this RFC opts to not propose a new keyword. 
-
-To avoid breakage caused by a new reserved keyword, Rust could use a compound
-keyword like `unsafe union` (currently not legal syntax in any context), while
-not reserving `union` on its own as a keyword, to avoid breaking use of `union`
-as an identifier. This provides equally reasonable syntax, but potentially
-introduces more complexity in the Rust parser.
+identifier. Making `union` a keyword in the standard way would break the
+substantial volume of existing Rust code using `union` for other purposes,
+including [multiple functions in the standard
+library](https://doc.rust-lang.org/std/?search=union). The approach proposed
+here, recognizing `union` to introduce a union declaration without prohibiting
+`union` as an identifier, provides the most natural declaration syntax and
+avoids breaking any existing code.
+
+Proposals for unions in Rust have extensively explored possible variations on
+declaration syntax, including longer keywords (`untagged_union`), built-in
+syntax macros (`union!`), compound keywords (`unsafe union`), pragmas
+(`#[repr(union)] struct`), and combinations of existing keywords (`unsafe
+enum`).
 
 In the absence of a new keyword, since unions represent unsafe, untagged sum
 types, and enum represents safe, tagged sum types, Rust could base unions on
@@ -321,7 +328,7 @@ pattern matching, and field access, the original version of this RFC used a
 pragma modifying the `struct` keyword: `#[repr(union)] struct`. However, while
 the proposed unions match struct syntax, they do not share the semantics of
 struct; most notably, unions represent a sum type, while structs represent a
-product type. The new construct `union!` avoids the semantics attached to
+product type. The new construct `union` avoids the semantics attached to
 existing keywords.
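A union is an untagged sum type, so safe code usually pairs it with an explicit tag. The RFC does this with `#[repr(u32)] enum Tag { I, F }` next to `union U { i: i32, f: f32 }`. The sketch below, written for stable Rust, checks the tag and reads the corresponding field in a single `match`. The wrapper struct `Value`, its constructors `Value::int` and `Value::float`, and the safe function `is_zero` are hypothetical names. Reading a union field inside a pattern still requires `unsafe`, and the tag check is what makes that read sound.

```rust
/// Discriminant stored next to the union (shape taken from the RFC).
#[repr(u32)]
#[derive(Clone, Copy)]
enum Tag {
    I,
    F,
}

/// Untagged storage: only `Value::tag` records which field is live.
#[repr(C)]
#[derive(Clone, Copy)]
union U {
    i: i32,
    f: f32,
}

/// Hypothetical tagged wrapper combining the two pieces above.
#[repr(C)]
#[derive(Clone, Copy)]
struct Value {
    tag: Tag,
    u: U,
}

impl Value {
    fn int(i: i32) -> Self {
        Value { tag: Tag::I, u: U { i } }
    }
    fn float(f: f32) -> Self {
        Value { tag: Tag::F, u: U { f } }
    }
}

/// Safe wrapper: each arm matches the tag and the field it guards together,
/// so only the live field is ever read.
fn is_zero(v: Value) -> bool {
    // Matching on a union field reads it, which requires `unsafe`.
    unsafe {
        match v {
            Value { tag: Tag::I, u: U { i: 0 } } => true,
            Value { tag: Tag::F, u: U { f } } if f == 0.0 => true,
            _ => false,
        }
    }
}

fn main() {
    assert!(is_zero(Value::int(0)));
    assert!(!is_zero(Value::int(7)));
    assert!(is_zero(Value::float(0.0)));
    assert!(!is_zero(Value::float(1.5)));
}
```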
 
 In the absence of any native support for unions, developers of existing Rust
From 80265328e86aaae96d2dca96c6090ce6b1a54098 Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sat, 2 Apr 2016 15:46:17 -0700
Subject: [PATCH 10/14] Make unions that want C layout use #[repr(C)] explicitly

---
 text/0000-union.md | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/text/0000-union.md b/text/0000-union.md
index 21b413b1b9a..ab7ef1b117b 100644
--- a/text/0000-union.md
+++ b/text/0000-union.md
@@ -54,7 +54,8 @@ union MyUnion {
 }
 ```
 
-`union` implies `#[repr(C)]` as the default representation.
+By default, a union uses an unspecified binary layout. A union declared with
+the `#[repr(C)]` attribute will have the same layout as an equivalent C union.
 
 ## Contextual keyword
 
@@ -139,6 +140,7 @@ allows matching on the tag and the corresponding field simultaneously:
 #[repr(u32)]
 enum Tag { I, F }
 
+#[repr(C)]
 union U {
     i: i32,
     f: f32,
@@ -253,12 +255,14 @@ invalid value.
 
 ## Union size and alignment
 
-A union must have the same size and alignment as an equivalent C union
-declaration for the target platform. Typically, a union would have the maximum
-size of any of its fields, and the maximum alignment of any of its fields.
-Note that those maximums may come from different fields; for instance:
+A union declared with `#[repr(C)]` must have the same size and alignment as an
+equivalent C union declaration for the target platform. Typically, a union
+would have the maximum size of any of its fields, and the maximum alignment of
+any of its fields. Note that those maximums may come from different fields;
+for instance:
 
 ```rust
+#[repr(C)]
 union U {
     f1: u16,
     f2: [u8; 4],
From 7f40a6bd5858c035b859f0003ae2eed37f744905 Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sat, 2 Apr 2016 15:49:48 -0700
Subject: [PATCH 11/14] Mention impl syntax explicitly

---
 text/0000-union.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/text/0000-union.md b/text/0000-union.md
index ab7ef1b117b..fdffc2558be 100644
--- a/text/0000-union.md
+++ b/text/0000-union.md
@@ -237,7 +237,8 @@ a field, should cause the compiler to treat the entire union as initialized.
 
 ## Unions and traits
 
-A union may have trait implementations, using the same syntax as a struct.
+A union may have trait implementations, using the same `impl` syntax as a
+struct.
 
 The compiler should provide a lint if a union field has a type that implements
 the `Drop` trait. The compiler may optionally provide a pragma to disable that
From b2e030930356b31d2d9439f9e9cfde4b4b88fc84 Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sat, 2 Apr 2016 16:07:41 -0700
Subject: [PATCH 12/14] Discuss generic union

---
 text/0000-union.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/text/0000-union.md b/text/0000-union.md
index fdffc2558be..10da05a9146 100644
--- a/text/0000-union.md
+++ b/text/0000-union.md
@@ -246,6 +246,17 @@ lint, for code that intentionally stores a type with Drop in a union.
 The compiler must never implicitly generate a Drop implementation for the union
 itself, though Rust code may explicitly implement Drop for a union type.
 
+## Generic unions
+
+A union may have a generic type, with one or more type parameters or lifetime
+parameters. As with a generic enum, the types within the union must make use
+of all the parameters; however, not all fields within the union must use all
+parameters.
+
+Type inference works on generic union types. In some cases, the compiler may
+not have enough information to infer the parameters of a generic type, and may
+require explicitly specifying them.
+
 ## Unions and undefined behavior
 
 Rust code must not use unions to invoke [undefined
From 3f456838f01fce474b27309bb29ce3653f25d081 Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sat, 2 Apr 2016 16:08:46 -0700
Subject: [PATCH 13/14] Prohibit empty union declarations

---
 text/0000-union.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/text/0000-union.md b/text/0000-union.md
index 10da05a9146..1c264b3fb8d 100644
--- a/text/0000-union.md
+++ b/text/0000-union.md
@@ -57,6 +57,9 @@ union MyUnion {
 By default, a union uses an unspecified binary layout. A union declared with
 the `#[repr(C)]` attribute will have the same layout as an equivalent C union.
 
+A union must have at least one field; an empty union declaration produces a
+syntax error.
+
 ## Contextual keyword
 
 Rust normally prevents the use of a keyword as an identifier; for instance, a
From 7727fb38171a15e9cedcb529ec749c605906a1e9 Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Tue, 5 Apr 2016 22:17:54 -0700
Subject: [PATCH 14/14] Document limitations on unions declared without #[repr(C)]

---
 text/0000-union.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/text/0000-union.md b/text/0000-union.md
index 1c264b3fb8d..1d501872c8e 100644
--- a/text/0000-union.md
+++ b/text/0000-union.md
@@ -268,6 +268,11 @@ In particular, Rust code must not use unions to break the pointer aliasing
 rules with raw pointers, or access a field containing a primitive type with an
 invalid value.
 
+In addition, since a union declared without `#[repr(C)]` uses an unspecified
+binary layout, code reading fields of such a union or pattern-matching such a
+union must not read from a field other than the one written to. This includes
+pattern-matching a specific value in a union field.
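With `#[repr(C)]`, every field of a union starts at offset 0. This makes the size and alignment rules, and reading a field other than the one last written, directly observable. The minimal stable-Rust sketch below uses the RFC's `union U { f1: u16, f2: [u8; 4] }` together with a hypothetical `IntOrFloat` union and a hypothetical `float_bits` helper. Reading `i` after writing `f` is acceptable here only because the layout is `#[repr(C)]` and every bit pattern is a valid `u32`.

```rust
use std::mem::{align_of, size_of};

/// The RFC's size/alignment example: the largest field (`f2`, 4 bytes) sets
/// the size, while the strictest alignment (`f1`, 2 bytes) sets the alignment.
#[repr(C)]
#[allow(dead_code)] // the fields only shape the layout in this sketch
union U {
    f1: u16,
    f2: [u8; 4],
}

/// Hypothetical union for reinterpreting the bits of an `f32`.
/// With `#[repr(C)]`, both fields start at offset 0 and share the same 4 bytes.
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

/// Hypothetical helper: returns the raw IEEE 754 bits of `x`.
fn float_bits(x: f32) -> u32 {
    let v = IntOrFloat { f: x };
    // Reading a union field is always `unsafe`; this read is sound because
    // the layout is `#[repr(C)]` and every 32-bit pattern is a valid `u32`.
    unsafe { v.i }
}

fn main() {
    assert_eq!(size_of::<U>(), 4);
    assert_eq!(align_of::<U>(), 2);

    // 1.0f32 is 0x3f80_0000 in IEEE 754 single precision.
    assert_eq!(float_bits(1.0), 0x3f80_0000);
    // Cross-check against the standard library's own conversion.
    assert_eq!(float_bits(-2.5), (-2.5f32).to_bits());
}
```

For a union declared without `#[repr(C)]`, the text above forbids this kind of cross-field read. For this particular conversion, `f32::to_bits` is the safe alternative.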
+
 ## Union size and alignment
 
 A union declared with `#[repr(C)]` must have the same size and alignment as an