From e62830c989d193f8260885badd6bdd014a6b42fa Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sun, 23 Oct 2022 13:25:24 +0100
Subject: [PATCH 01/49] Niches

---
 text/0000-niche.md | 322 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 322 insertions(+)
 create mode 100644 text/0000-niche.md

diff --git a/text/0000-niche.md b/text/0000-niche.md
new file mode 100644
index 00000000000..540ff2ee3d3
--- /dev/null
+++ b/text/0000-niche.md
@@ -0,0 +1,322 @@
+- Feature Name: `niche`
+- Start Date: 2022-10-16
+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)
+
+# Summary
+[summary]: #summary
+
+Provide a stable attribute to define "niche" values of a type. The type cannot
+store these values, allowing the compiler to use them to optimize the
+representation of containing types such as `Option`.
+
+# Motivation
+[motivation]: #motivation
+
+Rust makes extensive use of types like `Option`, and many programs benefit from
+the efficient storage of such types. Many programs also interface with others
+via FFI, via interfaces that provide data and a sentinel value (such as for
+errors or missing data) within the same bits.
+
+The Rust compiler already provides support for this via "niche" optimizations,
+and various types providing guarantees of such optimizations, including
+references, `bool`, `char`, and the `NonZero` family of types. However, Rust
+does not provide any stable means of defining new types with niches, reserving
+this mechanism for the standard library. This puts pressure on the standard
+library to provide additional families of types with niches, while preventing
+the broader crate ecosystem from experimenting with such types.
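For comparison with the motivation above, the sketch below shows what a crate can do on stable Rust today: reuse a niche that the standard library already guarantees. It relies only on `std`. `MeaninglessNumber` borrows the RFC's example name but is a hypothetical stand-in, and the XOR encoding is an illustrative workaround, not part of the proposal.

```rust
use std::mem::size_of;
use std::num::{NonZeroU64, NonZeroU8};

/// Hypothetical stand-in for the RFC's `#[niche(value = 42)] struct MeaninglessNumber(u64);`.
/// The value is stored XOR-ed with 42, so the reserved value 42 is stored as 0,
/// which `NonZeroU64` already excludes; `Option<MeaninglessNumber>` reuses that niche.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MeaninglessNumber(NonZeroU64);

impl MeaninglessNumber {
    const RESERVED: u64 = 42;

    /// Returns `None` for the reserved value, so no `unsafe` is needed.
    pub fn new(value: u64) -> Option<Self> {
        NonZeroU64::new(value ^ Self::RESERVED).map(MeaninglessNumber)
    }

    /// Undoes the XOR to recover the stored value.
    pub fn get(self) -> u64 {
        self.0.get() ^ Self::RESERVED
    }
}

/// (size of T, size of Option<T>) for a few niche-carrying types.
pub fn niche_sizes() -> [(usize, usize); 3] {
    [
        (size_of::<&u8>(), size_of::<Option<&u8>>()),
        (size_of::<NonZeroU8>(), size_of::<Option<NonZeroU8>>()),
        (size_of::<MeaninglessNumber>(), size_of::<Option<MeaninglessNumber>>()),
    ]
}
```

The workaround costs an XOR on every access and only works when the reserved value can be mapped onto zero; the proposed attribute would instead let the compiler use the reserved value itself as the niche.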
+
+Past efforts to define a stable niche mechanism stalled out due to scope creep:
+alignment niches, null-page niches, multiple niches, structures with multiple
+fields, and many other valid but challenging ideas (documented in the "Future
+possibilities" section). This RFC defines a *simple* mechanism for defining one
+common type of niche, while leaving room for future extension.
+
+Defining a niche mechanism allows libraries to build arbitrary types containing
+niches, and simplifies handling of space-efficient data structures in Rust
+without manual bit-twiddling.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+When defining a structure containing exactly one field of a non-zero-sized type
+(non-ZST), you can attach a `niche` attribute on it to declare a specific value
+or range of values for that field as invalid. This promises the compiler that
+you will never store those values in that field, which allows the compiler to
+use those in-memory representations for different purposes, such as the
+representation of `None` in a containing `Option`.
+
+```rust
+use std::mem::size_of;
+
+#[niche(value = 42)]
+struct MeaninglessNumber(u64);
+
+assert_eq!(size_of::<MeaninglessNumber>(), 8);
+assert_eq!(size_of::<Option<MeaninglessNumber>>(), 8);
+
+#[niche(range = 2..)]
+struct Bit(u8);
+
+assert_eq!(size_of::<Bit>(), 1);
+assert_eq!(size_of::<Option<Option<Option<Bit>>>>(), 1);
+```
+
+Constructing a structure with a niche value, or writing to the non-ZST field of
+such a structure, or obtaining a mutable reference to such a field, requires
+`unsafe` code. Causing a type with a niche to contain an invalid value (whther
+by construction, writing, or transmuting) results in undefined behavior.
+
+If a type `T` contains only a single niche value, `Option<T>` (and other enums
+isomorphic to it, with one variant containing `T` and one nullary variant) will
+use that value to represent `None` (the nullary variant). If such a `T` is
+additionally `repr(transparent)` or `repr(C)` or otherwise permitted in FFI,
+`Option<T>` will likewise be permitted in FFI, with the niche value mapping
+bidirectionally to `None` across the FFI boundary.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+The attribute `#[niche]` may only appear on a struct declaration. The struct
+must contain exactly one field of a non-zero-sized type (non-ZST). The struct
+may contain zero or more ZST fields, such as `PhantomData`.
+
+Declaring a niche on any item other than a struct declaration results in an
+error.
+
+Declaring multiple `niche` attributes on a single item, or multiple key-value
+pairs within a single `niche` attribute, results in an error.
+
+Declaring a niche on a struct containing more or less than one non-zero-sized
+field results in an error.
+
+Declaring a range niche with an empty range (e.g. `0..0`) results in a
+warn-by-default lint. As with many lints, this lint should be automatically
+suppressed for code expanded from a macro.
+
+Declaring a range niche with an invalid range (e.g. `5..0`) results in an
+error.
+
+Declaring a niche using a negative value or a negative range endpoint results
+in an error. The text of the error should suggest the appropriate unsigned
+equivalent to use. The compiler may support this in the future.
+
+Declaring a range niche with an open start (`..3`) results in an error, for
+forwards-compatibility with support for negative values.
+
+Declaring a niche using a non-literal value (e.g. `usize::MAX`) results in an
+error. Constants can use compile-time evaluation, and compile-time evaluation
+does not occur early enough for attributes such as niche declarations.
+
+If a type `T` contains multiple niche values (e.g. `#[niche(range = 8..16)]`),
+the compiler does not define the representation of types containing `T`, except
+that multiple instances of the same identical type (e.g. `Option<T>` and
+`Option<T>`) will use an identical representation (permitting a round-trip
+`transmute` of such a value via bytes). In particular, the compiler does not
+commit to making use of all the invalid values of the niche, even if it
+otherwise could have.
+
+If a type `T` contains niches and uses `repr(C)` or `repr(transparent)`, the
+compiler guarantees to use the same storage size for the type as it would
+without the niche, even if the niche might allow storing fewer bytes. If a type
+`T` contains niches and uses the default (`Rust`) `repr`, the compiler may
+choose to represent the type using fewer bytes if the niche would allow doing
+so. For instance:
+
+```rust
+#[niche(range = 4..)]
+struct S {
+    field: u16,
+}
+
+// `size_of::<S>()` may return less than 2
+```
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+We could allow defining *either* valid or invalid ranges. For instance,
+`niche(invalid_range(0..=3))` or `niche(valid_range(4..))`. Different types
+could use whichever of the two proved simpler for a given use case. However, in
+addition to adding gratuitous complexity and requiring longer names
+(`invalid_range` vs `range`), this would double the number of cases when
+defining other kinds of niches in the future. For instance, a future syntax for
+bit-pattern niches would need to provide both `valid` and `invalid` variants as
+well. We could introduce another level of nesting to make this orthogonal, such
+as `niche(invalid(range(...)))` and `niche(invalid(range(...)))`, but that
+further increases complexity.
+
+Rather than defining the range of *invalid* values, the attribute could define
+the range of *valid* values. Different types may find one or the other case
+simpler. This RFC chooses to define the range of *invalid* values for three
+reasons:
+- As an arbitrary choice, because we need to pick one or the other (see above).
+- The most common case will be a single invalid value, for which defining
+  invalid values results in simpler code.
+- This mechanism commonly goes by the name `niche`, and `niche` also refers to
+  the invalid value. So, an attribute defining the niche of a type most
+  naturally refers to the invalid value.
+
+Note that the compiler already supports having a niche in the middle of a
+type's possible values; internally, the compiler represents this by defining a
+valid range that wraps around the type's possible values. For instance,
+`#[niche(value = 42)]` gets represented internally in the compiler as a valid
+range starting at 43 and ending at 41.
+
+We could define *only* single-value niches, not ranges. However, the compiler
+already supports ranges internally, and the standard library already makes use
+of multi-value ranges, so this seems like an artificial limitation.
+
+We could define only ranges, not single-value niches, and users could express
+single-value niches via ranges, such as `0..=0`. However, that makes
+single-value niches more verbose to define, and makes mistakes such as `0..0`
+more likely. (This RFC suggests a lint to catch such cases, but the syntax
+should still attempt to guide users away from that mistake.)
+
+We could guarantee more usage of niches than just a single value; however, this
+would constrain the compiler in areas that still see active development.
+
+We could avoid guaranteeing the use of a single-value niche for `Option`;
+however, this would eliminate one of the primary user goals for such niches.
+
+We could require types to opt into the guaranteed use of a niche, separately
+from declaring a niche. This seems unnecessarily verbose, as well as limiting:
+we can't yet provide a full guarantee of all *future* uses we may want to
+guarantee, only of the limited single-value uses.
+
+We could implement niches using a lang-item type that uses const generics (e.g.
+`Niche>`. This type would be useful
+regardless, and we should likely provide it if we can. However, this RFC
+advocates (eventually) building such a type on an underlying language-level
+building block like `niche`, and providing the underlying building blocks to
+the ecosystem as well.
+
+We could implement niches using a trait `Niche` implemented for a type, with
+associated consts for invalid values. If we chose to do this in the future, the
+`#[niche(...)]` attribute could become forward-compatible with this, by
+generating the trait impl.
+
+# Prior art
+[prior-art]: #prior-art
+
+The Rust compiler has supported niches for types like `Option` in various forms
+since versions prior to Rust 1.0. In particular, Rust 1.0 already guaranteed
+that `Option<&T>` has the same size as `&T`. Rust has had many additional
+niche-related optimizations since then.
+
+The Rust compiler already supports user-defined niches via the unstable
+attributes `rustc_layout_scalar_valid_range_start` and
+`rustc_layout_scalar_valid_range_end`.
+
+Bit-twiddling tricks to store information compactly have seen widespread use
+and innovation since computing antiquity.
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+Does the compiler support niches on structs containing ZST fields such as
+`PhantomData`? If it doesn't, then initially, having a limitation to only
+structs containing a single field would be fine, and would not substantially
+reduce the usefulness of stabilizing this feature.
+
+Could we support niches on generic types? For instance, could we support
+declaring a niche of `0` on a generic structure with a single field?
+
+Could we support negative numbers in a niche attribute, at least for fields of
+concrete primitive type? That would provide a much more friendly interface, but
+would require the compiler to better understand the type and its size.
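Two of the points above depend on the width of the field's type: the compiler's internal wrapping valid range (the RFC's example stores `#[niche(value = 42)]` as a valid range from 43 around to 41) and the two's-complement spelling of a negative niche value. The helpers below are hypothetical illustrations of that arithmetic, not compiler APIs.

```rust
/// Hypothetical helper (not a compiler API): is `v` inside the valid range
/// `start..=end`? When `start > end` the range wraps past the type's maximum,
/// so the valid values are `start..=MAX` together with `0..=end`.
pub fn in_valid_range(v: u128, start: u128, end: u128) -> bool {
    if start <= end {
        start <= v && v <= end
    } else {
        v >= start || v <= end
    }
}

/// Hypothetical helper: the two's-complement bit pattern of `v` for an integer
/// that is `bits` wide (1..=128), read as an unsigned number. The result
/// depends on `bits`, which is why a negative niche needs the field's size.
pub fn unsigned_equivalent(v: i128, bits: u32) -> u128 {
    assert!((1..=128).contains(&bits), "width must be 1..=128 bits");
    let mask = if bits == 128 { u128::MAX } else { (1u128 << bits) - 1 };
    (v as u128) & mask
}
```

For example, `in_valid_range(42, 43, 41)` is false while 0, 41, 43 and 255 all fall inside the wrapped range; `unsigned_equivalent(-1, 8)` is `0xFF` but `unsigned_equivalent(-1, 32)` is `0xFFFF_FFFF`, matching `-1i8 as u8` and `-1i32 as u32`.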
+
+Will something go wrong if applying a niche to a struct whose non-ZST field is
+itself a struct containing multiple fields? Do we need to restrict niches to
+structs containing primitive types, or similar?
+
+Do we need to make `niche` mutually exclusive with `packed`? What about other
+attributes?
+
+# Future possibilities
+[future-possibilities]: #future-possibilities
+
+Niches offer possibilities as vast, rich, clever, and depraved as the
+collective ingenuity of bit-twiddlers everywhere. This section includes many
+possibilities that have come up in the past. This RFC deliberately excludes all
+of these possibilities from the scope of the initial version, choosing to
+specify only behavior that the Rust compiler already implements.
+
+New types of niches can use the same `niche` attribute, adding new key-values
+within the attribute.
+
+- **Signed values**: This RFC requires the use of unsigned values when defining
+  niches. A future version could permit the use of signed values, to avoid
+  having to manually perform the twos-complement conversion. This may
+  require either making the compiler's implementation smarter, or using a
+  syntax that defines the size of the integer type (e.g. `-1isize`).
+- **Limited constant evaluation**: This RFC excludes the possibility of using
+  constants in the range expression, because doing so simplifies the
+  implementation. Ideally, a future version would allow ranges to use at least
+  *simple* numeric constants, such as `usize::MAX`. Full constant evaluation
+  may be much harder to support.
+- **Alignment niches**: If a pointer requires a certain alignment, any bit pattern
+  corresponding to an unaligned pointer could serve as a niche. This provides
+  an automatic mechanism for handling "tagged pointers" using the low bits.
+- **Null-page niches**: If a target treats the entire null page as invalid,
+  pointers on that target could have a niche corresponding to that entire page,
+  rather than just the null value. This would allow defining niches spanning a
+  large swath of the value space. However, this would either require extensive
+  use of `cfg_attr` for various targets, or a new mechanism for obtaining the
+  valid range from the compiler. In addition, for some targets the valid range
+  may vary based on environment, even for the same target; in such cases, the
+  compiler would need to provide a mechanism for the user to supply the valid
+  range *to* the compiler.
+- **Invalid-pointer niches**: On targets where certain pointer values cannot
+  represent a valid pointer in a given context (such as on x86-64 where the
+  upper half of the address space represents kernel-space address and the lower
+  half represents userspace addresses), types containing such pointers could use
+  a large swathe of values as a niche.
+- **Pointer high-bit niches**: On targets that don't permit addresses with some of
+  the high bits set (such as implicitly on historical x86 or ARM platforms, or
+  explicitly defined via ARM's "top-byte ignore" or AMD's "upper address
+  ignore" or Intel's "Linear Address Masking"), types containing pointers could
+  potentially use values with those high bits set as a niche. This would likely
+  require compile-time configuration.
+- **Multiple niches**: A type could define multiple niches, rather than just a
+  single range.
+- **Other bit-pattern niches**: A type could define niches via a bit pattern,
+  rather than a range.
+- **Per-field niches**: A structure containing multiple fields could have a
+  niche on a specific field, rather than the whole structure.
+- **Whole-structure niches**: A structure containing multiple non-zero-sized
+  fields could have a niche of invalid values for the whole structure.
+- **Union niches**: A union could have a niche.
+- **Enum niches**: An enum or an enum variant could have a niche.
+- **Specified mappings into niches**: Users may want to rely on mappings of
+  multiple values into a multi-value niche. For instance, users could define a
+  type with a niche containing a range of integer values, and a range of
+  integer error codes, and rely on `Result` assigning specific niche
+  values to specific error codes, in order to match a specific ABI (such as the
+  Linux kernel's `ERR_PTR`).
+- **Safety**: The attribute specified in this RFC requires an unsafe block to
+  set the field. Future extensions could allow safely setting the field, after
+  verifying in a compiler-visible manner that the value works. For instance:
+- **`derive(TryInto)`**: Rust could support deriving `TryInto` from the
+  contained type to the structure. The implementation could explicitly check
+  the range, and return an error if not in-range. This would avoid the need to
+  write explicit `unsafe` code, and many uses may be able to elide or coalesce
+  the check if the compiler can prove the range of a value at compile time.
+- **Lints**: Multiple lints may help users define niches, or detect usages of
+  niches that may be better expressed via other means. For instance, a lint
+  could detect a newtype whose constructor maintains a range invariant, and
+  suggest adding a niche.
+- **Range types**: Rust (or libraries built atop Rust) could provide integer
+  types with associated valid ranges, along with operations that
+  expand/contract/propagate those ranges as appropriate.
+- **`unsafe` fields**: If in the future Rust introduces `unsafe` fields,
+  declaring a niche could internally mark the field as unsafe, taking advantage
+  of the same machinery.
+- **Move types, or types that don't support references**: Rust currently
+  requires that all values of a given type have the same representation no
+  matter where they get stored, to allow taking references to such types and
+  passing them to contexts that don't know about any relevant storage quirks
+  such as niches. Given a mechanism for disallowing references to a type and
+  requiring users to copy or move it rather than referencing it in-place, Rust
+  could more aggressively optimize storage layout, such as by renumbering enum
+  values and translating them back when read.

From 2947255735ad791ec25c0f2245fbefcc9e245543 Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sun, 23 Oct 2022 13:33:07 +0100
Subject: [PATCH 02/49] Niche is RFC 3334

---
 text/{0000-niche.md => 3334-niche.md} | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename text/{0000-niche.md => 3334-niche.md} (99%)

diff --git a/text/0000-niche.md b/text/3334-niche.md
similarity index 99%
rename from text/0000-niche.md
rename to text/3334-niche.md
index 540ff2ee3d3..c01dd5f54f9 100644
--- a/text/0000-niche.md
+++ b/text/3334-niche.md
@@ -1,6 +1,6 @@
 - Feature Name: `niche`
 - Start Date: 2022-10-16
-- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
+- RFC PR: [rust-lang/rfcs#3334](https://github.com/rust-lang/rfcs/pull/3334)
 - Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)
 
 # Summary

From 26b01d684c860fb14a5af07b73472cb34b398659 Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sun, 23 Oct 2022 14:05:34 +0100
Subject: [PATCH 03/49] Forbid structs with generic parameters affecting the non-ZST field

---
 text/3334-niche.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/text/3334-niche.md b/text/3334-niche.md
index c01dd5f54f9..695b9a92628 100644
--- a/text/3334-niche.md
+++ b/text/3334-niche.md
@@ -90,6 +90,9 @@ pairs within a single `niche` attribute, results in an error.
 Declaring a niche on a struct containing more or less than one non-zero-sized
 field results in an error.
 
+Declaring a niche on a struct that has any generic parameters affecting the
+non-zero-sized field results in an error.
+
 Declaring a range niche with an empty range (e.g. `0..0`) results in a
 warn-by-default lint. As with many lints, this lint should be automatically
 suppressed for code expanded from a macro.

From f1b671aebd89b3ab5c38c68983d604cb752d1115 Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sun, 23 Oct 2022 14:16:12 +0100
Subject: [PATCH 04/49] Fix typo

---
 text/3334-niche.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/3334-niche.md b/text/3334-niche.md
index 695b9a92628..c451d2b7e53 100644
--- a/text/3334-niche.md
+++ b/text/3334-niche.md
@@ -64,7 +64,7 @@ assert_eq!(size_of::<Option<Option<Option<Bit>>>>(), 1);
 
 Constructing a structure with a niche value, or writing to the non-ZST field of
 such a structure, or obtaining a mutable reference to such a field, requires
-`unsafe` code. Causing a type with a niche to contain an invalid value (whther
+`unsafe` code. Causing a type with a niche to contain an invalid value (whether
 by construction, writing, or transmuting) results in undefined behavior.
 
 If a type `T` contains only a single niche value, `Option<T>` (and other enums

From 74fee72c01d67bd387a920c45da81dc1cebb2dac Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sun, 23 Oct 2022 14:21:20 +0100
Subject: [PATCH 05/49] Further clarify the non-guarantees about multiple niche values

---
 text/3334-niche.md | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/text/3334-niche.md b/text/3334-niche.md
index c451d2b7e53..10dee861398 100644
--- a/text/3334-niche.md
+++ b/text/3334-niche.md
@@ -74,6 +74,9 @@ additionally `repr(transparent)` or `repr(C)` or otherwise permitted in FFI,
 `Option<T>` will likewise be permitted in FFI, with the niche value mapping
 bidirectionally to `None` across the FFI boundary.
 
+If a type contains multiple niche values, Rust does not guarantee any
+particular mapping at this time, but may in the future.
+
 # Reference-level explanation
 [reference-level-explanation]: #reference-level-explanation
 
@@ -112,12 +115,12 @@ error. Constants can use compile-time evaluation, and compile-time evaluation
 does not occur early enough for attributes such as niche declarations.
 
 If a type `T` contains multiple niche values (e.g. `#[niche(range = 8..16)]`),
-the compiler does not define the representation of types containing `T`, except
-that multiple instances of the same identical type (e.g. `Option<T>` and
-`Option<T>`) will use an identical representation (permitting a round-trip
-`transmute` of such a value via bytes). In particular, the compiler does not
-commit to making use of all the invalid values of the niche, even if it
-otherwise could have.
+the compiler does not guarantee any particular usage of those niche values in
+the representation of types containing `T`, except that multiple instances of
+the same identical type (e.g. `Option<T>` and `Option<T>`) will use an
+identical representation (permitting a round-trip `transmute` of such a value
+via bytes). In particular, the compiler does not commit to making use of all
+the invalid values of the niche, even if it otherwise could have.
 
 If a type `T` contains niches and uses `repr(C)` or `repr(transparent)`, the
 compiler guarantees to use the same storage size for the type as it would

From ae1380266e7db43acc48377dad1a11d9871ad093 Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sun, 23 Oct 2022 14:24:11 +0100
Subject: [PATCH 06/49] Further clarify non-support for negative values at this time

---
 text/3334-niche.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/text/3334-niche.md b/text/3334-niche.md
index 10dee861398..c4a5bda3586 100644
--- a/text/3334-niche.md
+++ b/text/3334-niche.md
@@ -104,8 +104,11 @@ Declaring a range niche with an invalid range (e.g. `5..0`) results in an
 error.
 
 Declaring a niche using a negative value or a negative range endpoint results
-in an error. The text of the error should suggest the appropriate unsigned
-equivalent to use. The compiler may support this in the future.
+in an error. The representation of negative values depends on the size of the
+type, and the compiler may not have that information at the time it handles
+attributes such as `niche`. The text of the error should suggest the
+appropriate unsigned equivalent to use. The compiler may support this in the
+future.
 
 Declaring a range niche with an open start (`..3`) results in an error, for
 forwards-compatibility with support for negative values.

From ae3671a1c715582315fe6e89eb1f5679cb681aee Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sun, 23 Oct 2022 14:43:56 +0100
Subject: [PATCH 07/49] Specify two's-complement specifically

---
 text/3334-niche.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/text/3334-niche.md b/text/3334-niche.md
index c4a5bda3586..fa380331a55 100644
--- a/text/3334-niche.md
+++ b/text/3334-niche.md
@@ -107,8 +107,8 @@ Declaring a niche using a negative value or a negative range endpoint results
 in an error. The representation of negative values depends on the size of the
 type, and the compiler may not have that information at the time it handles
 attributes such as `niche`. The text of the error should suggest the
-appropriate unsigned equivalent to use. The compiler may support this in the
-future.
+appropriate two's-complement unsigned equivalent to use. The compiler may
+support this in the future.
 
 Declaring a range niche with an open start (`..3`) results in an error, for
 forwards-compatibility with support for negative values.

From f4b063192ad48c7dd9b19965a8d771cb1a037ea9 Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sun, 23 Oct 2022 14:59:38 +0100
Subject: [PATCH 08/49] Discuss mapping to unsigned integer representation

---
 text/3334-niche.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/text/3334-niche.md b/text/3334-niche.md
index fa380331a55..f59a7273f00 100644
--- a/text/3334-niche.md
+++ b/text/3334-niche.md
@@ -84,6 +84,15 @@ The attribute `#[niche]` may only appear on a struct declaration. The struct
 must contain exactly one field of a non-zero-sized type (non-ZST). The struct
 may contain zero or more ZST fields, such as `PhantomData`.
 
+The niche attribute may either contain `value = N` where `N` is an unsigned
+integer, or `range = R` where R is a range expression whose endpoints are both
+unsigned integers. The unsigned integers may use any integer base
+representation (decimal, hex, binary, octal), but must not have a type suffix.
+The unsigned integers are interpreted as the bit patterns in memory
+corresponding to the representation of the non-ZST field. For instance, a
+struct with a float field could specify one or more NaN values as a niche using
+the integer representation of those values.
+
 Declaring a niche on any item other than a struct declaration results in an
 error.
 

From 132d1f8efcb408a00598ca879df99ab3200d0e43 Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sun, 23 Oct 2022 15:12:18 +0100
Subject: [PATCH 09/49] Alternatives: Add pattern-based syntax

---
 text/3334-niche.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/text/3334-niche.md b/text/3334-niche.md
index f59a7273f00..c1d0654697d 100644
--- a/text/3334-niche.md
+++ b/text/3334-niche.md
@@ -214,6 +214,9 @@ associated consts for invalid values. If we chose to do this in the future, the
 `#[niche(...)]` attribute could become forward-compatible with this, by
 generating the trait impl.
 
+We could use a syntax based on patterns, such as `struct S(u8 is 0..=32);` or
+`struct S(MyEnum is MyEnum::A | MyEnum::B)`.
+
 # Prior art
 [prior-art]: #prior-art
 

From 7a1b2e47b2e8f9d60f6e27256e7789e650c0e910 Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sun, 23 Oct 2022 15:15:46 +0100
Subject: [PATCH 10/49] Reorder some of the reference-level explanation to group similar things

---
 text/3334-niche.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/text/3334-niche.md b/text/3334-niche.md
index c1d0654697d..04d9b730b97 100644
--- a/text/3334-niche.md
+++ b/text/3334-niche.md
@@ -80,10 +80,6 @@ particular mapping at this time, but may in the future.
 # Reference-level explanation
 [reference-level-explanation]: #reference-level-explanation
 
-The attribute `#[niche]` may only appear on a struct declaration. The struct
-must contain exactly one field of a non-zero-sized type (non-ZST). The struct
-may contain zero or more ZST fields, such as `PhantomData`.
-
 The niche attribute may either contain `value = N` where `N` is an unsigned
 integer, or `range = R` where R is a range expression whose endpoints are both
 unsigned integers. The unsigned integers may use any integer base
@@ -93,15 +89,19 @@ corresponding to the representation of the non-ZST field. For instance, a
 struct with a float field could specify one or more NaN values as a niche using
 the integer representation of those values.
 
+The attribute `#[niche]` may only appear on a struct declaration. The struct
+must contain exactly one field of a non-zero-sized type (non-ZST). The struct
+may contain zero or more ZST fields, such as `PhantomData`.
+
 Declaring a niche on any item other than a struct declaration results in an
 error.
 
-Declaring multiple `niche` attributes on a single item, or multiple key-value
-pairs within a single `niche` attribute, results in an error.
-
 Declaring a niche on a struct containing more or less than one non-zero-sized
 field results in an error.
 
+Declaring multiple `niche` attributes on a single item, or multiple key-value
+pairs within a single `niche` attribute, results in an error.
+
 Declaring a niche on a struct that has any generic parameters affecting the
 non-zero-sized field results in an error.
 

From 422dcd9e5916593ee75b01e7880ef986d11bc5e5 Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sun, 23 Oct 2022 15:23:10 +0100
Subject: [PATCH 11/49] Further clarify round-trip via bytes

---
 text/3334-niche.md | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/text/3334-niche.md b/text/3334-niche.md
index 04d9b730b97..6de689ea33a 100644
--- a/text/3334-niche.md
+++ b/text/3334-niche.md
@@ -128,11 +128,14 @@ does not occur early enough for attributes such as niche declarations.
 
 If a type `T` contains multiple niche values (e.g. `#[niche(range = 8..16)]`),
 the compiler does not guarantee any particular usage of those niche values in
-the representation of types containing `T`, except that multiple instances of
-the same identical type (e.g. `Option<T>` and `Option<T>`) will use an
-identical representation (permitting a round-trip `transmute` of such a value
-via bytes). In particular, the compiler does not commit to making use of all
-the invalid values of the niche, even if it otherwise could have.
+the representation of types containing `T`. In particular, the
+compiler does not commit to making use of all the invalid values of the niche,
+even if it otherwise could have.
+
+However, multiple instances of the same identical type (e.g. `Option<T>` and
+`Option<T>`) will use an identical representation (whether the type contains a
+niche or not). This permits a round-trip between such a value and a byte
+representation.
 
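The round-trip property described above can be illustrated with a niche type the standard library already guarantees, `Option<NonZeroU32>`, which has the same size as `u32` and stores `None` as zero. `to_bytes`, `from_bytes` and `to_bytes_safe` are hypothetical helper names; the sketch compares the `transmute`-based round trip with a safe encoding through `u32`.

```rust
use std::mem::{size_of, transmute};
use std::num::NonZeroU32;

/// Hypothetical helper: view an `Option<NonZeroU32>` as its raw bytes.
pub fn to_bytes(v: Option<NonZeroU32>) -> [u8; 4] {
    // SAFETY: `Option<NonZeroU32>` is 4 bytes with no padding, so every byte is initialized.
    unsafe { transmute::<Option<NonZeroU32>, [u8; 4]>(v) }
}

/// Hypothetical helper: rebuild the value from raw bytes.
pub fn from_bytes(b: [u8; 4]) -> Option<NonZeroU32> {
    // SAFETY: every 4-byte pattern is valid here; all-zero bytes decode to `None`.
    unsafe { transmute::<[u8; 4], Option<NonZeroU32>>(b) }
}

/// Safe equivalent: encode through `u32`, mapping `None` to 0.
pub fn to_bytes_safe(v: Option<NonZeroU32>) -> [u8; 4] {
    v.map_or(0, NonZeroU32::get).to_ne_bytes()
}

/// The niche keeps `Option<NonZeroU32>` exactly as large as `u32`.
pub fn layout_matches_u32() -> bool {
    size_of::<Option<NonZeroU32>>() == size_of::<u32>()
}
```

The unsafe and safe encodings agree because the niche value 0 is the only representation of `None`; a user-defined multi-value niche would only promise the round trip for the same type, not a particular byte for `None`.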
 If a type `T` contains niches and uses `repr(C)` or `repr(transparent)`, the
 compiler guarantees to use the same storage size for the type as it would

From 825d29cc4e7c45762c3992547ca54b8354c1c06f Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Sun, 23 Oct 2022 15:26:22 +0100
Subject: [PATCH 12/49] Add note about `non_exhaustive`.

---
 text/3334-niche.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/text/3334-niche.md b/text/3334-niche.md
index 6de689ea33a..3da1a7d0186 100644
--- a/text/3334-niche.md
+++ b/text/3334-niche.md
@@ -91,7 +91,9 @@ the integer representation of those values.
 
 The attribute `#[niche]` may only appear on a struct declaration. The struct
 must contain exactly one field of a non-zero-sized type (non-ZST). The struct
-may contain zero or more ZST fields, such as `PhantomData`.
+may contain zero or more ZST fields, such as `PhantomData`. (Note that
+`#[non_exhaustive]` types do not count as ZSTs for this purpose, even if they
+*currently* contain no fields with non-zero sizes.)
 
 Declaring a niche on any item other than a struct declaration results in an
 error.

From 1a36e0382f5ddd129c11432c53a2aaf410bb5b75 Mon Sep 17 00:00:00 2001
From: Josh Triplett
Date: Mon, 24 Oct 2022 11:40:22 +0100
Subject: [PATCH 13/49] Drop support for having ZST fields

Avoid unneeded complications.
---
 text/3334-niche.md | 51 ++++++++++++++++++++--------------------------
 1 file changed, 22 insertions(+), 29 deletions(-)

diff --git a/text/3334-niche.md b/text/3334-niche.md
index 3da1a7d0186..8b4b036c25f 100644
--- a/text/3334-niche.md
+++ b/text/3334-niche.md
@@ -39,12 +39,12 @@ without manual bit-twiddling.
 # Guide-level explanation
 [guide-level-explanation]: #guide-level-explanation
 
-When defining a structure containing exactly one field of a non-zero-sized type
-(non-ZST), you can attach a `niche` attribute on it to declare a specific value
-or range of values for that field as invalid. This promises the compiler that
This promises the compiler that -you will never store those values in that field, which allows the compiler to -use those in-memory representations for different purposes, such as the -representation of `None` in a containing `Option`. +When defining a struct containing exactly one field, you can attach a `niche` +attribute to the struct to declare a specific value or range of values for that +field as invalid. This promises the compiler that you will never store those +values in that field, which allows the compiler to use those in-memory +representations for different purposes, such as the representation of `None` in +a containing `Option`. ```rust use std::mem::size_of; @@ -62,10 +62,10 @@ assert_eq!(size_of::(), 1); assert_eq!(size_of::>>>(), 1); ``` -Constructing a structure with a niche value, or writing to the non-ZST field of -such a structure, or obtaining a mutable reference to such a field, requires -`unsafe` code. Causing a type with a niche to contain an invalid value (whether -by construction, writing, or transmuting) results in undefined behavior. +Constructing a structure with a niche value, or writing to the field of such a +structure, or obtaining a mutable reference to such a field, requires `unsafe` +code. Causing a type with a niche to contain an invalid value (whether by +construction, writing, or transmuting) results in undefined behavior. If a type `T` contains only a single niche value, `Option` (and other enums isomorphic to it, with one variant containing `T` and one nullary variant) will @@ -85,27 +85,25 @@ integer, or `range = R` where R is a range expression whose endpoints are both unsigned integers. The unsigned integers may use any integer base representation (decimal, hex, binary, octal), but must not have a type suffix. The unsigned integers are interpreted as the bit patterns in memory -corresponding to the representation of the non-ZST field. 
For instance, a -struct with a float field could specify one or more NaN values as a niche using -the integer representation of those values. +corresponding to the representation of the field. For instance, a struct with a +float field could specify one or more NaN values as a niche using the integer +representation of those values. The attribute `#[niche]` may only appear on a struct declaration. The struct -must contain exactly one field of a non-zero-sized type (non-ZST). The struct -may contain zero or more ZST fields, such as `PhantomData`. (Note that -`#[non_exhaustive]` types do not count as ZSTs for this purpose, even if they -*currently* contain no fields with non-zero sizes.) +must contain exactly one field. The field must have a non-zero-sized type +(non-ZST). Declaring a niche on any item other than a struct declaration results in an error. -Declaring a niche on a struct containing more or less than one non-zero-sized -field results in an error. +Declaring a niche on a struct containing more or less than one field results in +an error. Declaring multiple `niche` attributes on a single item, or multiple key-value pairs within a single `niche` attribute, results in an error. -Declaring a niche on a struct that has any generic parameters affecting the -non-zero-sized field results in an error. +Declaring a niche on a struct that has any generic parameters results in an +error. Declaring a range niche with an empty range (e.g. `0..0`) results in a warn-by-default lint. As with many lints, this lint should be automatically @@ -240,11 +238,6 @@ and innovation since computing antiquity. # Unresolved questions [unresolved-questions]: #unresolved-questions -Does the compiler support niches on structs containing ZST fields such as -`PhantomData`? If it doesn't, then initially, having a limitation to only -structs containing a single field would be fine, and would not substantially -reduce the usefulness of stabilizing this feature. 
- Could we support niches on generic types? For instance, could we support declaring a niche of `0` on a generic structure with a single field? @@ -252,9 +245,9 @@ Could we support negative numbers in a niche attribute, at least for fields of concrete primitive type? That would provide a much more friendly interface, but would require the compiler to better understand the type and its size. -Will something go wrong if applying a niche to a struct whose non-ZST field is -itself a struct containing multiple fields? Do we need to restrict niches to -structs containing primitive types, or similar? +Will something go wrong if applying a niche to a struct whose field is itself a +struct containing multiple fields? Do we need to restrict niches to structs +containing primitive types, or similar? Do we need to make `niche` mutually exclusive with `packed`? What about other attributes? From ffd19650c5bc11974eca90c966b04e43b28eb5c7 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 11:42:32 +0100 Subject: [PATCH 14/49] Add future possibility of structs with ZST fields --- text/3334-niche.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index 8b4b036c25f..cde20b2deec 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -303,6 +303,8 @@ within the attribute. rather than a range. - **Per-field niches**: A structure containing multiple fields could have a niche on a specific field, rather than the whole structure. +- **structs with ZST fields**: A struct could contain fields with zero-sized + types (e.g. `PhantomData`) and still have a niche. - **Whole-structure niches**: A structure containing multiple non-zero-sized fields could have a niche of invalid values for the whole structure. - **Union niches**: A union could have a niche. 
From 981672421b9b86c4df00e8ef6f6eb617acde0f9b Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 11:49:22 +0100 Subject: [PATCH 15/49] Guarantee that adding a niche doesn't change storage size Expand the discussion of move types under future possibilities to cover this as something move types would enable. --- text/3334-niche.md | 22 ++++++---------------- 1 file changed, 6 insertions(+), 16 deletions(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index cde20b2deec..f78980543d6 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -137,21 +137,10 @@ However, multiple instances of the same identical type (e.g. `Option` and niche or not). This permits a round-trip between such a value and a byte representation. -If a type `T` contains niches and uses `repr(C)` or `repr(transparent)`, the -compiler guarantees to use the same storage size for the type as it would -without the niche, even if the niche might allow storing fewer bytes. If a type -`T` contains niches and uses the default (`Rust`) `repr`, the compiler may -choose to represent the type using fewer bytes if the niche would allow doing -so. For instance: - -```rust -#[niche(range = 4..)] -struct S { - field: u16, -} - -// `size_of::()` may return less than 2 -``` +Adding a niche to a type does not change the storage size of the type, even if +the niche might otherwise allow storing fewer bytes. The type still allows +obtaining mutable references to the field, which requires storing valid values +using the same representation as those values would have had without the niche. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives @@ -340,4 +329,5 @@ within the attribute. such as niches. Given a mechanism for disallowing references to a type and requiring users to copy or move it rather than referencing it in-place, Rust could more aggressively optimize storage layout, such as by renumbering enum - values and translating them back when read. 
+ values and translating them back when read, or by storing fields using fewer + bytes if their valid range requires fewer bytes to fully represent. From ca5970e90e52ea32b0b5a9bf591db0e3adef6bed Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 12:29:29 +0100 Subject: [PATCH 16/49] Only allow simple field types --- text/3334-niche.md | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index f78980543d6..5a28e66a1af 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -90,8 +90,22 @@ float field could specify one or more NaN values as a niche using the integer representation of those values. The attribute `#[niche]` may only appear on a struct declaration. The struct -must contain exactly one field. The field must have a non-zero-sized type -(non-ZST). +must contain exactly one field. + +The field must have one of a restricted set of types: +- A built-in integer type (iN or uN). +- A built-in floating-point type (fN). (The niche must still be specified using + the integer representation.) +- A `char`. (The niche uses the integer representation, and gets merged with + the built-in niches of `char`; if the result after merging would have + multiple discontiguous niches, the compiler need not take all of them into + account.) +- A raw pointer. (This allows user-defined types to store a properly typed + pointer while taking advantage of known-invalid pointer values.) +- A fieldless enum with a `repr` of a primitive integer type. + +Declaring a niche on a struct whose field type does not meet these restrictions +results in an error. Declaring a niche on any item other than a struct declaration results in an error. @@ -294,6 +308,10 @@ within the attribute. niche on a specific field, rather than the whole structure. - **structs with ZST fields**: A struct could contain fields with zero-sized types (e.g. `PhantomData`) and still have a niche. 
+- **Non-primitive fields**: A struct could contain fields of non-primitive + types, such as tuples, arrays, or other structs (including structs with + niches themselves). This should wait until after niches support providing + values with the type of the field, rather than as an unsigned integer. - **Whole-structure niches**: A structure containing multiple non-zero-sized fields could have a niche of invalid values for the whole structure. - **Union niches**: A union could have a niche. From eaf6327cc5ee665b7faf3c30be959e85f62972b4 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 12:32:32 +0100 Subject: [PATCH 17/49] Future possibilities: fields of reference type --- text/3334-niche.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index 5a28e66a1af..a3771fc6756 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -308,6 +308,10 @@ within the attribute. niche on a specific field, rather than the whole structure. - **structs with ZST fields**: A struct could contain fields with zero-sized types (e.g. `PhantomData`) and still have a niche. +- **Fields of reference type**: In addition to allowing raw pointers, structs + with niches could allow references. In practice, if the references have a + lifetime other than `'static`, this will also require at least some support + for generic parameters. - **Non-primitive fields**: A struct could contain fields of non-primitive types, such as tuples, arrays, or other structs (including structs with niches themselves). 
This should wait until after niches support providing From bb5b5b921464768a607143973b3e0b047a278a66 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 12:51:28 +0100 Subject: [PATCH 18/49] Future possibilities: mention interaction with read-only fields --- text/3334-niche.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index a3771fc6756..91638bbe804 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -344,6 +344,9 @@ within the attribute. - **`unsafe` fields**: If in the future Rust introduces `unsafe` fields, declaring a niche could internally mark the field as unsafe, taking advantage of the same machinery. +- **read-only fields**: If in the future Rust introduces read-only fields, + types with a niche may wish to provide read-only access to the value they + contain, rather than just providing conversion methods or traits. - **Move types, or types that don't support references**: Rust currently requires that all values of a given type have the same representation no matter where they get stored, to allow taking references to such types and From 93d47075e4fd146c7fac843cd2c1742dd3ae49a2 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 12:55:45 +0100 Subject: [PATCH 19/49] Consistently use "struct" rather than "structure" --- text/3334-niche.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index 91638bbe804..e8b35cafab3 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -62,8 +62,8 @@ assert_eq!(size_of::(), 1); assert_eq!(size_of::>>>(), 1); ``` -Constructing a structure with a niche value, or writing to the field of such a -structure, or obtaining a mutable reference to such a field, requires `unsafe` +Constructing a struct with a niche value, or writing to the field of such a +struct, or obtaining a mutable reference to such a field, requires `unsafe` code. 
Causing a type with a niche to contain an invalid value (whether by construction, writing, or transmuting) results in undefined behavior. @@ -242,7 +242,7 @@ and innovation since computing antiquity. [unresolved-questions]: #unresolved-questions Could we support niches on generic types? For instance, could we support -declaring a niche of `0` on a generic structure with a single field? +declaring a niche of `0` on a generic struct with a single field? Could we support negative numbers in a niche attribute, at least for fields of concrete primitive type? That would provide a much more friendly interface, but @@ -304,8 +304,8 @@ within the attribute. single range. - **Other bit-pattern niches**: A type could define niches via a bit pattern, rather than a range. -- **Per-field niches**: A structure containing multiple fields could have a - niche on a specific field, rather than the whole structure. +- **Per-field niches**: A struct containing multiple fields could have a niche + on a specific field, rather than the whole struct. - **structs with ZST fields**: A struct could contain fields with zero-sized types (e.g. `PhantomData`) and still have a niche. - **Fields of reference type**: In addition to allowing raw pointers, structs @@ -316,8 +316,8 @@ within the attribute. types, such as tuples, arrays, or other structs (including structs with niches themselves). This should wait until after niches support providing values with the type of the field, rather than as an unsigned integer. -- **Whole-structure niches**: A structure containing multiple non-zero-sized - fields could have a niche of invalid values for the whole structure. +- **Whole-struct niches**: A struct containing multiple non-zero-sized fields + could have a niche of invalid values for the whole struct. - **Union niches**: A union could have a niche. - **Enum niches**: An enum or an enum variant could have a niche. 
- **Specified mappings into niches**: Users may want to rely on mappings of @@ -330,8 +330,8 @@ within the attribute. set the field. Future extensions could allow safely setting the field, after verifying in a compiler-visible manner that the value works. For instance: - **`derive(TryInto)`**: Rust could support deriving `TryInto` from the - contained type to the structure. The implementation could explicitly check - the range, and return an error if not in-range. This would avoid the need to + contained type to the struct. The implementation could explicitly check the + range, and return an error if not in-range. This would avoid the need to write explicit `unsafe` code, and many uses may be able to elide or coalesce the check if the compiler can prove the range of a value at compile time. - **Lints**: Multiple lints may help users define niches, or detect usages of From 4559ee9c69bbea0e91848d6555781dc7539708a4 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 13:05:47 +0100 Subject: [PATCH 20/49] Future possibilities: change `derive(TryInto)` to `derive(TryFrom)` --- text/3334-niche.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index e8b35cafab3..329dc2944ad 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -329,7 +329,7 @@ within the attribute. - **Safety**: The attribute specified in this RFC requires an unsafe block to set the field. Future extensions could allow safely setting the field, after verifying in a compiler-visible manner that the value works. For instance: -- **`derive(TryInto)`**: Rust could support deriving `TryInto` from the +- **`derive(TryFrom)`**: Rust could support deriving `TryFrom` from the contained type to the struct. The implementation could explicitly check the range, and return an error if not in-range. 
This would avoid the need to write explicit `unsafe` code, and many uses may be able to elide or coalesce From 5b03ea2e5186f7853b19fe2298266ec9ce0c3957 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 13:17:41 +0100 Subject: [PATCH 21/49] Add language to reference section about unsafe vs safe operations --- text/3334-niche.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index 329dc2944ad..f511ba893a0 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -80,6 +80,19 @@ particular mapping at this time, but may in the future. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation +If a struct contains a niche, the following operations may only occur in +`unsafe` code, and produce an error if invoked in safe code: +- Constructing the struct, which requires initializing the field. +- Writing to the field. +- Obtaining a mutable reference to the field. + +Other operations, including reading from the field, obtaining a non-mutable +reference to the field, obtaining a mutable reference to the whole struct, or +assigning to the whole struct, are not affected by the presence of the niche. + +Causing a type with a niche to contain an invalid value (whether by +construction, writing, or transmuting) results in undefined behavior. + The niche attribute may either contain `value = N` where `N` is an unsigned integer, or `range = R` where R is a range expression whose endpoints are both unsigned integers. 
The unsigned integers may use any integer base From b8bcfda955259752efc647b17246c32576bbc91e Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 13:17:49 +0100 Subject: [PATCH 22/49] Add information to the guide-level section on typical operations --- text/3334-niche.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index f511ba893a0..68eb7b62fe6 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -67,6 +67,23 @@ struct, or obtaining a mutable reference to such a field, requires `unsafe` code. Causing a type with a niche to contain an invalid value (whether by construction, writing, or transmuting) results in undefined behavior. +Typically, a user-defined type with a niche may wish to provide safe methods to +construct or modify the type. For instance, a type `T` *might* choose to +provide one or more of the following, depending on what makes sense for the +expected usage of the type: +- a `new` or `try_new` method returning a `Result` or `Option` +- an unsafe `new_unchecked` method returning `T` +- `TryFrom` implementations for conversions that can fail +- `From` implementations for conversions from types that fully map to the valid + values, such that the conversion cannot fail +- an implementation of `Default` +- constant values of the type +- methods that may fail to map to the valid range, and return `Result` or + `Option` +- operators that may fail by panicking +- saturating, checked, or similar versions of operators that cannot fail +- methods or operators that cannot fail + If a type `T` contains only a single niche value, `Option` (and other enums isomorphic to it, with one variant containing `T` and one nullary variant) will use that value to represent `None` (the nullary variant). 
If such a `T` is From c3c6154e45f8f4ce21b205178cd6d5519467c893 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 13:20:25 +0100 Subject: [PATCH 23/49] Consistently use "niche" rather than "invalid" except when explaining "niche" --- text/3334-niche.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index 68eb7b62fe6..aaaeccd42b3 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -64,8 +64,8 @@ assert_eq!(size_of::>>>(), 1); Constructing a struct with a niche value, or writing to the field of such a struct, or obtaining a mutable reference to such a field, requires `unsafe` -code. Causing a type with a niche to contain an invalid value (whether by -construction, writing, or transmuting) results in undefined behavior. +code. Causing a type with a niche to contain one of its niche values (whether +by construction, writing, or transmuting) results in undefined behavior. Typically, a user-defined type with a niche may wish to provide safe methods to construct or modify the type. For instance, a type `T` *might* choose to @@ -107,7 +107,7 @@ Other operations, including reading from the field, obtaining a non-mutable reference to the field, obtaining a mutable reference to the whole struct, or assigning to the whole struct, are not affected by the presence of the niche. -Causing a type with a niche to contain an invalid value (whether by +Causing a type with a niche to contain one of its niche values (whether by construction, writing, or transmuting) results in undefined behavior. The niche attribute may either contain `value = N` where `N` is an unsigned @@ -131,7 +131,7 @@ The field must have one of a restricted set of types: multiple discontiguous niches, the compiler need not take all of them into account.) - A raw pointer. (This allows user-defined types to store a properly typed - pointer while taking advantage of known-invalid pointer values.) 
+ pointer while using known-invalid pointer values as niches.) - A fieldless enum with a `repr` of a primitive integer type. Declaring a niche on a struct whose field type does not meet these restrictions @@ -172,9 +172,9 @@ does not occur early enough for attributes such as niche declarations. If a type `T` contains multiple niche values (e.g. `#[niche(range = 8..16)]`), the compiler does not guarantee any particular usage of those niche values in -the representation of types containing `T`. In particular, the -compiler does not commit to making use of all the invalid values of the niche, -even if it otherwise could have. +the representation of types containing `T`. In particular, the compiler does +not commit to making use of all the niche values, even if it otherwise could +have. However, multiple instances of the same identical type (e.g. `Option` and `Option`) will use an identical representation (whether the type contains a @@ -246,7 +246,7 @@ building block like `niche`, and providing the underlying building blocks to the ecosystem as well. We could implement niches using a trait `Niche` implemented for a type, with -associated consts for invalid values. If we chose to do this in the future, the +associated consts for niche values. If we chose to do this in the future, the `#[niche(...)]` attribute could become forward-compatible with this, by generating the trait impl. @@ -347,7 +347,7 @@ within the attribute. niches themselves). This should wait until after niches support providing values with the type of the field, rather than as an unsigned integer. - **Whole-struct niches**: A struct containing multiple non-zero-sized fields - could have a niche of invalid values for the whole struct. + could have niche values for the whole struct. - **Union niches**: A union could have a niche. - **Enum niches**: An enum or an enum variant could have a niche. 
- **Specified mappings into niches**: Users may want to rely on mappings of From bb1a8ec9285b266f5e549326e784717fb5354e0e Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 13:23:13 +0100 Subject: [PATCH 24/49] Future possibilities: safe writes of compile-time constants --- text/3334-niche.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index aaaeccd42b3..aa29130ce7e 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -358,7 +358,9 @@ within the attribute. Linux kernel's `ERR_PTR`). - **Safety**: The attribute specified in this RFC requires an unsafe block to set the field. Future extensions could allow safely setting the field, after - verifying in a compiler-visible manner that the value works. For instance: + verifying in a compiler-visible manner that the value does not fall within + the niche. For instance, via `derive(TryFrom)` (see below), or by checking a + compile-time constant expression to see if it falls within the niche. - **`derive(TryFrom)`**: Rust could support deriving `TryFrom` from the contained type to the struct. The implementation could explicitly check the range, and return an error if not in-range. This would avoid the need to From ad9bf3404c1fab5089ace888c51ff5c1b0f230bc Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 13:24:54 +0100 Subject: [PATCH 25/49] Clarify language regarding compile-time constants and verification --- text/3334-niche.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index aa29130ce7e..b9d41f7364b 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -359,8 +359,9 @@ within the attribute. - **Safety**: The attribute specified in this RFC requires an unsafe block to set the field. Future extensions could allow safely setting the field, after verifying in a compiler-visible manner that the value does not fall within - the niche. 
For instance, via `derive(TryFrom)` (see below), or by checking a - compile-time constant expression to see if it falls within the niche. + the niche. For instance, via `derive(TryFrom)` (see below), or by checking + the value of a compile-time constant expression to ensure that it does not + fall within the niche. - **`derive(TryFrom)`**: Rust could support deriving `TryFrom` from the contained type to the struct. The implementation could explicitly check the range, and return an error if not in-range. This would avoid the need to From f16a7306d410a67338a8cc0f54e6beb3f9fc3a6b Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 13:38:47 +0100 Subject: [PATCH 26/49] Explicitly allow `niche` to appear inside `cfg_attr` --- text/3334-niche.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index b9d41f7364b..0f1282d85b3 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -146,6 +146,10 @@ an error. Declaring multiple `niche` attributes on a single item, or multiple key-value pairs within a single `niche` attribute, results in an error. +The `niche` attribute may appear inside `cfg_attr`. The net effect after +evaluating all configuration must be to apply either zero or one `niche` +attribute to the type. + Declaring a niche on a struct that has any generic parameters results in an error. From ee9359f30516077f6ef9d1437295bf6552cdea23 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 14:03:42 +0100 Subject: [PATCH 27/49] Alternatives: Add detailed discussion of niche-on-field vs niche-on-struct --- text/3334-niche.md | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index 0f1282d85b3..287a9fc7ac6 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -257,6 +257,35 @@ generating the trait impl. 
We could use a syntax based on patterns, such as `struct S(u8 is 0..=32);` or `struct S(MyEnum is MyEnum::A | MyEnum::B)`. +We could attach the `#[niche(...)]` attribute to the *field* rather than to the +struct. This would have the advantage of extending naturally to multiple +fields, and would associate the value restrictions specifically to the field +they apply to. This would also be more convenient for application to enum +fields. However, this would be less convenient for defining single-field tuple +structs: + +```rust +// Proposed syntax +#[niche(value = 0)] +struct NonZeroU32(u32); + +// Alternative syntax +struct NonZeroU32( + #[niche(value = 0)] + u32, +); +``` + +In addition, that alternative syntax would *not* work for future multi-field +niches that need to correlate across fields (e.g. a niche for one field that +depends on the value of another field). It also would not work as well for +niches on a `union`. + +Since the syntax proposed in this RFC requires exactly one field in the struct, +this does not prevent future syntax additions from adding a niche attribute on +fields, in which case the two could be declared as equivalent on a single-field +struct. + # Prior art [prior-art]: #prior-art From 08bb5b427cc91a4fdb96dc2f700ae665bb1fe522 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 14:57:22 +0100 Subject: [PATCH 28/49] Explicitly allow inclusive and exclusive ranges --- text/3334-niche.md | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index 287a9fc7ac6..060a838b844 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -112,12 +112,15 @@ construction, writing, or transmuting) results in undefined behavior. The niche attribute may either contain `value = N` where `N` is an unsigned integer, or `range = R` where R is a range expression whose endpoints are both -unsigned integers. 
The unsigned integers may use any integer base -representation (decimal, hex, binary, octal), but must not have a type suffix. -The unsigned integers are interpreted as the bit patterns in memory -corresponding to the representation of the field. For instance, a struct with a -float field could specify one or more NaN values as a niche using the integer -representation of those values. +unsigned integers. + +The unsigned integers may use any integer base representation (decimal, hex, +binary, octal), but must not have a type suffix. The unsigned integers are +interpreted as the bit patterns in memory corresponding to the representation +of the field. For instance, a struct with a float field could specify one or +more NaN values as a niche using the integer representation of those values. + +The range may be either exclusive (`start..end`) or inclusive (`start..=end`). The attribute `#[niche]` may only appear on a struct declaration. The struct must contain exactly one field. From 77069e6ead36f94be67c1e983cea9e04666f30e1 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 15:29:32 +0100 Subject: [PATCH 29/49] Note that the compiler may be able to perform additional optimizations --- text/3334-niche.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index 060a838b844..efdc7b6ff2c 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -193,6 +193,11 @@ the niche might otherwise allow storing fewer bytes. The type still allows obtaining mutable references to the field, which requires storing valid values using the same representation as those values would have had without the niche. +Declaring a niche *may* allow additional optimizations that assume the type +cannot contain the niche values, though the compiler does not guarantee this. +For instance, the compiler may be able to elide bounds checks that the valid +values always satisfy. 
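The reference text gives a float field as an example. Because niche endpoints are bit patterns rather than typed values, a NaN niche on an `f32` field has to be written in terms of `f32::to_bits` values. This sketch in plain stable Rust checks which bit patterns are positive NaNs, which is the set a hypothetical `#[niche(range = 0x7F80_0001..=0x7FFF_FFFF)]` would cover. It also confirms that the exclusive form `0x7F80_0001..0x8000_0000` names the same set. The constant and function names are illustrative.

```rust
use std::ops::RangeInclusive;

/// Bit patterns of the positive NaNs of `f32`: sign bit clear, exponent
/// bits all ones, mantissa non-zero.
const POSITIVE_NAN_BITS: RangeInclusive<u32> = 0x7F80_0001..=0x7FFF_FFFF;

/// Membership test using the inclusive form `start..=end`.
fn in_inclusive(bits: u32) -> bool {
    POSITIVE_NAN_BITS.contains(&bits)
}

/// Membership test using the equivalent exclusive form `start..end`.
fn in_exclusive(bits: u32) -> bool {
    (0x7F80_0001..0x8000_0000).contains(&bits)
}
```

A niche covering every positive NaN would also cover the bit pattern of `f32::NAN` on common platforms, so storing that value would be undefined behavior. A real type would more likely reserve a narrower range, or NaN payloads that its constructors never produce.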
+ # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives From 58d3b9e3274ddcbe84599e6403ba75eddde091df Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 15:55:52 +0100 Subject: [PATCH 30/49] Discuss pattern matching --- text/3334-niche.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index efdc7b6ff2c..249fbdd6c56 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -198,6 +198,10 @@ cannot contain the niche values, though the compiler does not guarantee this. For instance, the compiler may be able to elide bounds checks that the valid values always satisfy. +Niches do not affect pattern-matching exhaustiveness. For the purposes of +pattern matching, the compiler will check exhaustiveness as if the field could +take on any value. + # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives From d5966c67eb5be93d433386c6c7b1ad9900abd550 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 15:57:21 +0100 Subject: [PATCH 31/49] Explicitly allow open-ended ranges. --- text/3334-niche.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index 249fbdd6c56..5134ac54721 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -120,7 +120,8 @@ interpreted as the bit patterns in memory corresponding to the representation of the field. For instance, a struct with a float field could specify one or more NaN values as a niche using the integer representation of those values. -The range may be either exclusive (`start..end`) or inclusive (`start..=end`). +The range may be exclusive (`start..end`), inclusive (`start..=end`), or +open-ended (`start..`). The attribute `#[niche]` may only appear on a struct declaration. The struct must contain exactly one field. 
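The standard library's niche-carrying `NonZeroU8` already follows the rule that niches do not affect pattern-matching exhaustiveness. The sketch below matches on the underlying `u8`. Even though 0 can never occur, deleting the final arm makes the match non-exhaustive, and compilation fails with error E0004. The sketch assumes that matching on the field of a struct with a `#[niche]` attribute would need the same handling. The function name `bucket` is illustrative.

```rust
use std::num::NonZeroU8;

/// Sorts a value that can never be 0 into one of two buckets.
fn bucket(n: NonZeroU8) -> &'static str {
    match n.get() {
        1..=127 => "low",
        128..=u8::MAX => "high",
        // Required: exhaustiveness checking treats the `u8` as able to hold
        // any value, so omitting this arm fails to compile (E0004).
        0 => unreachable!("NonZeroU8 never holds 0"),
    }
}
```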
From c18bee31fd180cef7735cbf75cc2264392d89e90 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 16:01:56 +0100 Subject: [PATCH 32/49] Future possibilities: add niches affecting pattern-matching exhaustiveness --- text/3334-niche.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index 5134ac54721..cba9898bb26 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -417,6 +417,11 @@ within the attribute. niches that may be better expressed via other means. For instance, a lint could detect a newtype whose constructor maintains a range invariant, and suggest adding a niche. +- **Niches affecting pattern-matching exhaustiveness**: In the future, Rust + could support having niches affect pattern-matching exhaustiveness. If so, + that future version of Rust would need to do so in a backwards-compatible + manner, such as by ensuring that the resulting redundant match arms produce + at most a suppressible warning lint (at least until an edition boundary). - **Range types**: Rust (or libraries built atop Rust) could provide integer types with associated valid ranges, along with operations that expand/contract/propagate those ranges as appropriate. From 627df7fd553243c6f7dd1059f5df8b212b2b8acb Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 16:03:15 +0100 Subject: [PATCH 33/49] Explicitly allow construction or writing from `const` code --- text/3334-niche.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index cba9898bb26..c518e974622 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -94,6 +94,9 @@ bidirectionally to `None` across the FFI boundary. If a type contains multiple niche values, Rust does not guarantee any particular mapping at this time, but may in the future. +Structs with niches may be constructed or written to in `const` code, though +such construction or writing still requires `unsafe`. 
+ # Reference-level explanation [reference-level-explanation]: #reference-level-explanation From 5109f44da53b3042e3c8779f1d9df81f36dcf2c5 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 16:08:06 +0100 Subject: [PATCH 34/49] Note that `#[niche(...)]` would be forward-compatible with a pattern syntax --- text/3334-niche.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index c518e974622..8229ada6743 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -271,7 +271,8 @@ associated consts for niche values. If we chose to do this in the future, the generating the trait impl. We could use a syntax based on patterns, such as `struct S(u8 is 0..=32);` or -`struct S(MyEnum is MyEnum::A | MyEnum::B)`. +`struct S(MyEnum is MyEnum::A | MyEnum::B)`. The `niche` attribute could be +forward-compatible with this, by generating the appropriate patterns. We could attach the `#[niche(...)]` attribute to the *field* rather than to the struct. This would have the advantage of extending naturally to multiple From f77eb45be1de8186dd0260b5dcdbaf64519c0b75 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 16:15:20 +0100 Subject: [PATCH 35/49] Discuss interaction with `derive(Default)` --- text/3334-niche.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index 8229ada6743..4bfd62ac9ac 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -206,6 +206,10 @@ Niches do not affect pattern-matching exhaustiveness. For the purposes of pattern matching, the compiler will check exhaustiveness as if the field could take on any value. +If a struct has both a niche and `derive(Default)` declared on it, the compiler +will check if the default value falls within the niche, and produce an error if +so. 
+ # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives From 5965de8b5edc4ee636fcc06464da75ed44e0be92 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 16:26:41 +0100 Subject: [PATCH 36/49] Note that a `derive(TryFrom)` would also avoid duplicating the range --- text/3334-niche.md | 1 + 1 file changed, 1 insertion(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index 4bfd62ac9ac..fde8ec627c0 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -421,6 +421,7 @@ within the attribute. range, and return an error if not in-range. This would avoid the need to write explicit `unsafe` code, and many uses may be able to elide or coalesce the check if the compiler can prove the range of a value at compile time. + This would also avoid needing to duplicate the range in multiple places. - **Lints**: Multiple lints may help users define niches, or detect usages of niches that may be better expressed via other means. For instance, a lint could detect a newtype whose constructor maintains a range invariant, and From 811024aaa932a2f550e4e99123d3e348f6202380 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 16:32:19 +0100 Subject: [PATCH 37/49] Mention C and C++ bitfields --- text/3334-niche.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index fde8ec627c0..9ddc2da7750 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -319,6 +319,10 @@ The Rust compiler already supports user-defined niches via the unstable attributes `rustc_layout_scalar_valid_range_start` and `rustc_layout_scalar_valid_range_end`. +C, C++, and various other languages have "bitfields", which allow restricting +the range and storage of a type based on the number of bits used to store it. +This doesn't allow excluding a more fine-grained range, though. + Bit-twiddling tricks to store information compactly have seen widespread use and innovation since computing antiquity. 
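The `derive(TryFrom)` idea noted in this patch can be approximated by hand in today's Rust. This is a sketch for illustration only. `Bit` mirrors the RFC's example type but omits the `niche` attribute, which does not exist yet, and `OutOfRange` is a hypothetical error type. The sketch shows the duplication the note refers to: the valid range has to be restated inside the check.

```rust
use std::convert::TryFrom;

/// Stand-in for the RFC's `Bit` type. Under the proposal it would also carry
/// `#[niche(range = 2..)]`; that attribute is hypothetical here, so the
/// invariant is enforced only by the fallible constructor below.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Bit(u8);

/// Hypothetical error type carrying the rejected input.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfRange(pub u8);

impl TryFrom<u8> for Bit {
    type Error = OutOfRange;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        // The valid values (0..=1) are the complement of the niche (2..),
        // written out a second time; a generated impl could avoid this.
        if value <= 1 {
            Ok(Bit(value))
        } else {
            Err(OutOfRange(value))
        }
    }
}

impl Bit {
    /// Returns the stored value, which is always 0 or 1.
    pub fn get(self) -> u8 {
        self.0
    }
}
```

A derived impl could take the check from the `niche` attribute itself, so the range would be written only once.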
From 014d504eb8c8f66429df832a1a68f18c0e3ce4ff Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 16:34:20 +0100 Subject: [PATCH 38/49] Note why we don't support `bool` --- text/3334-niche.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index 9ddc2da7750..65e2c61877b 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -307,6 +307,11 @@ this does not prevent future syntax additions from adding a niche attribute on fields, in which case the two could be declared as equivalent on a single-field struct. +We could support `bool`, just as easily as `char`. However, since `bool` has +only two valid values, any niche applying a further restriction to it would +result in either a one-value type or a zero-value type, neither of which seems +useful enough to support. + # Prior art [prior-art]: #prior-art From 4856096c70ab60e748a099c70e7df51b0469d2dd Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 16:36:52 +0100 Subject: [PATCH 39/49] Add Ada as precedent --- text/3334-niche.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index 65e2c61877b..a9934749635 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -328,6 +328,8 @@ C, C++, and various other languages have "bitfields", which allow restricting the range and storage of a type based on the number of bits used to store it. This doesn't allow excluding a more fine-grained range, though. +Ada supports declaring integer types with explicit ranges. + Bit-twiddling tricks to store information compactly have seen widespread use and innovation since computing antiquity. 
From bb38342f6207017023ba13ff5134e46b3249097e Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 16:47:49 +0100 Subject: [PATCH 40/49] Discuss how we can handle `derive(Default)` --- text/3334-niche.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index a9934749635..0a0b6c95044 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -350,6 +350,10 @@ containing primitive types, or similar? Do we need to make `niche` mutually exclusive with `packed`? What about other attributes? +Can we make `derive(Default)` detect errors? The compiler already has support +for detecting whether a type permits zero-initialization (used to produce a +warning for `mem::zeroed()`); hopefully we can make use of the same support. + # Future possibilities [future-possibilities]: #future-possibilities From c20796c4636c4ee104f5a89066543563a91961b5 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 16:51:24 +0100 Subject: [PATCH 41/49] Explicitly conflict with `repr(packed)` --- text/3334-niche.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index 0a0b6c95044..66e90811e21 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -210,6 +210,9 @@ If a struct has both a niche and `derive(Default)` declared on it, the compiler will check if the default value falls within the niche, and produce an error if so. +If a struct has both a niche and `repr(packed)`, the compiler will produce an +error. + # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives @@ -347,8 +350,7 @@ Will something go wrong if applying a niche to a struct whose field is itself a struct containing multiple fields? Do we need to restrict niches to structs containing primitive types, or similar? -Do we need to make `niche` mutually exclusive with `packed`? What about other -attributes? 
+Are there any attributes we need to make mutually exclusive with `niche`? Can we make `derive(Default)` detect errors? The compiler already has support for detecting whether a type permits zero-initialization (used to produce a From 7f27700d2f03236affd37f385a9aaafe68a5f366 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 17:01:21 +0100 Subject: [PATCH 42/49] Alternative regarding `derive(Default)` --- text/3334-niche.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index 66e90811e21..3df3fa5544b 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -315,6 +315,10 @@ only two valid values, any niche applying a further restriction to it would result in either a one-value type or a zero-value type, neither of which seems useful enough to support. +Rather than supporting `derive(Default)`, we could reject it, and wait for +general-purpose compiler support for safe assignment of compile-time constant +expressions. + # Prior art [prior-art]: #prior-art From b8e0397905ff50a90737965e2198e4208ab2b799 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Mon, 24 Oct 2022 17:12:35 +0100 Subject: [PATCH 43/49] Discuss mutable references in more detail --- text/3334-niche.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/text/3334-niche.md b/text/3334-niche.md index 3df3fa5544b..5fee9e01ee8 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -319,6 +319,14 @@ Rather than supporting `derive(Default)`, we could reject it, and wait for general-purpose compiler support for safe assignment of compile-time constant expressions. +We could entirely forbid taking mutable references to fields of structs with +niches, rather than allowing them in unsafe code. This would mean that unsafe +code could not produce such a reference and call other code with it. However, +that would also prevent calling mutating methods and otherwise reusing existing +code that makes use of `&mut`. 
Other unsafe code already incurs similar +obligations. We could also have lints detecting at least trivial misuses, such +as returning such a `&mut` reference from a safe method. + # Prior art [prior-art]: #prior-art From 48d1c22b426a750b2953169628ca9ba73b1664e9 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Tue, 25 Oct 2022 14:02:42 +0100 Subject: [PATCH 44/49] s/primitive integer type/built-in integer type/ --- text/3334-niche.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index 5fee9e01ee8..ced9f666288 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -139,7 +139,7 @@ The field must have one of a restricted set of types: account.) - A raw pointer. (This allows user-defined types to store a properly typed pointer while using known-invalid pointer values as niches.) -- A fieldless enum with a `repr` of a primitive integer type. +- A fieldless enum with a `repr` of a built-in integer type. Declaring a niche on a struct whose field type does not meet these restrictions results in an error. From 6cf4c4e741d6a51154d20c6e2b53b92efdefaf5a Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Tue, 25 Oct 2022 14:15:21 +0100 Subject: [PATCH 45/49] s/either zero or one/at most one/ --- text/3334-niche.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index ced9f666288..8fa807de2bb 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -154,8 +154,8 @@ Declaring multiple `niche` attributes on a single item, or multiple key-value pairs within a single `niche` attribute, results in an error. The `niche` attribute may appear inside `cfg_attr`. The net effect after -evaluating all configuration must be to apply either zero or one `niche` -attribute to the type. +evaluating all configuration must be to apply at most one `niche` attribute to +the type. Declaring a niche on a struct that has any generic parameters results in an error. 
From a01196586ff17ae66c187c97b34376bbff402a42 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Tue, 25 Oct 2022 14:30:07 +0100 Subject: [PATCH 46/49] Clarify what else adding a niche *doesn't* change (alignment, other ABI details) --- text/3334-niche.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index 8fa807de2bb..f783988d97b 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -192,10 +192,12 @@ However, multiple instances of the same identical type (e.g. `Option` and niche or not). This permits a round-trip between such a value and a byte representation. -Adding a niche to a type does not change the storage size of the type, even if -the niche might otherwise allow storing fewer bytes. The type still allows -obtaining mutable references to the field, which requires storing valid values -using the same representation as those values would have had without the niche. +Adding a niche to a type does not change the storage size, alignment, or other +ABI details of the type, even if the niche might otherwise allow storing fewer +bytes; it only changes the ABI of other types *containing* the type (e.g. +`Option`). The type still allows obtaining mutable references to the field, +which requires storing valid values using the same representation as those +values would have had without the niche. Declaring a niche *may* allow additional optimizations that assume the type cannot contain the niche values, though the compiler does not guarantee this. 
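The ABI note in this patch matches how the standard library's existing niche types behave. This sketch uses `NonZeroU32`, whose niche is `0`, as a stand-in for a user-defined type with a niche. The type itself keeps the size and alignment of `u32`, and only the containing `Option` gets smaller. The standard library documents the `Option<NonZeroU32>` size. The `Option<u32>` comparison relies only on the fact that every `u32` bit pattern is valid.

```rust
use std::mem::{align_of, size_of};
use std::num::NonZeroU32;

/// The type with the niche keeps the layout of its underlying field type.
pub fn niche_keeps_own_layout() -> bool {
    size_of::<NonZeroU32>() == size_of::<u32>()
        && align_of::<NonZeroU32>() == align_of::<u32>()
}

/// A containing type can reuse the niche value (0) to represent `None`.
pub fn option_uses_niche() -> bool {
    size_of::<Option<NonZeroU32>>() == size_of::<u32>()
}

/// Without a niche, `Option` needs extra space for its discriminant.
pub fn option_without_niche_grows() -> bool {
    size_of::<Option<u32>>() > size_of::<u32>()
}
```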
From ea1bd10e94bc5669bd37ca4f3754b5475d251634 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Tue, 25 Oct 2022 14:38:42 +0100 Subject: [PATCH 47/49] Remove a resolved question --- text/3334-niche.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index f783988d97b..46363082e4c 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -360,10 +360,6 @@ Could we support negative numbers in a niche attribute, at least for fields of concrete primitive type? That would provide a much more friendly interface, but would require the compiler to better understand the type and its size. -Will something go wrong if applying a niche to a struct whose field is itself a -struct containing multiple fields? Do we need to restrict niches to structs -containing primitive types, or similar? - Are there any attributes we need to make mutually exclusive with `niche`? Can we make `derive(Default)` detect errors? The compiler already has support From 5358cb5c303708a04f52846fc35408abc4e1128d Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Tue, 25 Oct 2022 14:40:11 +0100 Subject: [PATCH 48/49] Rework the RFC to use values of the field type: negative numbers and floats Drop support for enums for now, since using values of the enum type would require name resolution. Add them to future possibilities. Allow open-start ranges (exclusive and inclusive), since they were only disallowed for forward-compatibility with signed values. Add an explicit note that open-start ranges will include negative values. --- text/3334-niche.md | 84 ++++++++++++++++++++++++---------------------- 1 file changed, 43 insertions(+), 41 deletions(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index 46363082e4c..dddd54f66a1 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -67,6 +67,12 @@ struct, or obtaining a mutable reference to such a field, requires `unsafe` code. 
Causing a type with a niche to contain one of its niche values (whether by construction, writing, or transmuting) results in undefined behavior. +The field type must be a built-in integer or floating-point type, a `char`, or +a raw pointer. + +The value given for `value`, or the endpoints of the range given for `range`, +may be either a value of the same type as the field, or an unsigned integer. + Typically, a user-defined type with a niche may wish to provide safe methods to construct or modify the type. For instance, a type `T` *might* choose to provide one or more of the following, depending on what makes sense for the @@ -113,33 +119,46 @@ assigning to the whole struct, are not affected by the presence of the niche. Causing a type with a niche to contain one of its niche values (whether by construction, writing, or transmuting) results in undefined behavior. -The niche attribute may either contain `value = N` where `N` is an unsigned -integer, or `range = R` where R is a range expression whose endpoints are both -unsigned integers. +The niche attribute may either contain `value = N` or `range = R`. The value +given for `N`, or the endpoints of the range given for `R`, may be either a +literal value of the same type as the field, or an unsigned integer literal. + +Signed and unsigned integer literals may use any integer base representation +(decimal, hex, binary, octal), but must not have a type suffix. The unsigned +integers are interpreted as the bit patterns in memory corresponding to the +representation of the field. For instance, a struct with a float field could +specify one or more NaN values as a niche using the integer representation of +those values. -The unsigned integers may use any integer base representation (decimal, hex, -binary, octal), but must not have a type suffix. The unsigned integers are -interpreted as the bit patterns in memory corresponding to the representation -of the field. 
For instance, a struct with a float field could specify one or -more NaN values as a niche using the integer representation of those values. +The range may be exclusive (`start..end`), inclusive (`start..=end`), +open-ended (`start..`), open-start (`..end`), or open-start inclusive +(`..=end`). -The range may be exclusive (`start..end`), inclusive (`start..=end`), or -open-ended (`start..`). +Note that an open-start range on a signed field or floating-point field will +include all values less than the upper bound, including any negative numbers +less than the upper bound. For instance, a field of type `i8` with a niche +range of `..2` will have as niche values `1`, `0`, `-1`, `-2`, ..., `-128`. The attribute `#[niche]` may only appear on a struct declaration. The struct must contain exactly one field. The field must have one of a restricted set of types: -- A built-in integer type (iN or uN). -- A built-in floating-point type (fN). (The niche must still be specified using - the integer representation.) -- A `char`. (The niche uses the integer representation, and gets merged with - the built-in niches of `char`; if the result after merging would have - multiple discontiguous niches, the compiler need not take all of them into - account.) -- A raw pointer. (This allows user-defined types to store a properly typed - pointer while using known-invalid pointer values as niches.) -- A fieldless enum with a `repr` of a built-in integer type. +- A built-in unsigned integer type (uN). In this case, the niche specification + must use an unsigned integer. +- A built-in signed integer type (iN). In this case, the niche specification + may use a signed integer, or an unsigned integer corresponding to the two's + complement representation. For instance, a field of type `i32` could have a + niche of `-1` or equivalently `0xFFFF_FFFF`. +- A built-in floating-point type (fN).
In this case, the niche specification + may use a floating-point number, or an unsigned integer corresponding to the + IEEE representation of that floating-point type. +- A `char`. In this case, the niche specification may use a `char` literal, or + an unsigned integer. The niche gets merged with the built-in niches of + `char`; if the result after merging would have multiple discontiguous niches, + the compiler need not take all of them into account. +- A raw pointer. In this case, the niche specification must use an unsigned + integer. This allows user-defined types to store a properly typed pointer + while using known-invalid pointer values as niches. Declaring a niche on a struct whose field type does not meet these restrictions results in an error. @@ -167,15 +186,8 @@ suppressed for code expanded from a macro. Declaring a range niche with an invalid range (e.g. `5..0`) results in an error. -Declaring a niche using a negative value or a negative range endpoint results -in an error. The representation of negative values depends on the size of the -type, and the compiler may not have that information at the time it handles -attributes such as `niche`. The text of the error should suggest the -appropriate two's-complement unsigned equivalent to use. The compiler may -support this in the future. - -Declaring a range niche with an open start (`..3`) results in an error, for -forwards-compatibility with support for negative values. +Declaring a range niche with an unbounded range (`..`) results in an error, as +this would represent a field with no valid values. Declaring a niche using a non-literal value (e.g. `usize::MAX`) results in an error. Constants can use compile-time evaluation, and compile-time evaluation @@ -356,10 +368,6 @@ and innovation since computing antiquity. Could we support niches on generic types? For instance, could we support declaring a niche of `0` on a generic struct with a single field? 
-Could we support negative numbers in a niche attribute, at least for fields of -concrete primitive type? That would provide a much more friendly interface, but -would require the compiler to better understand the type and its size. - Are there any attributes we need to make mutually exclusive with `niche`? Can we make `derive(Default)` detect errors? The compiler already has support @@ -378,11 +386,6 @@ specify only behavior that the Rust compiler already implements. New types of niches can use the same `niche` attribute, adding new key-values within the attribute. -- **Signed values**: This RFC requires the use of unsigned values when defining - niches. A future version could permit the use of signed values, to avoid - having to manually perform the twos-complement conversion. This may - require either making the compiler's implementation smarter, or using a - syntax that defines the size of the integer type (e.g. `-1isize`). - **Limited constant evaluation**: This RFC excludes the possibility of using constants in the range expression, because doing so simplifies the implementation. Ideally, a future version would allow ranges to use at least @@ -424,9 +427,8 @@ within the attribute. lifetime other than `'static`, this will also require at least some support for generic parameters. - **Non-primitive fields**: A struct could contain fields of non-primitive - types, such as tuples, arrays, or other structs (including structs with - niches themselves). This should wait until after niches support providing - values with the type of the field, rather than as an unsigned integer. + types, such as enums, tuples, arrays, or other structs (including structs + with niches themselves). - **Whole-struct niches**: A struct containing multiple non-zero-sized fields could have niche values for the whole struct. - **Union niches**: A union could have a niche. 
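The signed and floating-point rules in this patch depend on a fixed mapping between a value and its unsigned bit pattern. The helpers below are hypothetical and use only `as` casts, `f32::to_bits`, and `f32::from_bits` to compute that mapping. For `i32`, `-1` corresponds to `0xFFFF_FFFF`, while `0x8000_0000` is the pattern of `i32::MIN`.

```rust
/// Hypothetical helper: the unsigned bit pattern a signed `i32` niche value
/// stands for. `as` between same-width integers reinterprets the bits
/// (two's complement) rather than converting the numeric value.
pub fn i32_niche_bits(value: i32) -> u32 {
    value as u32
}

/// Hypothetical helper: the IEEE 754 binary32 bit pattern of an `f32` niche value.
pub fn f32_niche_bits(value: f32) -> u32 {
    value.to_bits()
}

/// Hypothetical helper: whether an unsigned niche specification for an `f32`
/// field names a NaN (all exponent bits set, non-zero mantissa).
pub fn f32_bits_are_nan(bits: u32) -> bool {
    f32::from_bits(bits).is_nan()
}
```

The same reasoning applies to the `f64` and other integer widths, using the matching unsigned type of the same size.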
From 08244e08ddfa4ca80895dfe7fff16f3425c46c00 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Wed, 26 Oct 2022 22:22:39 +0100 Subject: [PATCH 49/49] Rephrase descriptions of ranges --- text/3334-niche.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/text/3334-niche.md b/text/3334-niche.md index dddd54f66a1..fbce77964f0 100644 --- a/text/3334-niche.md +++ b/text/3334-niche.md @@ -130,11 +130,11 @@ representation of the field. For instance, a struct with a float field could specify one or more NaN values as a niche using the integer representation of those values. -The range may be exclusive (`start..end`), inclusive (`start..=end`), -open-ended (`start..`), open-start (`..end`), or open-start inclusive +The range may be exclusive (`start..end`), inclusive (`start..=end`), bounded +below (`start..`), bounded above (`..end`), or bounded above inclusive (`..=end`). -Note that an open-start range on a signed field or floating-point field will +Note that on a signed or floating-point field, a range bounded only above will include all values less than the upper bound, including any negative numbers less than the upper bound. For instance, a field of type `i8` with a niche range of `..2` will have as niche values `1`, `0`, `-1`, `-2`, ..., `-128`.
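This works through the `i8` example from this patch. The niche `..2` contains the 130 values from `-128` through `1`. As unsigned bit patterns, these are `0x00`, `0x01`, and `0x80` through `0xFF`, which do not form a single contiguous unsigned range. That matters when translating such a range into unsigned terms. The hypothetical helper below is a sketch that simply enumerates those patterns.

```rust
/// Hypothetical helper: the `u8` bit patterns of every `i8` value in a niche
/// range bounded only above (`..end`). Following the rule above, such a range
/// also covers every negative value below `end`.
pub fn i8_niche_bits_below(end: i8) -> Vec<u8> {
    (i8::MIN..end).map(|v| v as u8).collect()
}
```

For `end = 2`, the 126 valid values `2..=127` map to the patterns `0x02..=0x7F`, and the remaining 130 patterns form the niche.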