Move value definitions to appropriate chapters under types.

chorman0773 · chorman0773 · commit d6b6744d898c · 2024-11-21T15:17:41.000-05:00
diff --git a/src/memory-model.md b/src/memory-model.md
@@ -1,5 +1,45 @@
 # Memory model
 
-Rust does not yet have a defined memory model. Various academics and industry professionals
-are working on various proposals, but for now, this is an under-defined place
-in the language.
+r[memory]
+
+The memory model of Rust is incomplete and not fully decided. The following describes some of the details worked out so far.
+
+## Bytes
+
+r[memory.byte]
+
+r[memory.byte.intro]
+The most basic unit of memory in Rust is a byte. All values in Rust are computed from 0 or more bytes read from an allocation.
+
+> [!NOTE]
+> While bytes in Rust are typically lowered to hardware bytes, they may carry additional information:
+> a byte may be uninitialized, or it may store part of a pointer.
+
+r[memory.byte.init]
+A byte may be initialized, in which case it contains a value of type `u8` and an optional pointer fragment. When present, the pointer fragment carries [provenance][type.pointer.provenance] information.
+
+r[memory.byte.uninit]
+Each byte may be uninitialized.
+
+> [!NOTE]
+> Uninitialized bytes do not have a value and do not have a pointer fragment.
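+
+As a non-normative illustration, the following sketch uses the standard library's `MaybeUninit` to hold a byte that starts out uninitialized and is initialized before it is read. The helper `init_byte` is hypothetical.
+
+```rust
+use std::mem::MaybeUninit;
+
+/// Hypothetical helper: starts from an uninitialized byte, initializes it
+/// with `v`, and reads it back.
+fn init_byte(v: u8) -> u8 {
+    // Uninitialized: no `u8` value and no pointer fragment, so it must not be read yet.
+    let mut byte: MaybeUninit<u8> = MaybeUninit::uninit();
+    // Writing a `u8` makes the byte initialized.
+    byte.write(v);
+    // SAFETY: the byte was initialized by the `write` above.
+    unsafe { byte.assume_init() }
+}
+
+fn main() {
+    assert_eq!(init_byte(0x2A), 0x2A);
+}
+```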
+
+## Value Encoding
+
+r[memory.encoding]
+
+r[memory.encoding.intro]
+Each type in Rust has zero or more values, on which operations can be performed.
+
+> [!NOTE]
+> `0u8`, `1337i16`, and `Foo { bar: "baz" }` are all values.
+
+r[memory.encoding.op]
+Each value of a type can be encoded into, and decoded from, a sequence of bytes whose length is equal to the size of the type.
+The operation to encode or decode a value is determined by the representation of the type.
+
+> [!NOTE]
+> Representation is related to, but is not the same property as, the layout of the type.
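+
+As a non-normative illustration, the standard library's `to_ne_bytes` and `from_ne_bytes` methods expose this encoding and decoding for `u32`, using the platform's byte order. The helpers `encode` and `round_trip` below are hypothetical.
+
+```rust
+use std::mem::size_of;
+
+/// Hypothetical helper: encodes a `u32` into its bytes (native byte order).
+fn encode(v: u32) -> [u8; 4] {
+    v.to_ne_bytes()
+}
+
+/// Hypothetical helper: encodes `v`, then decodes the resulting bytes.
+fn round_trip(v: u32) -> u32 {
+    u32::from_ne_bytes(encode(v))
+}
+
+fn main() {
+    // The encoded sequence is exactly as long as the size of the type.
+    assert_eq!(encode(0xDEAD_BEEF).len(), size_of::<u32>());
+    // Decoding the encoded bytes yields the original value.
+    assert_eq!(round_trip(0xDEAD_BEEF), 0xDEAD_BEEF);
+}
+```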
+
+r[memory.encoding.decode]
+If a value of type `T` is decoded from a sequence of bytes that does not correspond to any value of `T`, the behavior is undefined. If a value of type `T` is decoded from a sequence of bytes that contains pointer fragments which are not used to represent the value, those pointer fragments are ignored.
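+
+For example, the byte `0x00` does not correspond to any value of `NonZeroU8`. A minimal sketch of a checked decoder, under the hypothetical name `decode_nonzero`, rejects that byte instead of decoding it:
+
+```rust
+use std::num::NonZeroU8;
+
+/// Hypothetical checked decoder: accepts only bytes that correspond to a
+/// value of `NonZeroU8`.
+fn decode_nonzero(byte: u8) -> Option<NonZeroU8> {
+    NonZeroU8::new(byte)
+}
+
+fn main() {
+    assert_eq!(decode_nonzero(7).map(NonZeroU8::get), Some(7));
+    // `0x00` is not a value of `NonZeroU8`. Decoding it without a check,
+    // for example with `std::mem::transmute`, would be undefined behavior.
+    assert!(decode_nonzero(0).is_none());
+}
+```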
diff --git a/src/types/boolean.md b/src/types/boolean.md
@@ -21,9 +21,10 @@ r[type.bool.layout]
 An object with the boolean type has a [size and alignment] of 1 each.
 
 r[type.bool.repr]
-The value false has the bit pattern `0x00` and the value true has the bit pattern
-`0x01`. It is [undefined behavior] for an object with the boolean type to have
-any other bit pattern.
+A `bool` is represented as a single initialized byte with a value of `0x00` corresponding to `false` and a value of `0x01` corresponding to `true`. This byte does not have a pointer fragment.
+
+> [!NOTE]
+> No other representations are valid for `bool`. It is undefined behavior to read any other byte as type `bool`.
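+
+A minimal sketch of encoding and checked decoding for `bool`. The helpers `bool_to_byte` and `bool_from_byte` are hypothetical, and the `transmute` is only reached for the two valid bytes:
+
+```rust
+use std::mem::{size_of, transmute};
+
+/// Hypothetical helper: encodes a `bool` as its single byte.
+fn bool_to_byte(b: bool) -> u8 {
+    u8::from(b)
+}
+
+/// Hypothetical checked decoder: only `0x00` and `0x01` are valid `bool` bytes.
+fn bool_from_byte(byte: u8) -> Option<bool> {
+    match byte {
+        // SAFETY: 0x00 and 0x01 are the two valid representations of `bool`.
+        0x00 | 0x01 => Some(unsafe { transmute::<u8, bool>(byte) }),
+        _ => None,
+    }
+}
+
+fn main() {
+    assert_eq!(size_of::<bool>(), 1);
+    assert_eq!(bool_to_byte(false), 0x00);
+    assert_eq!(bool_to_byte(true), 0x01);
+    assert_eq!(bool_from_byte(0x01), Some(true));
+    // Any other byte is rejected instead of being read as a `bool`.
+    assert_eq!(bool_from_byte(0x02), None);
+}
+```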
 
 r[type.bool.usage]
 The boolean type is the type of many operands in various [expressions]:
diff --git a/src/types/function-pointer.md b/src/types/function-pointer.md
@@ -55,6 +55,13 @@ let bo: Binop = add;
 x = bo(5,7);
 ```
 
+r[type.fn-pointer.value]
+A value of a function pointer type consists of a non-null address. A function pointer value is represented the same as its address, as a value of the unsigned integer type with the same width as the function pointer.
+
+> [!NOTE]
+> Whether or not a function pointer value has provenance, and whether or not this provenance is represented as pointer fragments, is not yet decided.
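+
+A short sketch of the observable consequences, mirroring the `Binop` alias from the example above and assuming a target where function pointers are as wide as `usize` (true on common targets, but not stated by this rule). The helper `fn_addr` is hypothetical.
+
+```rust
+use std::mem::size_of;
+
+type Binop = fn(i32, i32) -> i32;
+
+fn add(x: i32, y: i32) -> i32 {
+    x + y
+}
+
+/// Hypothetical helper: the address of a function pointer as an integer.
+fn fn_addr(f: Binop) -> usize {
+    f as usize
+}
+
+fn main() {
+    // Assumption: function pointers have the same width as `usize` here.
+    assert_eq!(size_of::<Binop>(), size_of::<usize>());
+    // The address of a function pointer is never null...
+    assert_ne!(fn_addr(add), 0);
+    // ...so `None` can use the null address, and `Option` adds no space.
+    assert_eq!(size_of::<Option<Binop>>(), size_of::<Binop>());
+}
+```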
+
+
 ## Attributes on function pointer parameters
 
 r[type.fn-pointer.attributes]
diff --git a/src/types/numeric.md b/src/types/numeric.md
@@ -29,6 +29,7 @@ Type   | Minimum            | Maximum
 `i128` | -(2<sup>127</sup>) | 2<sup>127</sup>-1
 
 
+
 ## Floating-point types
 
 r[type.numeric.float]
@@ -65,3 +66,37 @@ r[type.numeric.validity]
 
 For every numeric type, `T`, the bit validity of `T` is equivalent to the bit
 validity of `[u8; size_of::<T>()]`. An uninitialized byte is not a valid `u8`.
+
+## Representation
+
+r[type.numeric.repr]
+
+r[type.numeric.repr.integer]
+Each value of an integer type is a whole number. For unsigned types, this is a positive integer or `0`. For signed types, this can either be a positive integer, negative integer, or `0`.
+
+r[type.numeric.repr.integer-width]
+The range of values an integer type can represent depends on its signedness and its width, in bits. The width of type `uN` or `iN` is `N`. The width of type `usize` or `isize` is the value of the `target_pointer_width` property.
+
+> [!NOTE]
+> There are exactly `1<<N` unique values of an integer type of width `N`.
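+
+A non-normative sketch using the `BITS` associated constants of the integer types. The helper `value_count` is hypothetical and only handles widths below 128:
+
+```rust
+use std::mem::size_of;
+
+/// Hypothetical helper: the number of distinct values of an integer type of
+/// width `bits` (valid for `bits < 128`).
+fn value_count(bits: u32) -> u128 {
+    1u128 << bits
+}
+
+fn main() {
+    // The width of `uN` and `iN` is `N` bits.
+    assert_eq!(u8::BITS, 8);
+    assert_eq!(i64::BITS, 64);
+    // The width of `usize` matches `target_pointer_width`.
+    assert_eq!(usize::BITS as usize, 8 * size_of::<usize>());
+    if cfg!(target_pointer_width = "64") {
+        assert_eq!(usize::BITS, 64);
+    }
+    // Width 8 gives exactly 256 values: 0 through 255 for `u8`, -128 through 127 for `i8`.
+    assert_eq!(value_count(u8::BITS), u8::MAX as u128 + 1);
+    assert_eq!(value_count(i8::BITS), (i8::MAX as i128 - i8::MIN as i128 + 1) as u128);
+}
+```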
+
+r[type.numeric.repr.unsigned]
+A value `i` of an unsigned integer type `U` is represented by a sequence of `size_of::<U>()` initialized bytes, where the `m`th successive byte, according to the byte order of the platform, is `(i >> (m*8)) as u8`, for each `m` from `0` up to, but not including, `size_of::<U>()`. None of the bytes produced by encoding an unsigned integer has a pointer fragment.
+
+> [!NOTE]
+> The two primary byte orders are `little` endian, where the least significant byte (`m = 0`) is stored at the lowest memory address, and `big` endian, where the least significant byte is stored at the highest memory address.
+> The `target_endian` configuration option indicates the byte order.
+
+> [!WARNING]
+> On `little` endian, the order of bytes used to decode an integer type is the same as the natural order of a `u8` array: the `m` value corresponds to index `m` of a same-sized `u8` array. On `big` endian, the order is reversed: the `m` value corresponds to index `size_of::<T>() - 1 - m` of that array.
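+
+The following sketch applies the formula above to a `u32` through a hypothetical helper `successive_bytes`, then compares the result with the standard library's `to_le_bytes` and `to_ne_bytes` to show the index mapping for each byte order:
+
+```rust
+/// Hypothetical helper: byte `m` is `(i >> (m * 8)) as u8`, for `m` in `0..4`.
+fn successive_bytes(i: u32) -> [u8; 4] {
+    let mut out = [0u8; 4];
+    for m in 0..4 {
+        out[m] = (i >> (m * 8)) as u8;
+    }
+    out
+}
+
+fn main() {
+    let i: u32 = 0x1122_3344;
+    let s = successive_bytes(i);
+    // The least significant byte comes first in the sequence.
+    assert_eq!(s, [0x44, 0x33, 0x22, 0x11]);
+    // In little-endian order, byte `m` sits at array index `m`.
+    assert_eq!(s, i.to_le_bytes());
+    // In the platform's native order, index `m` is used on little endian,
+    // and index `size - 1 - m` on big endian.
+    let ne = i.to_ne_bytes();
+    let size = ne.len();
+    for m in 0..size {
+        let index = if cfg!(target_endian = "little") { m } else { size - 1 - m };
+        assert_eq!(ne[index], s[m]);
+    }
+}
+```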
+
+r[type.numeric.repr.signed]
+A value `i` of a signed integer type with width `N` is represented the same as the unique value of the unsigned integer type of the same width that is congruent to `i` modulo `2^N`.
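+
+For example, `-2i16` is congruent to `65534` modulo `2^16`, so both are encoded as the same bytes. A sketch with a hypothetical helper `unsigned_counterpart`:
+
+```rust
+/// Hypothetical helper: the unique `u16` congruent to `i` modulo 2^16.
+fn unsigned_counterpart(i: i16) -> u16 {
+    (i as i32).rem_euclid(1 << 16) as u16
+}
+
+fn main() {
+    // -2 is congruent to 65534 modulo 2^16.
+    assert_eq!(unsigned_counterpart(-2), 65534);
+    // Each `i16` is encoded exactly like its unsigned counterpart.
+    for i in [i16::MIN, -2, -1, 0, 1, i16::MAX] {
+        assert_eq!(i.to_ne_bytes(), unsigned_counterpart(i).to_ne_bytes());
+        // A same-width `as` cast produces the same counterpart.
+        assert_eq!(i as u16, unsigned_counterpart(i));
+    }
+}
+```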
+
+r[type.numeric.repr.float]
+A floating-point value is represented the same as the value of the unsigned integer type of the same width whose bits are given by the value's [IEEE 754-2019] encoding.
+
+r[type.numeric.repr.float-format]
+The [IEEE 754-2019] `binary32` format is used for `f32`, and the `binary64` format is used for `f64`.
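+
+A non-normative sketch using `f32::to_bits` and `f64::to_bits`, which return the IEEE 754 bit patterns as unsigned integers. The helper `f32_bytes_via_bits` is hypothetical:
+
+```rust
+/// Hypothetical helper: the bytes of an `f32`, obtained by encoding its
+/// IEEE 754 `binary32` bit pattern as a `u32`.
+fn f32_bytes_via_bits(x: f32) -> [u8; 4] {
+    x.to_bits().to_ne_bytes()
+}
+
+fn main() {
+    // binary32: 1.0 is 0x3F80_0000 and -2.5 is 0xC020_0000.
+    assert_eq!(1.0f32.to_bits(), 0x3F80_0000);
+    assert_eq!((-2.5f32).to_bits(), 0xC020_0000);
+    // binary64: 1.0 is 0x3FF0_0000_0000_0000.
+    assert_eq!(1.0f64.to_bits(), 0x3FF0_0000_0000_0000);
+    // The float's bytes are the bytes of that unsigned integer.
+    assert_eq!(f32_bytes_via_bits(-2.5), (-2.5f32).to_ne_bytes());
+}
+```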
+
+[IEEE 754-2019]: https://ieeexplore.ieee.org/document/8766229
diff --git a/src/types/pointer.md b/src/types/pointer.md
@@ -81,19 +81,61 @@ r[type.pointer.smart]
 
 The standard library contains additional 'smart pointer' types beyond references and raw pointers.
 
-## Bit validity
+## Pointer values and representation
 
-r[type.pointer.validity]
+r[type.pointer.value]
 
-r[type.pointer.validity.pointer-fragment]
-Despite pointers and references being similar to `usize`s in the machine code emitted on most platforms,
-the semantics of transmuting a reference or pointer type to a non-pointer type is currently undecided.
-Thus, it may not be valid to transmute a pointer or reference type, `P`, to a `[u8; size_of::<P>()]`.
+r[type.pointer.value.thin]
+Each thin pointer consists of an address and an optional [provenance][type.pointer.provenance]. The address identifies the byte the pointer points to. The provenance determines which bytes the pointer is allowed to access, and the allocation those bytes are within.
+
+> [!NOTE]
+> A pointer that does not have provenance may be called an invalid or dangling pointer.
+
+r[type.pointer.value.thin-repr]
+The representation of a value of a thin pointer is a sequence of initialized bytes with `u8` values given by the representation of its address as a value of type `usize`, and pointer fragments corresponding to its provenance, if present.
+
+r[type.pointer.value.thin-ref]
+A thin reference to `T` consists of a non-null address that is well-aligned for `T`, and provenance for `size_of::<T>()` bytes starting from that address. The representation of a thin reference to `T` is the same as that of a raw pointer with the same address and provenance.
+
+> [!NOTE]
+> This is true for both shared and mutable references. There are additional constraints enforced by the aliasing model that are not yet fully decided.
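+
+A minimal sketch of what can be observed from safe code. The helper `ref_address` is hypothetical, and casting a pointer to `usize` keeps only the address, not the provenance:
+
+```rust
+use std::mem::{align_of, size_of};
+
+/// Hypothetical helper: the address of a thin reference as a `usize`.
+/// The resulting integer carries only the address.
+fn ref_address(r: &u32) -> usize {
+    r as *const u32 as usize
+}
+
+fn main() {
+    // A thin pointer is encoded like a `usize` address (plus any pointer fragments).
+    assert_eq!(size_of::<*const u32>(), size_of::<usize>());
+    let x: u32 = 7;
+    let addr = ref_address(&x);
+    // A thin reference has a non-null, well-aligned address.
+    assert_ne!(addr, 0);
+    assert_eq!(addr % align_of::<u32>(), 0);
+    // Since no reference is null, `Option<&u32>` needs no extra space.
+    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
+}
+```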
+
+r[type.pointer.value.wide]
+A wide pointer or reference consists of a data pointer or reference, and a pointee-specific metadata value.
+
+r[type.pointer.value.wide-reference]
+The data pointer of a wide reference has a non-null address, well-aligned for `align_of_val(self)`, and with provenance for `size_of_val(self)` bytes.
+
+r[type.pointer.value.wide-representation]
+A wide pointer or reference is represented the same as `struct WidePointer<M>{data: *mut (), metadata: M}` where `M` is the pointee metadata type, and the `data` and `metadata` fields are the corresponding parts of the pointer.
+
+> [!NOTE]
+> The `WidePointer` struct has no guarantees about layout, and has the default representation.
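+
+A sketch for slice references, where the metadata is the length. The helper `slice_parts` is hypothetical, and the final size checks reflect current targets rather than layout guarantees, consistent with the note above:
+
+```rust
+use std::fmt::Debug;
+use std::mem::{size_of, size_of_val};
+
+/// Hypothetical helper: splits a slice reference into its data pointer and
+/// its metadata (the length).
+fn slice_parts(s: &[u16]) -> (*const u16, usize) {
+    (s.as_ptr(), s.len())
+}
+
+fn main() {
+    let data = [1u16, 2, 3];
+    let (ptr, len) = slice_parts(&data);
+    assert_eq!(len, 3);
+    // `size_of_val` of the slice is `len * size_of::<u16>()`, here 6 bytes.
+    assert_eq!(size_of_val(&data[..]), len * size_of::<u16>());
+    // SAFETY: `ptr` and `len` came from a live slice of `data`.
+    let rebuilt: &[u16] = unsafe { std::slice::from_raw_parts(ptr, len) };
+    assert_eq!(rebuilt, &data[..]);
+    // On current targets (an observation, not a layout guarantee), a slice
+    // reference and a trait-object reference are each two words wide.
+    assert_eq!(size_of::<&[u16]>(), 2 * size_of::<usize>());
+    assert_eq!(size_of::<&dyn Debug>(), 2 * size_of::<usize>());
+}
+```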
+
+
+## Pointer Provenance
+
+r[type.pointer.provenance]
+
+r[type.pointer.provenance.intro]
+Pointer provenance is a term that refers to additional data carried by pointer values in Rust, beyond their address. When stored in memory, provenance is encoded in the pointer fragment part of each byte of the pointer.
+
+r[type.pointer.provenance.allocation]
+Whenever a pointer to a particular allocation is produced by using the reference or raw reference operators, or when a pointer is returned from an allocation function, the resulting pointer has provenance that refers to that allocation.
+
+> [!NOTE]
+> There is additional information encoded by provenance, but the exact scope of this information is not yet decided.
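+
+A minimal sketch of pointers that carry provenance: one produced by the raw borrow operator (written with the `ptr::addr_of!` macro) and one returned by the allocation function `std::alloc::alloc`. The helpers `read_via_raw_borrow` and `heap_round_trip` are hypothetical:
+
+```rust
+use std::alloc::{alloc, dealloc, Layout};
+use std::ptr;
+
+/// Hypothetical helper: reads a local through a pointer made with the raw
+/// borrow operator.
+fn read_via_raw_borrow(x: i32) -> i32 {
+    let p: *const i32 = ptr::addr_of!(x);
+    // SAFETY: `p` has provenance for the live local `x`.
+    unsafe { *p }
+}
+
+/// Hypothetical helper: writes and reads `v` through a pointer returned by
+/// the global allocation function `std::alloc::alloc`.
+fn heap_round_trip(v: u32) -> u32 {
+    let layout = Layout::new::<u32>();
+    // SAFETY: `layout` has non-zero size.
+    let p = unsafe { alloc(layout) }.cast::<u32>();
+    assert!(!p.is_null(), "allocation failed");
+    // SAFETY: `p` has provenance for this fresh, suitably aligned allocation.
+    unsafe {
+        p.write(v);
+        let out = p.read();
+        dealloc(p.cast::<u8>(), layout);
+        out
+    }
+}
+
+fn main() {
+    assert_eq!(read_via_raw_borrow(10), 10);
+    assert_eq!(heap_round_trip(42), 42);
+}
+```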
+
+r[type.pointer.provenance.dangling]
+A pointer is dangling if it has no provenance, or if it has provenance to an allocation that has since been deallocated. Any access of nonzero size through a dangling pointer is undefined behavior.
+
+> [!NOTE]
+> Allocations include local and static variables, as well as temporaries. Local variables and temporaries are deallocated when they go out of scope.
+
+> [!WARNING]
+> The above is necessary, but not sufficient, to avoid undefined behavior. The full requirements for pointer access are not yet decided.
+> A reference obtained in safe code is guaranteed to be valid for its usable lifetime, unless interfered with by unsafe code.
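+
+A sketch of accesses that are permitted through a dangling pointer because they have size zero, using `NonNull::dangling`. The helper `empty_from_dangling` is hypothetical:
+
+```rust
+use std::ptr::{self, NonNull};
+
+/// Hypothetical helper: a zero-length slice built from a dangling pointer.
+/// No bytes are accessed, so nothing needs to be allocated.
+fn empty_from_dangling() -> &'static [u64] {
+    let d: *const u64 = NonNull::<u64>::dangling().as_ptr();
+    // SAFETY: `d` is non-null and aligned, and the length is zero.
+    unsafe { std::slice::from_raw_parts(d, 0) }
+}
+
+fn main() {
+    assert!(empty_from_dangling().is_empty());
+    // A zero-sized read through a dangling pointer accesses no bytes either.
+    let z: *const () = NonNull::<()>::dangling().as_ptr();
+    // SAFETY: reading a zero-sized type accesses no bytes.
+    unsafe { ptr::read(z) };
+    // In contrast, reading a `u64` through `NonNull::<u64>::dangling()` would be
+    // an access of size 8 through a dangling pointer: undefined behavior.
+}
+```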
 
-r[type.pointer.validity.raw]
-For thin raw pointers (i.e., for `P = *const T` or `P = *mut T` for `T: Sized`),
-the inverse direction (transmuting from an integer or array of integers to `P`) is always valid.
-However, the pointer produced via such a transmutation may not be dereferenced (not even if `T` has size zero).
 
 [Interior mutability]: ../interior-mutability.md
 [_Lifetime_]: ../trait-bounds.md
diff --git a/src/types/textual.md b/src/types/textual.md
@@ -10,10 +10,13 @@ A value of type `char` is a [Unicode scalar value] (i.e. a code point that is
 not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to 0xD7FF
 or 0xE000 to 0x10FFFF range.
 
-r[type.text.char-precondition]
-It is immediate [undefined behavior] to create a
-`char` that falls outside this range. A `[char]` is effectively a UCS-4 / UTF-32
-string of length 1.
+> [!NOTE]
+> It is immediate [undefined behavior] to create a
+> `char` that falls outside this range. A `[char]` is effectively a UCS-4 / UTF-32
+> string of length 1.
+
+r[type.text.char-repr]
+A value of type `char` is represented the same as the value of type `u32` equal to the code point it represents.
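+
+A non-normative sketch of this representation, using the standard `From<char> for u32` conversion and `char::from_u32` for checked decoding. The helper `decode_char` is hypothetical:
+
+```rust
+use std::mem::{size_of, transmute};
+
+/// Hypothetical checked decoder from a `u32` to a `char`.
+fn decode_char(v: u32) -> Option<char> {
+    char::from_u32(v)
+}
+
+fn main() {
+    assert_eq!(size_of::<char>(), size_of::<u32>());
+    // A `char` is represented as the `u32` equal to its code point.
+    assert_eq!(u32::from('A'), 0x41);
+    // SAFETY: `char` and `u32` have the same size, and every `char` is a valid `u32`.
+    assert_eq!(unsafe { transmute::<char, u32>('€') }, 0x20AC);
+    // Surrogates and values above U+10FFFF are not `char` values.
+    assert_eq!(decode_char(0x1F600), Some('😀'));
+    assert_eq!(decode_char(0xD800), None);
+    assert_eq!(decode_char(0x11_0000), None);
+}
+```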
 
 r[type.text.str-value]
 A value of type `str` is represented the same way as `[u8]`, a slice of
diff --git a/src/values.md b/src/values.md
@@ -2,155 +2,6 @@
 
 r[value]
 
-## Bytes
-
-r[value.byte]
-
-r[value.byte.intro]
-The most basic unit of memory in Rust is a byte. All values in Rust are computed from 0 or more bytes read from an allocation.
-
-> [!NOTE]
-> While bytes in Rust are typically lowered to hardware bytes, they may contain additional values,
-> such as being uninitialized, or storing part of a pointer.
-
-r[value.byte.init]
-Each byte may be initialized, and contain a value of type `u8`, as well as an optional pointer fragment.
-
-r[value.byte.uninit]
-Each byte may be uninitialized.
-
-> [!NOTE]
-> Uninitialized bytes do not have a value and do not have a pointer fragment.
-
-## Value Encoding
-
-r[value.encoding]
-
-r[value.encoding.intro]
-Each type in Rust has 0 or more values, which can have operations performed on them
-
-> [!NOTE]
-> `0u8`, `1337i16`, and `Foo{bar: "baz"}` are all values
-
-r[value.encoding.op]
-Each value of a type can be encoded into a sequence of bytes, and decoded from a sequence of bytes, which has a length equal to the size of the type.
-The operation to encode or decode a value is determined by the representation of the type.
-
-> [!NOTE]
-> Representation is related to, but is not the same property as, the layout of the type.
-
-r[value.encoding.decode]
-If a value of type `T` is decoded from a sequence of bytes that does not correspond to a defined value, the behavior is undefined. If a value of type `T` is decoded from a sequence of bytes that contain pointer fragments, which are not used to represent the value, the pointer fragments are ignored.
-
-## Pointer Provenance
-
-r[value.provenance]
-
-r[value.provenance.intro]
-Pointer Provenance is a term that refers to additional data carried by pointer values in Rust, beyond its address. When stored in memory, Provenance is encoded in the Pointer Fragment part of each byte of the pointer.
-
-r[value.provenance.allocation]
-Whenever a pointer to a particular allocation is produced by using the reference or raw reference operators, or when a pointer is returned from an allocation function, the resulting pointer has provenance that refers to that allocation.
-
-> [!NOTE]
-> There is additional information encoded by provenance, but the exact scope of this information is not yet decided.
-
-r[value.provenance.dangling]
-A pointer is dangling if it has no provenance, or if it has provenance to an allocation that has since been deallocated. An access, except for an access of size zero, using a dangling pointer, is undefined behavior.
-
-> [!NOTE]
-> Allocations include local and static variables, as well as temporaries. Local Variables and Temporaries are deallocated when they go out of scope.
-
-> [!WARN]
-> The above is necessary, but not sufficient, to avoid undefined behavior. The full requirements for pointer access is not yet decided.
-> A reference obtained in safe code is guaranteed to be valid for its usable lifetime, unless interfered with by unsafe code.
-
-## Primitive Values
-
-r[value.primitive]
-
-r[value.primitive.integer]
-Each value of an integer type is a whole number. For unsigned types, this is a positive integer or `0`. For signed types, this can either be a positive integer, negative integer, or `0`.
-
-r[value.primtive.integer-width]
-The range of values an integer type can represent depends on its signedness and its width, in bits. The width of type `uN` or `iN` is `N`. The width of type `usize` or `isize` is the value of the `target_pointer_width` property.
-
-r[value.primitive.integer-range]
-The range of an unsigned integer type of width `N` is between `0` and `1<<N - 1` inclusive. The range of a signed integer type of width `N` is between `-(1<<(N-1)` and `1<<(N-1) - 1` inclusive.
-
-> [!NOTE]
-> There are exactly `1<<N` unique values of an integer type of width `N`.
-
-r[value.primitive.unsigned-repr]
-A value `i` of an unsigned integer type `U` is represented by a sequence of initialized bytes, where the `m`th successive byte according to the byte order of the platform is `(i >> (m*8)) as u8`, where `m` is between `0` and the size of `U`. None of the bytes produced by encoding an unsigned integer has a pointer fragment.
-
-> [!NOTE]
-> The two primary byte orders are `little` endian, where the bytes are ordered from lowest memory address to highest, and `big` endian, where the bytes are ordered from highest memory address to lowest.
-> The `cfg` predicate `target_endian` indicates the byte order
-
-> [!WARN]
-> On `little` endian, the order of bytes used to decode an integer type is the same as the natural order of a `u8` array - that is, the `m` value corresponds with the `m` index into a same-sized `u8` array. On `big` endian, however, the order is the opposite of this order - that is, the `m` value corresponds with the `size_of::<T>() - m` index in that array.
-
-r[value.primitive.signed-repr]
-A value `i` of a signed integer type with width `N` is represented the same as the corresponding value of the unsigned counterpart type which is congruent modulo `2^N`.
-
-r[value.primitive.char]
-Each value of type `char` is a Unicode Scalar Value, between `U+0000` and `U+10FFFF` (excluding the surrogate range `U+D800` through `U+DFFF`).
-
-r[value.primitive.char-repr]
-The representation of type `char` is the same as the representation of the `u32` corresponding to the Code Point Number encoding by the `char`.
-
-r[value.primitive.bool]
-The two values of type `bool` are `true` and `false`. The representation of `true` is an initialized byte with value `0x01`, and the representation of `false` is an initialized  byte with value `0x00`. Neither value is represented with a pointer fragment.
-
-r[value.primitive.float]
-A floating-point value consists of either a rational number, which is within the range and precision dictated by the type, an infinity, or a NaN value.
-
-r[value.primitive.float-repr]
-A floating-point value is represented the same as a value of the unsigned integer type with the same width given by its [IEEE 754-2019] encoding.
-
-r[value.primitive.float-format]
-The [IEEE 754-2019] `binary32` format is used for `f32`, and the `binary64` format is used for `f64`.
-
-[IEEE 754-2019]: https://ieeexplore.ieee.org/document/8766229
-
-## Pointer Value
-
-r[value.pointer]
-
-r[value.pointer.thin]
-Each thin pointer consists of an address and an optional provenance. The address refers to which byte the pointer points to. The provenance refers to which bytes the pointer is allowed to access, and the allocation those bytes are within.
-
-> [!NOTE]
-> A pointer that does not have a provenance may be called an invalid or dangling pointer.
-
-r[value.pointer.thin-repr]
-The representation of a value of a thin pointer is a sequence of initialized bytes with `u8` values given by the representation of its address as a value of type `usize`, and pointer fragments corresponding to its provenance, if present.
-
-r[value.pointer.thin-ref]
-A thin reference to `T` consists of a non-null, well aligned address, and provenance for `size_of::<T>()` bytes starting from that address. The representation of a thin reference to `T` is the same as the pointer with the same address and provenance.
-
-> [!NOTE]
-> This is true for both shared and mutable references. There are additional constraints enforced by the aliasing model.
-
-r[value.pointer.wide]
-A wide pointer or reference consists of a data pointer or reference, and a pointee-specific metadata value.
-
-r[value.pointer.wide-reference]
-The data pointer of a wide reference has a non-null address, well aligned for `align_of_val(self)`, and with provenance for `size_of_val(self)` bytes.
-
-r[value.pointer.wide-representation]
-A wide pointer or reference is represented the same as `struct WidePointer<M>{data: *mut (), metadata: M}` where `M` is the pointee metadata type, and the `data` and `metadata` fields are the corresponding parts of the pointer.
-
-> [!NOTE]
-> The `WidePointer` struct has no guarantees about layout, and has the default representation.
-
-r[value.pointer.fn]
-A value of a function pointer type consists of an non-null address. A function pointer value is represented the same as an address represented as an unsigned integer type with the same width as the function pointer.
-
-> [!NOTE]
-> Whether or not a function pointer value has provenance, and whether or not this provenance is represented as pointer fragments, is not yet decided.
-
 ## Aggregate Values
 
 r[value.aggregate]