 
 The types `char` and `str` hold textual data.
 
-A value of type `char` is a [Unicode scalar value] (i.e. a code point that
-is not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to
-0xD7FF or 0xE000 to 0x10FFFF range. A `[char]` is effectively a UCS-4 / UTF-32
-string.
+A value of type `char` is a [Unicode scalar value] (i.e. a code point that is
+not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to 0xD7FF
+or 0xE000 to 0x10FFFF range. It is immediate [Undefined Behavior] to create a
+`char` that falls outside this range. A `[char]` is effectively a UCS-4 / UTF-32
+string of length 1.
 
 A value of type `str` is a Unicode string, represented as an array of 8-bit
-unsigned bytes holding a sequence of UTF-8 code points. Since `str` is a
-[dynamically sized type], it is not a _first-class_ type, but can only be
-instantiated through a pointer type, such as `&str`.
+unsigned bytes holding a sequence of UTF-8 code points. Note that this is a
+library-level invariant: for the compiler and core language specification, `str`
+is the same as `[u8]`, but methods working on `str` may assume that the data in
+there is valid UTF-8 and may cause Undefined Behavior otherwise. Since `str` is
+a [dynamically sized type], it can only be instantiated through a pointer type,
+such as `&str`.
 
 [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value
+[Undefined Behavior]: ../behavior-considered-undefined.html
 [dynamically sized type]: ../dynamically-sized-types.md
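
The `char` range stated in the added lines can be probed from safe code. The following is a minimal sketch, not part of the change itself: `checked_char` is a hypothetical helper name, and it only wraps the standard `char::from_u32`, which returns `None` for surrogates and for values above 0x10FFFF.

```rust
/// Hypothetical helper (not from the diff): returns `Some(char)` only when
/// `code` is a Unicode scalar value, i.e. inside 0x0000..=0xD7FF or
/// 0xE000..=0x10FFFF.
fn checked_char(code: u32) -> Option<char> {
    char::from_u32(code)
}

fn main() {
    // The last value below the surrogate block is accepted.
    assert_eq!(checked_char(0xD7FF).map(|c| c as u32), Some(0xD7FF));

    // Surrogates (0xD800..=0xDFFF) are code points but not scalar values.
    assert!(checked_char(0xD800).is_none());
    assert!(checked_char(0xDFFF).is_none());

    // The range resumes at 0xE000 and ends at 0x10FFFF.
    assert!(checked_char(0xE000).is_some());
    assert_eq!(checked_char(0x10FFFF), Some(char::MAX));
    assert!(checked_char(0x110000).is_none());

    // A `char` occupies one 32-bit word.
    assert_eq!(std::mem::size_of::<char>(), 4);
}
```

Safe constructors reject out-of-range values, so the Undefined Behavior the text describes should only be reachable through `unsafe` code, for example `char::from_u32_unchecked` or a transmute from `u32`.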
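
The relationship between `str` and `[u8]` described in the added lines can be illustrated with the standard validation function. This is a sketch under stated assumptions: `bytes_to_str` is a hypothetical helper name, and it relies only on `std::str::from_utf8`, which checks the UTF-8 invariant before handing out a `&str`.

```rust
/// Hypothetical helper (not from the diff): views a byte slice as `&str`
/// only if the bytes are valid UTF-8.
fn bytes_to_str(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    // "héllo": 'é' (U+00E9) is encoded in UTF-8 as the two bytes 0xC3 0xA9.
    let valid = [0x68, 0xC3, 0xA9, 0x6C, 0x6C, 0x6F];
    let s = bytes_to_str(&valid).expect("bytes are valid UTF-8");

    assert_eq!(s, "héllo");
    assert_eq!(s.len(), 6); // `len` counts bytes
    assert_eq!(s.chars().count(), 5); // `chars` yields scalar values

    // The underlying storage is the same byte sequence.
    assert_eq!(s.as_bytes(), &valid[..]);

    // 0xFF can never occur in well-formed UTF-8, so validation fails.
    assert!(bytes_to_str(&[0x68, 0xFF]).is_none());
}
```

Because the check happens in the library rather than in the language, `unsafe` functions such as `std::str::from_utf8_unchecked` can produce a `&str` over invalid bytes; the text indicates that methods on `str` may then cause Undefined Behavior.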