Commit dcf0963

work on char/str descriptions
1 parent 892b928 commit dcf0963

1 file changed, +12 -7 lines changed

src/types/textual.md

Lines changed: 12 additions & 7 deletions
@@ -2,15 +2,20 @@
 
 The types `char` and `str` hold textual data.
 
-A value of type `char` is a [Unicode scalar value] (i.e. a code point that
-is not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to
-0xD7FF or 0xE000 to 0x10FFFF range. A `[char]` is effectively a UCS-4 / UTF-32
-string.
+A value of type `char` is a [Unicode scalar value] (i.e. a code point that is
+not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to 0xD7FF
+or 0xE000 to 0x10FFFF range. It is immediate [Undefined Behavior] to create a
+`char` that falls outside this range. A `[char]` is effectively a UCS-4 / UTF-32
+string of length 1.
 
 A value of type `str` is a Unicode string, represented as an array of 8-bit
-unsigned bytes holding a sequence of UTF-8 code points. Since `str` is a
-[dynamically sized type], it is not a _first-class_ type, but can only be
-instantiated through a pointer type, such as `&str`.
+unsigned bytes holding a sequence of UTF-8 code points. Note that this is a
+library-level invariant: for the compiler and core language specification, `str`
+is the same as `[u8]`, but methods working on `str` may assume that the data in
+there is valid UTF-8 and may cause Undefined Behavior otherwise. Since `str` is
+a [dynamically sized type], it can only be instantiated through a pointer type,
+such as `&str`.
 
 [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value
+[Undefined Behavior]: ../behavior-considered-undefined.html
 [dynamically sized type]: ../dynamically-sized-types.md
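
The new `char` wording says that a value outside 0x0000 to 0xD7FF or 0xE000 to 0x10FFFF is immediate Undefined Behavior. A minimal sketch of how safe code stays inside that range is given below. It is an illustration, not part of the commit. The helper name `checked_char` is hypothetical; `char::from_u32` is the standard library's checked conversion.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns `Some(c)` only when `value` is a
/// Unicode scalar value, so no out-of-range `char` is ever created.
fn checked_char(value: u32) -> Option<char> {
    char::from_u32(value)
}

fn main() {
    // A `char` occupies four bytes, matching the "32-bit unsigned word" wording.
    assert_eq!(size_of::<char>(), 4);

    // The endpoints of both valid ranges convert successfully.
    assert_eq!(checked_char(0x0000), Some('\u{0}'));
    assert_eq!(checked_char(0xD7FF), Some('\u{D7FF}'));
    assert_eq!(checked_char(0xE000), Some('\u{E000}'));
    assert_eq!(checked_char(0x10FFFF), Some('\u{10FFFF}'));

    // Surrogates (0xD800 to 0xDFFF) and values above 0x10FFFF are rejected.
    assert_eq!(checked_char(0xD800), None);
    assert_eq!(checked_char(0xDFFF), None);
    assert_eq!(checked_char(0x110000), None);

    println!("all char range checks passed");
}
```

The unchecked counterpart, `char::from_u32_unchecked`, is `unsafe` because it skips this validation; the sketch deliberately does not call it.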

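
The new `str` wording treats UTF-8 validity as a library-level invariant layered on top of a `[u8]` representation. The sketch below illustrates that split under stated assumptions. The helper `checked_str` is hypothetical and not part of the commit; `std::str::from_utf8` is the standard library's validating conversion.

```rust
/// Hypothetical helper: returns the text only if `bytes` is valid UTF-8,
/// so the library-level invariant of `str` holds for the result.
fn checked_str(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    // "caf\u{e9}" encodes U+00E9 as the two-byte sequence 0xC3 0xA9.
    let valid: &[u8] = &[0x63, 0x61, 0x66, 0xC3, 0xA9];
    let text = checked_str(valid).expect("input is valid UTF-8");
    assert_eq!(text, "caf\u{e9}");

    // `len` counts bytes, not characters.
    assert_eq!(text.len(), 5);
    assert_eq!(text.chars().count(), 4);

    // Viewing a `str` as bytes yields the same underlying data.
    assert_eq!(text.as_bytes(), valid);

    // A lone continuation byte and a truncated sequence both fail validation.
    assert!(checked_str(&[0x80]).is_none());
    assert!(checked_str(&[0xC3]).is_none());

    // `str` is dynamically sized, so `&str` carries a length, like `&[u8]`.
    assert_eq!(
        std::mem::size_of::<&str>(),
        std::mem::size_of::<&[u8]>()
    );

    println!("all str checks passed");
}
```

The `unsafe` function `std::str::from_utf8_unchecked` skips the validation step; passing it bytes that are not valid UTF-8 breaks the invariant that `str` methods are allowed to rely on.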