Commit 5a5b7ef

[spec/arrays] Improve string docs (#3761)
* [spec/arrays] Improve string docs

  Minor tweaks.
  Move UTF-8 etc to Unicode section & make table showing alias names.
  Add 3 subheadings.
  Move pointer to char paragraphs to zero-termination info, add example.
  Add links to druntime/phobos functions.

* Rework string overview

  Link to immutable.
  Improve `dup` example.
  Use string alias then explain it.
  Don't repeat example.
  Make string op example runnable.

* Fix string literal link

* Fix Bugzilla Issue 24357 - String spec needs updating
1 parent 25ab0e8 commit 5a5b7ef

1 file changed

+95
-75
lines changed

spec/arrays.dd

Lines changed: 95 additions & 75 deletions
@@ -985,7 +985,7 @@ $(H3 $(LNAME2 default-initialization, Default Initialization))
985985
)
986986

987987
$(H3 $(LNAME2 length-initialization, Length Initialization))
988-
$(P The $(D new) expression can be used to start a dynamic array
988+
$(P The $(D new) expression can be used to allocate a dynamic array
989989
with a specified length by specifying its type and then using the
990990
`(size)` syntax:
991991
)
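
Note on this hunk: a minimal sketch of what "allocate a dynamic array with a specified length" amounts to in practice. The helper name `makeZeros` is hypothetical and used only for illustration; it is not part of the spec text.

```d
// Hypothetical helper: allocates n ints, each default-initialized to int.init (0).
int[] makeZeros(size_t n)
{
    return new int[](n);
}

void main()
{
    // The length can be a compile-time constant...
    int[] a = new int[](5);
    assert(a.length == 5);
    assert(a[0] == 0 && a[4] == 0);

    // ...or a value only known at run time.
    auto b = makeZeros(3);
    assert(b.length == 3);

    // Passing several sizes allocates nested dynamic arrays (2 rows of 3 ints).
    auto m = new int[][](2, 3);
    assert(m.length == 2 && m[1].length == 3);
}
```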
@@ -1062,111 +1062,85 @@ $(H2 $(LNAME2 special-array, Special Array Types))
10621062

10631063
$(H3 $(LNAME2 strings, Strings))
10641064

1065-
$(P A string is
1066-
an array of characters. String literals are just
1067-
an easy way to write character arrays.
1068-
String literals are immutable (read only).
1065+
$(P A string is an array of $(DDSUBLINK spec/const3, immutable_type, immutable)
1066+
(read-only) characters. String literals essentially are
1067+
an easy way to write character array literals.
10691068
)
10701069

1071-
$(SPEC_RUNNABLE_EXAMPLE_FAIL
1072-
---------
1073-
char[] str1 = "abc"; // error, cannot implicitly convert expression `"abc"` of type `string` to `char[]`
1074-
char[] str2 = "abc".dup; // ok, makes mutable copy
1075-
immutable(char)[] str3 = "abc"; // ok
1076-
immutable(char)[] str4 = str2; // error, cannot implicitly convert expression `str2` of type `char[]` to `string`
1077-
immutable(char)[] str5 = str3; // ok, makes a mutable str5 with immutable aray contents
1078-
immutable(char)[] str6 = str2.idup; // ok, makes immutable copy
1070+
$(SPEC_RUNNABLE_EXAMPLE_RUN
10791071
---------
1080-
)
1072+
char[] arr;
1073+
//arr = "abc"; // error, cannot implicitly convert expression `"abc"` of type `string` to `char[]`
1074+
arr = "abc".dup; // ok, allocates mutable copy
10811075

1082-
$(P The name $(CODE string) is aliased to $(CODE immutable(char)[]),
1083-
so the above declarations could be equivalently written as:
1084-
)
1076+
string str1 = "abc"; // ok, same types
1077+
//str1 = arr; // error, cannot implicitly convert expression `arr` of type `char[]` to `string`
1078+
str1 = arr.idup; // ok, allocates an immutable copy of elements
1079+
assert(str1 == "abc");
10851080

1086-
$(SPEC_RUNNABLE_EXAMPLE_FAIL
1087-
---------
1088-
char[] str1 = "abc"; // error, cannot implicitly convert expression `"abc"` of type `string` to `char[]`
1089-
char[] str2 = "abc".dup; // ok, makes mutable copy
1090-
string str3 = "abc"; // ok
1091-
string str4 = str2; // error, cannot implicitly convert expression `str2` of type `char[]` to `string`
1092-
string str5 = str3; // ok, makes a mutable str5 with immutable aray contents
1093-
string str6 = str2.idup; // ok, makes immutable copy
1081+
string str2 = str1; // ok, mutable slice of same immutable array contents
10941082
---------
10951083
)
10961084

1097-
$(P
1085+
$(P The name $(CODE string) is aliased to $(CODE immutable(char)[]).
10981086
The type $(D immutable(char)[]) represents an array of $(D immutable char)s. However, the reference to the string is
1099-
mutable. If the the reference to the string needs to be immutable as well it can be declared $(D immutable char[])
1100-
or $(D immutable string):
1087+
mutable.
11011088
)
1089+
---
1090+
immutable(char)[] s = "foo";
1091+
s[0] = 'a'; // error, s[0] is immutable
1092+
s = "bar"; // ok, s itself is not immutable
1093+
---
11021094

1095+
$(P If the reference to the string needs to be immutable as well, it can be declared $(D immutable char[])
1096+
or $(D immutable string):
1097+
)
11031098
---
11041099
immutable char[] s = "foo";
11051100
s[0] = 'a'; // error, s refers to immutable data
11061101
s = "bar"; // error, s is immutable
1107-
1108-
immutable(char)[] s = "hello";
1109-
s[0] = 'b'; // error, s[] is immutable
1110-
s = null; // ok, s itself is not immutable
11111102
---
11121103

1113-
$(P $(CODE char[]) strings are in UTF-8 format.
1114-
$(CODE wchar[]) strings are in UTF-16 format.
1115-
$(CODE dchar[]) strings are in UTF-32 format.
1116-
)
1117-
11181104
$(P Strings can be copied, compared, concatenated, and appended:)
11191105

1106+
$(SPEC_RUNNABLE_EXAMPLE_RUN
11201107
---------
1121-
str1 = str2;
1122-
if (str1 < str3) { ... }
1123-
func(str3 ~ str4);
1124-
str4 ~= str1;
1108+
string s1;
1109+
immutable s2 = "ello";
1110+
s1 = s2;
1111+
s1 = "h" ~ s1;
1112+
if (s1 > "farro")
1113+
s1 ~= " there";
1114+
1115+
assert(s1 == "hello there");
11251116
---------
1117+
)
11261118

1127-
$(P with the obvious semantics. Any generated temporaries get cleaned up
1119+
$(P with array semantics. Any generated temporaries get cleaned up
11281120
by the garbage collector (or by using $(CODE alloca())).
11291121
Not only that, this works with any
11301122
array not just a special String array.
11311123
)
11321124

1133-
$(P A pointer to a char can be generated:
1134-
)
1125+
$(H4 $(LNAME2 string-literal-types, String Literal Types))
11351126

1136-
---------
1137-
char* p = &str[3]; // pointer to 4th element
1138-
char* p = str; // pointer to 1st element
1139-
---------
1140-
1141-
$(P Since strings, however, are not 0 terminated in D,
1142-
when transferring a pointer
1143-
to a string to C, add a terminating 0:
1144-
)
1145-
1146-
---------
1147-
str ~= "\0";
1148-
---------
1149-
1150-
or use the function $(D std.string.toStringz).
1151-
1152-
$(P The type of a string is determined by the semantic phase of
1153-
compilation. The type is
1154-
one of: char[], wchar[], dchar[], and is determined by
1155-
implicit conversion rules.
1127+
$(P The type of a $(DDSUBLINK spec/expression, string_literals, string literal)
1128+
is determined by the semantic phase of compilation. The type is
1129+
determined by implicit conversion rules.
11561130
If there are two equally applicable implicit conversions,
11571131
the result is an error. To
11581132
disambiguate these cases, a cast or a postfix of $(D c),
11591133
$(D w) or $(D d) can be used:
11601134
)
11611135

11621136
---------
1163-
cast(immutable(wchar) [])"abc" // this is an array of wchar characters
1137+
cast(immutable(wchar)[]) "abc" // this is an array of wchar characters
11641138
"abc"w // so is this
11651139
---------
11661140

11671141
$(P String literals that do not have a postfix character and that
11681142
have not been cast can be implicitly converted between
1169-
string, wstring, and dstring as necessary.
1143+
`string`, `wstring`, and `dstring` (see below) as necessary.
11701144
)
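
Note on this hunk: a small sketch of how the postfix and the implicit conversions described above show up in the inferred types. The helper `isWide` is hypothetical, and the example is only meant as an illustration, not as replacement spec text.

```d
// Hypothetical helper: reports at compile time whether T is wstring.
enum isWide(T) = is(T == wstring);

void main()
{
    // With nothing else to guide it, an unsuffixed literal is typed as string.
    auto a = "abc";
    static assert(is(typeof(a) == string));

    // A postfix fixes the type of the literal.
    auto c = "abc"c;
    auto w = "abc"w;
    auto d = "abc"d;
    static assert(is(typeof(c) == string));
    static assert(isWide!(typeof(w)));
    static assert(is(typeof(d) == dstring));

    // An unsuffixed literal converts implicitly to the declared target type.
    wstring ws = "abc";
    dstring ds = "abc";
    assert(ws == "abc"w && ds == "abc"d);
}
```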
11711145

11721146
$(SPEC_RUNNABLE_EXAMPLE_COMPILE
@@ -1188,6 +1162,16 @@ void fun()
11881162
)
11891163

11901164
$(H4 $(LEGACY_LNAME2 strings_unicode, strings-unicode, Strings and Unicode))
1165+
1166+
$(P String data is encoded as follows:)
1167+
1168+
$(TABLE2,
1169+
$(THEAD Alias, Type, Encoding)
1170+
$(TROW `string`, $(CODE immutable(char)[]), UTF-8)
1171+
$(TROW `wstring`, $(CODE immutable(wchar)[]), UTF-16)
1172+
$(TROW `dstring`, $(CODE immutable(dchar)[]), UTF-32)
1173+
)
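
Note on this hunk: the table can be made concrete by counting code units, since `.length` on each string type reports code units rather than characters. The sketch below uses escape sequences so it does not depend on how the source file is saved; the helper `codeUnits` is hypothetical.

```d
// Hypothetical helper: .length of any string type counts code units.
size_t codeUnits(S)(S s)
{
    return s.length;
}

void main()
{
    // U+00E9: 2 UTF-8 code units, 1 UTF-16 code unit, 1 UTF-32 code unit.
    string  s = "\u00E9";
    wstring w = "\u00E9"w;
    dstring d = "\u00E9"d;
    assert(codeUnits(s) == 2 && codeUnits(w) == 1 && codeUnits(d) == 1);

    // U+1F600 lies outside the Basic Multilingual Plane:
    // 4 UTF-8 code units, 2 UTF-16 code units (a surrogate pair), 1 UTF-32 code unit.
    assert(codeUnits("\U0001F600") == 4);
    assert(codeUnits("\U0001F600"w) == 2);
    assert(codeUnits("\U0001F600"d) == 1);
}
```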
1174+
11911175
$(P Note that built-in comparison operators operate on a
11921176
$(LINK2 http://www.unicode.org/glossary/#code_unit, code unit) basis.
11931177
The end result for valid strings is the same as that of
@@ -1209,12 +1193,48 @@ $(H4 $(LEGACY_LNAME2 strings_unicode, strings-unicode, Strings and Unicode))
12091193
that should be implemented in the standard library.
12101194
)
12111195

1212-
$(H4 $(LNAME2 printf, C's printf() and Strings))
1196+
$(H4 $(LNAME2 char-pointers, Character Pointers and C strings))
1197+
1198+
$(P A pointer to a character can be generated:
1199+
)
1200+
1201+
$(SPEC_RUNNABLE_EXAMPLE_RUN
1202+
---------
1203+
string str = "abcd";
1204+
immutable(char)* p = &str[3]; // pointer to 4th element
1205+
assert(*p == 'd');
1206+
p = str.ptr; // pointer to 1st element
1207+
assert(*p == 'a');
1208+
---------
1209+
)
1210+
1211+
$(P Only string *literals* are zero-terminated in D.
1212+
In general, when transferring a pointer
1213+
to string data to C, append a terminating `'\0'`:
1214+
)
1215+
1216+
$(SPEC_RUNNABLE_EXAMPLE_RUN
1217+
---------
1218+
string str = "ab";
1219+
assert(str.ptr[2] == '\0'); // OK
1220+
str ~= "cd";
1221+
// str is no longer zero-terminated
1222+
str ~= "\0";
1223+
assert(str[4] == '\0'); // OK
1224+
str.length = 2;
1225+
// str is no longer correctly zero-terminated
1226+
assert(str.ptr[2] != '\0');
1227+
---------
1228+
)
1229+
1230+
The function $(REF toStringz, std,string) can also be used.
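
Note on this hunk: a hedged sketch of the `toStringz` route, using a slice (which is generally not followed by a zero byte) and C's `strlen` to observe the terminator. The helper `cLength` is hypothetical; `fromStringz` is the matching Phobos function for the reverse direction.

```d
import core.stdc.string : strlen;
import std.string : fromStringz, toStringz;

// Hypothetical helper: length of a D string as seen by C code.
size_t cLength(string s)
{
    return strlen(s.toStringz);
}

void main()
{
    string whole = "hello, world";
    string part = whole[0 .. 5]; // a slice; the next byte in memory is ',' not '\0'

    // toStringz returns a pointer to zero-terminated data, copying when needed.
    immutable(char)* cstr = part.toStringz;
    assert(strlen(cstr) == 5);

    // fromStringz slices the C string up to, but not including, the '\0'.
    assert(cstr.fromStringz == "hello");
    assert(cLength(whole) == whole.length);
}
```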
1231+
1232+
$(H4 $(LNAME2 printf, Example: `printf`))
12131233

1214-
$(P $(D printf()) is a C function and is not part of D. $(D printf())
1234+
$(P $(REF printf, core,stdc,stdio) is a C function and is not part of D. $(D printf())
12151235
will print C strings, which are 0 terminated. There are two ways
12161236
to use $(D printf()) with D strings. The first is to add a
1217-
terminating 0, and cast the result to a char*:
1237+
terminating 0:
12181238
)
12191239

12201240
---------
@@ -1236,12 +1256,12 @@ printf("the string is '%s'\n", std.string.toStringz(str));
12361256
printf("the string is '%s'\n", "string literal".ptr);
12371257
-----------
12381258

1239-
$(P So, why does the first string literal to printf not need
1259+
$(P So, why does the first string literal to `printf` not need
12401260
the `.ptr`? The first parameter is prototyped as a `const(char)*`, and
1241-
a string literal can be implicitly `cast` to a `const(char)*`.
1242-
The rest of the arguments to printf, however, are variadic
1243-
(specified by ...),
1244-
and a string literal typed `immutable(char)[]` cannot pass
1261+
a string literal can be implicitly converted to a `const(char)*`.
1262+
The rest of the arguments to `printf`, however, are variadic
1263+
(specified by `...`),
1264+
and a string literal typed `immutable(char)[]` cannot be passed
12451265
to variadic parameters.)
12461266

12471267
$(P The second way is to use the precision specifier.
@@ -1251,7 +1271,7 @@ printf("the string is '%s'\n", "string literal".ptr);
12511271
printf("the string is '%.*s'\n", cast(int)str.length, str.ptr);
12521272
---------
12531273

1254-
$(P The best way is to use std.stdio.writefln, which can handle
1274+
$(P The best way is to use $(REF writefln, std,stdio), which can handle
12551275
D strings:)
12561276

12571277
---------
