Commit 493c797

srutzky authored and stevengj committed
Correct code point format in Base/Char/show function (#33291)
* Correct code point format in Base/Char/show function

Two minor changes (both on line 307) to conform to the Unicode Standard.

Unicode code points currently display with:

1. Lowercase letters, a - f, when present
2. A leading 0 for 5-digit code point values (i.e. 10000 - 9ffff)

However, the Unicode Standard specifies that when using the "U+" notation, you should use:

1. Uppercase letters
2. Leading zeros only when the code point would have fewer than four digits (i.e. 0000 - 0FFF)

For reference, the Unicode Standard (two versions to show consistency over time)

* [(Version 12.1, 2019) Appendix A: Notational Conventions ⇒ Code Points](http://www.unicode.org/versions/Unicode12.0.0/appA.pdf)
* [(Version 4.0.0, 2003) Preface: Notational Conventions ⇒ Code Points](http://www.unicode.org/versions/Unicode4.0.0/Preface.pdf)

states:

> In running text, an individual Unicode code point is expressed as U+n, where n is four to six hexadecimal digits, using the digits 0–9 and uppercase letters A–F (for 10 through 15, respectively). Leading zeros are omitted, unless the code point would have fewer than four hexadecimal digits—for example, U+0001, U+0012, U+0123, U+1234, U+12345, U+102345.

* Add tests for U+ syntax formatting

* Update code point format to match change in show() function

* Update code point format to match change in show() function

* Update code point format to match change in show() function

* Update code point format to match change in show() function
1 parent da86a22 commit 493c797
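
The notational rule quoted in the commit message can be sketched in a few lines of Julia. This is only an illustration: `format_codepoint` is a hypothetical helper name, not a function in Base, and the inputs are the example values listed in the quoted passage of the Unicode Standard.

```julia
# Hypothetical helper (not part of Base): render an integer code point in "U+" notation.
# Pads to at least four hex digits and uppercases the letters A-F.
format_codepoint(u::Integer) = "U+" * uppercase(string(u, base = 16, pad = 4))

# The example values from the quoted passage of the Unicode Standard.
for u in (0x1, 0x12, 0x123, 0x1234, 0x12345, 0x102345)
    println(format_codepoint(u))
end
```

With `pad = 4`, values that already have five or six hex digits are left unpadded, which is why a single fixed pad width is enough to satisfy the rule.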

6 files changed, with 24 additions and 15 deletions

base/char.jl

Lines changed: 1 addition & 1 deletion
@@ -304,7 +304,7 @@ function show(io::IO, ::MIME"text/plain", c::T) where {T<:AbstractChar}
         else
             u = codepoint(c)
         end
-        h = string(u, base = 16, pad = u ≤ 0xffff ? 4 : 6)
+        h = uppercase(string(u, base = 16, pad = 4))
         print(io, (isascii(c) ? "ASCII/" : ""), "Unicode U+", h)
     else
         print(io, ": Malformed UTF-8")
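
To see what this one-line change does in practice, the sketch below compares the two expressions on a few characters. The names `old_format` and `new_format` are hypothetical wrappers introduced here for illustration. The comparison operator in the removed line is not legible on this page, so `<=` is an assumption, inferred from the commit message's note that 5-digit values used to get a leading zero.

```julia
# Hypothetical wrappers around the removed and added expressions from the hunk above.
# Assumption: the removed line compared with `u <= 0xffff` (operator missing in the page text).
old_format(u) = string(u, base = 16, pad = u <= 0xffff ? 4 : 6)
new_format(u) = uppercase(string(u, base = 16, pad = 4))

for c in ('J', 'α', '🐨', '\U10ffff')
    u = codepoint(c)
    println("old: U+", old_format(u), "   new: U+", new_format(u))
end
```

Under that assumption the koala emoji (code point 0x1f428) is the interesting case: the old expression pads it to six digits, while the new one leaves the five digits as they are.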

base/io.jl

Lines changed: 1 addition & 1 deletion
@@ -135,7 +135,7 @@ Read the entirety of `io`, as a `String`.
 julia> io = IOBuffer("JuliaLang is a GitHub organization");

 julia> read(io, Char)
-'J': ASCII/Unicode U+004a (category Lu: Letter, uppercase)
+'J': ASCII/Unicode U+004A (category Lu: Letter, uppercase)

 julia> io = IOBuffer("JuliaLang is a GitHub organization");


base/iostream.jl

Lines changed: 3 additions & 3 deletions
@@ -100,7 +100,7 @@ julia> io = IOBuffer("JuliaLang is a GitHub organization.");
 julia> seek(io, 5);

 julia> read(io, Char)
-'L': ASCII/Unicode U+004c (category Lu: Letter, uppercase)
+'L': ASCII/Unicode U+004C (category Lu: Letter, uppercase)
 ```
 """
 function seek(s::IOStream, n::Integer)
@@ -122,12 +122,12 @@ julia> io = IOBuffer("JuliaLang is a GitHub organization.");
 julia> seek(io, 5);

 julia> read(io, Char)
-'L': ASCII/Unicode U+004c (category Lu: Letter, uppercase)
+'L': ASCII/Unicode U+004C (category Lu: Letter, uppercase)

 julia> seekstart(io);

 julia> read(io, Char)
-'J': ASCII/Unicode U+004a (category Lu: Letter, uppercase)
+'J': ASCII/Unicode U+004A (category Lu: Letter, uppercase)
 ```
 """
 seekstart(s::IO) = seek(s,0)

base/strings/basic.jl

Lines changed: 1 addition & 1 deletion
@@ -107,7 +107,7 @@ julia> isvalid(str, 1)
 true

 julia> str[1]
-'α': Unicode U+03b1 (category Ll: Letter, lowercase)
+'α': Unicode U+03B1 (category Ll: Letter, lowercase)

 julia> isvalid(str, 2)
 false

doc/src/manual/strings.md

Lines changed: 9 additions & 9 deletions
@@ -88,8 +88,8 @@ julia> isvalid(Char, 0x110000)
 false
 ```

-As of this writing, the valid Unicode code points are `U+00` through `U+d7ff` and `U+e000` through
-`U+10ffff`. These have not all been assigned intelligible meanings yet, nor are they necessarily
+As of this writing, the valid Unicode code points are `U+0000` through `U+D7FF` and `U+E000` through
+`U+10FFFF`. These have not all been assigned intelligible meanings yet, nor are they necessarily
 interpretable by applications, but all of these values are considered to be valid Unicode characters.

 You can input any Unicode character in single quotes using `\u` followed by up to four hexadecimal
@@ -107,7 +107,7 @@ julia> '\u2200'
 '∀': Unicode U+2200 (category Sm: Symbol, math)

 julia> '\U10ffff'
-'\U10ffff': Unicode U+10ffff (category Cn: Other, not assigned)
+'\U10ffff': Unicode U+10FFFF (category Cn: Other, not assigned)
 ```

 Julia uses your system's locale and language settings to determine which characters can be printed
@@ -173,10 +173,10 @@ julia> str[1]
 'H': ASCII/Unicode U+0048 (category Lu: Letter, uppercase)

 julia> str[6]
-',': ASCII/Unicode U+002c (category Po: Punctuation, other)
+',': ASCII/Unicode U+002C (category Po: Punctuation, other)

 julia> str[end]
-'\n': ASCII/Unicode U+000a (category Cc: Other, control)
+'\n': ASCII/Unicode U+000A (category Cc: Other, control)
 ```

 Many Julia objects, including strings, can be indexed with integers. The index of the first
@@ -192,7 +192,7 @@ a normal value:

 ```jldoctest helloworldstring
 julia> str[end-1]
-'.': ASCII/Unicode U+002e (category Po: Punctuation, other)
+'.': ASCII/Unicode U+002E (category Po: Punctuation, other)

 julia> str[end÷2]
 ' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
@@ -223,7 +223,7 @@ Notice that the expressions `str[k]` and `str[k:k]` do not give the same result:

 ```jldoctest helloworldstring
 julia> str[6]
-',': ASCII/Unicode U+002c (category Po: Punctuation, other)
+',': ASCII/Unicode U+002C (category Po: Punctuation, other)

 julia> str[6:6]
 ","
@@ -416,7 +416,7 @@ julia> foreach(display, s)
 '\xc0\xa0': [overlong] ASCII/Unicode U+0020 (category Zs: Separator, space)
 '\xe2\x88': Malformed UTF-8 (category Ma: Malformed, bad data)
 '\xe2': Malformed UTF-8 (category Ma: Malformed, bad data)
-'|': ASCII/Unicode U+007c (category Sm: Symbol, math)
+'|': ASCII/Unicode U+007C (category Sm: Symbol, math)

 julia> isvalid.(collect(s))
 4-element BitArray{1}:
@@ -429,7 +429,7 @@ julia> s2 = "\xf7\xbf\xbf\xbf"
 "\U1fffff"

 julia> foreach(display, s2)
-'\U1fffff': Unicode U+1fffff (category In: Invalid, too high)
+'\U1fffff': Unicode U+1FFFFF (category In: Invalid, too high)
 ```

 We can see that the first two code units in the string `s` form an overlong encoding of

test/char.jl

Lines changed: 9 additions & 0 deletions
@@ -290,3 +290,12 @@ end
 @testset "broadcasting of Char" begin
     @test identity.('a') == 'a'
 end
+
+@testset "code point format of U+ syntax (PR 33291)" begin
+    @test repr("text/plain", '\n') == "'\\n': ASCII/Unicode U+000A (category Cc: Other, control)"
+    @test repr("text/plain", '/') == "'/': ASCII/Unicode U+002F (category Po: Punctuation, other)"
+    @test repr("text/plain", '\u10e') == "'Ď': Unicode U+010E (category Lu: Letter, uppercase)"
+    @test repr("text/plain", '\u3a2c') == "'㨬': Unicode U+3A2C (category Lo: Letter, other)"
+    @test repr("text/plain", '\U001f428') == "'🐨': Unicode U+1F428 (category So: Symbol, other)"
+    @test repr("text/plain", '\U010f321') == "'\\U10f321': Unicode U+10F321 (category Co: Other, private use)"
+end
