Commit 8d548d4

Add exotic codepoint detection and mixed script lints
1 parent 935c917 commit 8d548d4

1 file changed, 19 additions and 0 deletions

text/0000-non-ascii-idents.md

@@ -109,6 +109,22 @@ The confusable detection algorithm is based on [Unicode® Technical Standard #39
 
 Note: A fast way to implement this is to compute `skeleton` for each identifier once and insert the result into a hashmap as a key. If the key already exists, check whether the two identifiers differ from each other. If so, report the two confusable identifiers.
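
The hashmap approach in the note above can be sketched roughly as follows. This is only an illustration: the `skeleton` function here is a toy stand-in that folds a handful of Cyrillic letters onto Latin look-alikes, whereas the real function from UTS #39 uses normalization and the full confusables data. The name `find_confusables` is hypothetical and not part of the RFC.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Toy stand-in for the UTS #39 `skeleton` function (assumption: the real
/// one uses the full confusables table; this maps only five letters).
fn skeleton(ident: &str) -> String {
    ident
        .chars()
        .map(|c| match c {
            '\u{0430}' => 'a', // CYRILLIC SMALL LETTER A
            '\u{0435}' => 'e', // CYRILLIC SMALL LETTER IE
            '\u{043E}' => 'o', // CYRILLIC SMALL LETTER O
            '\u{0440}' => 'p', // CYRILLIC SMALL LETTER ER
            '\u{0441}' => 'c', // CYRILLIC SMALL LETTER ES
            other => other,
        })
        .collect()
}

/// Returns every (first seen, later seen) pair of distinct identifiers
/// that share a skeleton. Hypothetical helper name.
fn find_confusables<'a>(idents: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    let mut seen: HashMap<String, &'a str> = HashMap::new();
    let mut reports = Vec::new();
    for &ident in idents {
        // The skeleton is computed once per identifier and used as the key.
        match seen.entry(skeleton(ident)) {
            Entry::Occupied(slot) => {
                let first = *slot.get();
                // The same identifier appearing twice is not a confusable.
                if first != ident {
                    reports.push((first, ident));
                }
            }
            Entry::Vacant(slot) => {
                slot.insert(ident);
            }
        }
    }
    reports
}
```

With this toy table, passing `"paypal"` and `"p\u{0430}yp\u{0430}l"` reports the two as one confusable pair, while passing `"paypal"` twice reports nothing.
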
 
+## Exotic codepoint detection
+
+A new `less_used_codepoints` lint is added to the compiler. The default setting is to `warn`.
+
+The lint is triggered by identifiers that contain a codepoint that is not part of the set of "Allowed" codepoints as described by [Unicode® Technical Standard #39 Unicode Security Mechanisms Section 3.1 General Security Profile for Identifiers][TR39Allowed].
+
+Note: New Unicode versions update the set of allowed codepoints. Additionally, the compiler authors may decide to allow more codepoints or to warn about those that have been found to cause confusion.
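
As a rough sketch of what the `less_used_codepoints` check amounts to, the code below scans an identifier for the first codepoint outside an allow-list. The allow-list is a hand-picked toy (ASCII identifier characters plus a few Latin-1 letter ranges) chosen for illustration; a real implementation would consult the "Allowed" data published with UTS #39. The function names are hypothetical.

```rust
/// Toy stand-in for the UTS #39 "Allowed" set (assumption: a hand-picked
/// subset for illustration, not the Unicode data file).
fn is_allowed(c: char) -> bool {
    matches!(
        c,
        'a'..='z'
            | 'A'..='Z'
            | '0'..='9'
            | '_'
            | '\u{00C0}'..='\u{00D6}' // Latin-1 letters, first range
            | '\u{00D8}'..='\u{00F6}' // Latin-1 letters, second range
            | '\u{00F8}'..='\u{00FF}' // Latin-1 letters, third range
    )
}

/// Returns the first codepoint in `ident` that the lint would complain
/// about, or `None` if every codepoint is on the allow-list.
fn first_less_used_codepoint(ident: &str) -> Option<char> {
    ident.chars().find(|&c| !is_allowed(c))
}
```

Under this toy table, `caf\u{00E9}` passes, while an identifier containing U+2160 (ROMAN NUMERAL ONE) is flagged at that codepoint.
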
+
+## Mixed script detection
+
+A new `mixed_script_idents` lint is added to the compiler. The default setting is to `warn`.
+
+The lint is triggered by identifiers that do not qualify for the "Moderately Restrictive" identifier profile specified in [Unicode® Technical Standard #39 Unicode Security Mechanisms Section 5.2 Restriction-Level Detection][TR39RestrictionLevel].
+
+Note: The definition of "Moderately Restrictive" may change in future versions of the Unicode standard to reflect changes in the natural languages used or for other reasons.
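
The idea behind `mixed_script_idents` can be illustrated with a much simplified check. The sketch below knows only three scripts (Latin, Greek, Cyrillic), classifies characters by hard-coded ranges, and flags any identifier that uses more than one of them. It is not an implementation of the "Moderately Restrictive" profile, which also permits certain combinations of Latin with other scripts; the type and function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// The only scripts this toy classifier knows about (assumption: real
/// code would use the Unicode Script and Script_Extensions properties).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Script {
    Latin,
    Greek,
    Cyrillic,
}

/// Classifies a character by hard-coded ranges. Digits, `_`, and anything
/// outside the three ranges are treated as script-neutral here.
fn script_of(c: char) -> Option<Script> {
    match c {
        'a'..='z' | 'A'..='Z' => Some(Script::Latin),
        '\u{0391}'..='\u{03A9}' | '\u{03B1}'..='\u{03C9}' => Some(Script::Greek),
        '\u{0410}'..='\u{044F}' => Some(Script::Cyrillic),
        _ => None,
    }
}

/// Collects the distinct known scripts used in `ident`.
fn scripts_in(ident: &str) -> BTreeSet<Script> {
    ident.chars().filter_map(script_of).collect()
}

/// True if `ident` mixes at least two of the three known scripts.
fn is_mixed_script(ident: &str) -> bool {
    scripts_in(ident).len() > 1
}
```

For example, `p\u{0430}y` (Latin `p` and `y` around a Cyrillic `а`) is flagged, while an all-Greek or all-Latin identifier is not.
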
+
 ## Adjustments to the "bad style" lints
 
 Rust [RFC 0430] establishes naming conventions for Rust ASCII identifiers. The *rustc* compiler includes lints to promote these recommendations.
@@ -198,6 +214,7 @@ The [Go language][Go] allows identifiers in the form **Letter (Letter | Number)\
 * Should [ZWNJ and ZWJ be allowed in identifiers][TR31Layout]?
 * How are non-ASCII idents best supported in debuggers?
 * Which name mangling scheme is used by the compiler?
+* Is there a better name for the `less_used_codepoints` lint?
 
 [PEP 3131]: https://www.python.org/dev/peps/pep-3131/
 [UAX31]: http://www.unicode.org/reports/tr31/
@@ -213,3 +230,5 @@ The [Go language][Go] allows identifiers in the form **Letter (Letter | Number)\
 [Go]: https://golang.org/ref/spec#Identifiers
 [Composed characters]: https://en.wikipedia.org/wiki/Precomposed_character
 [RFC 0430]: http://rust-lang.github.io/rfcs/0430-finalizing-naming-conventions.html
+[TR39Allowed]: https://www.unicode.org/reports/tr39/#General_Security_Profile
+[TR39RestrictionLevel]: https://www.unicode.org/reports/tr39/#Restriction_Level_Detection
