Hint on unknown escape of Unicode quotation marks in string literal #128906

Closed
21 changes: 21 additions & 0 deletions compiler/rustc_parse/src/lexer/unescape_error_reporting.rs
@@ -158,6 +158,13 @@ pub(crate) fn emit_unescape_error(
"this is an isolated carriage return; consider checking your editor and \
version control settings",
);
} else if looks_like_quote(c) {
diag.help(format!(
"{ec} is not an ascii quote, \
but may look like one in some fonts.\n\
consider writing it in its \
escaped form for clarity."
Comment on lines +163 to +166
Member

the spacing is a bit awkward here. Can we use a semicolon to turn this into one line?

Suggested change
"{ec} is not an ascii quote, \
but may look like one in some fonts.\n\
consider writing it in its \
escaped form for clarity."
"{ec} is not an ascii quote but may look like one in some fonts; consider escaping it to avoid confusion"

Contributor Author

i'm pretty sure that would put it over the 100 char line length limit.

i'm also unsure if you have a problem with the formatting of the output or the code (the code is 4 lines, but the actual output is only 2)

Member

@compiler-errors compiler-errors Aug 10, 2024

The output -- we don't typically have multi-line diagnostics (unless formatting a list or something), and we try to avoid periods in diagnostic outputs as a matter of style. I personally find multi-sentence notes to be a bit wordy.

Yeah, you'll need to re-\ the string literal.
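
To illustrate the point about re-wrapping: a `\` at the end of a line inside a Rust string literal drops the newline and the leading whitespace of the next line, so the source can stay under the 100-character limit while the emitted help remains a single line. The sketch below is standalone and does not use rustc's diagnostic API; `quote_help` is a hypothetical name, and the wording is the one from the suggestion above.

```rust
/// Hypothetical helper (not part of the PR): builds the one-line help text.
/// The trailing `\` continues the literal without inserting a newline.
fn quote_help(ec: &str) -> String {
    format!(
        "{ec} is not an ascii quote but may look like one in some fonts; \
         consider escaping it to avoid confusion"
    )
}

fn main() {
    let msg = quote_help("\\u{201c}");
    // The source spans two lines, but the rendered message is one line.
    assert!(!msg.contains('\n'));
    println!("{msg}");
}
```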

));
} else {
if mode == Mode::Str || mode == Mode::Char {
diag.span_suggestion(
@@ -295,3 +302,17 @@ pub(crate) fn escaped_char(c: char) -> String {
_ => c.escape_default().to_string(),
}
}

/// Returns true if `c` may look identical to `"` in some fonts.
fn looks_like_quote(c: char) -> bool {
    // list of homoglyphs generated using the following wikidata query:
    // SELECT ?u WHERE {
    //   wd:Q87495536 wdt:P2444+ ?c.
    //   ?c wdt:P4213 ?u.
    // }
    match c {
        '\u{2033}' | '\u{02BA}' | '\u{02DD}' | '\u{030B}' | '\u{030E}' | '\u{05F4}'
        | '\u{201C}' | '\u{201D}' => true,
        _ => false,
    }
}
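
As a quick way to see what the helper accepts, here is a standalone sketch that copies `looks_like_quote` from the diff and runs it over a sample string. The `main` function and the sample text are illustrative assumptions, not part of the PR.

```rust
/// Copied from the diff so the example runs on its own.
fn looks_like_quote(c: char) -> bool {
    match c {
        '\u{2033}' | '\u{02BA}' | '\u{02DD}' | '\u{030B}' | '\u{030E}' | '\u{05F4}'
        | '\u{201C}' | '\u{201D}' => true,
        _ => false,
    }
}

fn main() {
    // Sample input containing U+201C and U+201D (illustrative only).
    let src = "since when is \u{201C}THIS\u{201D} not allowed";
    for (i, c) in src.char_indices().filter(|&(_, c)| looks_like_quote(c)) {
        println!("byte {i}: {} looks like a quote", c.escape_unicode());
    }
    // The ASCII quote itself is not in the homoglyph list.
    assert!(!looks_like_quote('"'));
}
```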
3 changes: 3 additions & 0 deletions tests/ui/unicode-quote.rs
@@ -0,0 +1,3 @@
fn main() {
    dbg!("since when is \“THIS\” not allowed in a string literal");
}
20 changes: 20 additions & 0 deletions tests/ui/unicode-quote.stderr
@@ -0,0 +1,20 @@
error: unknown character escape: `\u{201c}`
  --> $DIR/unicode-quote.rs:2:26
   |
LL |     dbg!("since when is \“THIS\” not allowed in a string literal");
   |                          ^ unknown character escape
   |
   = help: \u{201c} is not an ascii quote, but may look like one in some fonts.
           consider writing it in its escaped form for clarity.
Comment on lines +7 to +8
Member

@jieyouxu jieyouxu Aug 10, 2024

Suggestion: the help message itself could look like

help: U+201C Left Double Quotation Mark (“) looks like U+0022 Quotation Mark (") (ASCII quotation mark) but are different characters

and a MaybeIncorrect suggestion as mentioned could look something like

help: consider writing the Unicode escape
  |
2 |     dbg!("since when is \u{201c}THIS\” not allowed in a string literal");
  |                         ++++++++

(exact wording may vary)
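
One way the replacement text for such a suggestion could be computed is sketched below, outside of rustc. `suggest_unicode_escape` is a hypothetical helper that works on a plain source line and a byte offset; a real implementation would go through rustc's span and suggestion machinery, which is not shown here.

```rust
/// Hypothetical helper (not rustc API): rewrites the `\<c>` found at byte
/// offset `backslash` in `line` into the `\u{...}` form of `c`.
fn suggest_unicode_escape(line: &str, backslash: usize) -> Option<String> {
    // Text after the backslash; `None` if the offset is invalid or
    // does not point at a backslash.
    let rest = line.get(backslash..)?.strip_prefix('\\')?;
    let c = rest.chars().next()?;
    let escaped = format!("\\u{{{:x}}}", c as u32);
    Some(format!("{}{}{}", &line[..backslash], escaped, &rest[c.len_utf8()..]))
}

fn main() {
    let line = r#"dbg!("since when is \“THIS\” not allowed in a string literal");"#;
    // Only the first offending escape is rewritten, as in the suggestion above.
    let at = line.find('\\').expect("sample line contains a backslash");
    if let Some(fixed) = suggest_unicode_escape(line, at) {
        println!("{fixed}");
    }
}
```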

Contributor Author

does rustc have a database of unicode character names?


error: unknown character escape: `\u{201d}`
  --> $DIR/unicode-quote.rs:2:32
   |
LL |     dbg!("since when is \“THIS\” not allowed in a string literal");
   |                                ^ unknown character escape
   |
   = help: \u{201d} is not an ascii quote, but may look like one in some fonts.
           consider writing it in its escaped form for clarity.

error: aborting due to 2 previous errors
