-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Improved error recovery for unclosed strings (including f- and t-strings) #20848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
# This is also true for | ||
# unterminated f-strings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this looks silly, but keeping the comment over two lines reduces the snapshot changes.
bitflags! { | ||
#[derive(Clone, Copy, Debug, PartialEq, Eq)] | ||
pub(crate) struct TokenFlags: u8 { | ||
pub(crate) struct TokenFlags: u16 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this increases the size of TokenFlags
, it doesn't increase the size of Token
. Which is why I didn't bother with any fancy encoding (e.g. it's unclosed if RAW_STRING_UPPERCASE
and RAW_STRING_LOWERCASE
are set)
|
} | ||
|
||
#[test] | ||
fn lex_fstring_unclsoed() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn lex_fstring_unclsoed() { | |
fn lex_fstring_unclosed() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah come on, my spelling is obviously better
698f74c
to
a012e2b
Compare
self.current_flags |= TokenFlags::UNCLOSED_STRING; | ||
|
||
self.push_error(LexicalError::new( | ||
LexicalErrorType::StringError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error was just wrong. a = "string<EOF
reported an unexpected string error rather than unclosed string literal
Summary
This PR improves our lexer to preserve
STRING
tokens instead of converting them toUnknown
if a string literal misses its closing quotes.Instead of converting to
UNKNOWN
, it sets a flag on the string literal that allows upstream tools to check if it's an unclosed string literal.The benefit of preserving string literals is that it gives us much better error recovery because the parser now recognizes those literals.
That means, ty will correctly infer the literal type for
a = "unclosed
to beLiteral["unclosed"]
.Unfortunately, preserving the kind for unclosed string literals regressed the f-string's and t-string's recovery mechanism. So, I went ahead and improved that too.
There are a few improvements:
f"unclosed
) instead of parsing this asf""
}
. E.g., the parser now matches the quotes forf"{ab"
instead of assuming that the closing quotes start a new stringr
format specifiers if the}
is missing:f"{ab:r"
now parses ther
as the raw conversion flag rather thanr"
the start of a raw string literalFixes #19751
Fixes #20849
Review
You probably want to skip the first commit :) It updates all snapshots to now include the
unclosed: <UNCLOSED>
flag.Test Plan
Reviewed and updated the snapshot tests. I also reviewed all usages of
TokenKind::String
to find cases where the missing closing quote could now cause issues.This change should have no impact on AST-based lint rules or the formatter because they both only run when there are no parse errors.