MichaReiser
Member

@MichaReiser MichaReiser commented Oct 13, 2025

Summary

This PR improves our lexer to preserve String tokens instead of converting them to Unknown when a string literal is missing its closing quote.
Instead of converting the token to Unknown, the lexer now sets a flag on the string token that lets downstream tools check whether the literal is unclosed.

The benefit of preserving string literals is that it gives us much better error recovery because the parser now recognizes those literals.
That means ty now correctly infers the type of a = "unclosed as Literal["unclosed"].
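
To illustrate the idea, here is a minimal sketch of a lexer that keeps the string kind and records an unclosed flag instead of falling back to an unknown token. Kind, StringToken, and lex_string are hypothetical names invented for this example and are not the types used in this PR; the sketch only handles single-line, double-quoted literals and ignores escapes.

```rust
/// Hypothetical token kinds; not the real `TokenKind` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    String,
    Unknown,
}

/// Hypothetical token: its kind, the literal's contents, and an `unclosed` flag.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StringToken {
    pub kind: Kind,
    pub value: String,
    pub unclosed: bool,
}

/// Lexes a single-line, double-quoted string literal at the start of `source`.
/// Escape sequences and prefixes are ignored for brevity.
pub fn lex_string(source: &str) -> StringToken {
    let Some(body) = source.strip_prefix('"') else {
        // Not a string literal at all: this is the only case that yields `Unknown`.
        return StringToken { kind: Kind::Unknown, value: String::new(), unclosed: false };
    };
    // The literal ends at the closing quote, or at the end of the line or input.
    let end = body.find(|c: char| c == '"' || c == '\n').unwrap_or(body.len());
    // If the literal did not stop at a quote, the closing quote is missing.
    let unclosed = !body[end..].starts_with('"');
    // Keep the `String` kind either way so a parser can still build a literal node.
    StringToken { kind: Kind::String, value: body[..end].to_string(), unclosed }
}
```

With this shape, a consumer that only cares about the literal's value can ignore the flag, while one that needs exact source text can check it first.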

Unfortunately, preserving the kind for unclosed string literals regressed the recovery mechanism for f-strings and t-strings, so I went ahead and improved that too.

There are a few improvements:

  • Preserve the f-string middle token even if the f-string is unclosed (e.g. f"unclosed) instead of parsing it as f""
  • Better recovery for a missing }. E.g., the parser now matches the quotes for f"{ab" instead of assuming that the closing quote starts a new string
  • Better recovery for r format specifiers if the } is missing: f"{ab:r" now parses the r as the raw conversion flag rather than treating r" as the start of a raw string literal
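
The first two recovery behaviours in the list above can be sketched roughly as follows. This is a toy scanner under strong assumptions (only f"..." with double quotes, no nesting, no format specifiers, no escapes), and Part and scan_fstring are hypothetical names, not the lexer or parser code from this PR.

```rust
/// Hypothetical pieces a scanner could report for an f-string.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Part {
    /// Literal text between replacement fields.
    Middle(String),
    /// The text of a replacement field, without the braces.
    Expr(String),
    /// A replacement field was opened but never closed with `}`.
    MissingBrace,
    /// The closing quote of the f-string was found.
    End,
    /// The input ended before the closing quote.
    Unclosed,
}

/// Scans `source`, which is expected to start with `f"`.
pub fn scan_fstring(source: &str) -> Vec<Part> {
    let mut parts = Vec::new();
    let Some(mut rest) = source.strip_prefix("f\"") else {
        return parts;
    };
    loop {
        // Literal text runs up to the next `{`, the closing quote, or the end of input.
        let end = rest.find(|c: char| c == '{' || c == '"').unwrap_or(rest.len());
        if end > 0 {
            // The middle is kept even if the f-string later turns out to be unclosed.
            parts.push(Part::Middle(rest[..end].to_string()));
        }
        rest = &rest[end..];

        if let Some(after) = rest.strip_prefix('{') {
            // The expression runs up to `}`. Hitting the f-string's own quote
            // first means the `}` is missing.
            let end = after.find(|c: char| c == '}' || c == '"').unwrap_or(after.len());
            parts.push(Part::Expr(after[..end].to_string()));
            rest = &after[end..];
            if let Some(after) = rest.strip_prefix('}') {
                rest = after;
            } else {
                // Leave the quote in place so the next iteration treats it as
                // the closing quote rather than the start of a new string.
                parts.push(Part::MissingBrace);
            }
        } else if rest.starts_with('"') {
            parts.push(Part::End);
            return parts;
        } else {
            parts.push(Part::Unclosed);
            return parts;
        }
    }
}
```

For f"{ab" the sketch reports the expression ab, a missing brace, and then the end of the f-string; for f"unclosed it keeps the middle text and reports the f-string as unclosed.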

Fixes #19751
Fixes #20849

Review

You probably want to skip the first commit :) It updates all snapshots to now include the unclosed: <UNCLOSED> flag.

Test Plan

Reviewed and updated the snapshot tests. I also reviewed all usages of TokenKind::String to find cases where the missing closing quote could now cause issues.

This change should have no impact on AST-based lint rules or the formatter because they both only run when there are no parse errors.

@MichaReiser MichaReiser added the parser Related to the parser label Oct 13, 2025
Comment on lines +7 to 8
# This is also true for
# unterminated f-strings.
Member Author

I know this looks silly, but keeping the comment over two lines reduces the snapshot changes.

bitflags! {
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
-    pub(crate) struct TokenFlags: u8 {
+    pub(crate) struct TokenFlags: u16 {
Member Author

@MichaReiser MichaReiser Oct 13, 2025

While this increases the size of TokenFlags, it doesn't increase the size of Token, which is why I didn't bother with a fancier encoding (e.g. treating the string as unclosed if both RAW_STRING_UPPERCASE and RAW_STRING_LOWERCASE are set).
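
As a rough illustration of why widening the flags can be free, the sketch below compares two hypothetical token layouts. The field names, the bit positions, and the use of #[repr(C)] (chosen only to make the layout deterministic) are assumptions for this example and do not describe the real Token struct.

```rust
use std::mem::size_of;

// Hypothetical bit positions; only the names echo the discussion above.
pub const RAW_STRING_LOWERCASE: u16 = 1 << 6;
pub const RAW_STRING_UPPERCASE: u16 = 1 << 7;
// Assuming the lower eight bits are already taken, a ninth flag needs a wider integer.
pub const UNCLOSED_STRING: u16 = 1 << 8;

/// Hypothetical token with 8-bit flags: 4 + 4 + 1 + 1 bytes, padded to 12.
#[allow(dead_code)]
#[repr(C)]
pub struct TokenWithU8Flags {
    start: u32,
    end: u32,
    flags: u8,
    kind: u8,
}

/// Hypothetical token with 16-bit flags: 4 + 4 + 2 + 1 bytes, padded to 12.
#[allow(dead_code)]
#[repr(C)]
pub struct TokenWithU16Flags {
    start: u32,
    end: u32,
    flags: u16,
    kind: u8,
}

/// Returns the sizes of both layouts in bytes.
pub fn token_sizes() -> (usize, usize) {
    (size_of::<TokenWithU8Flags>(), size_of::<TokenWithU16Flags>())
}

/// With a dedicated bit, the check is a single mask and needs no special encoding.
pub fn is_unclosed(flags: u16) -> bool {
    flags & UNCLOSED_STRING != 0
}
```

Because the struct is aligned to its u32 fields, the extra flag byte lands in what was previously padding, so both layouts occupy 12 bytes in this sketch.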

Contributor

github-actions bot commented Oct 13, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

}

#[test]
fn lex_fstring_unclsoed() {
Member

Suggested change
- fn lex_fstring_unclsoed() {
+ fn lex_fstring_unclosed() {

Member Author

@MichaReiser MichaReiser Oct 13, 2025

Ah come on, my spelling is obviously better

@MichaReiser MichaReiser force-pushed the micha/unclosed-string branch from 698f74c to a012e2b Compare October 14, 2025 09:06
self.current_flags |= TokenFlags::UNCLOSED_STRING;

self.push_error(LexicalError::new(
LexicalErrorType::StringError,
Member Author

This error was just wrong: a = "string<EOF reported an unexpected string error rather than an unclosed string literal error.

@MichaReiser MichaReiser requested a review from dylwil3 October 14, 2025 09:31
@MichaReiser MichaReiser marked this pull request as ready for review October 14, 2025 09:31
@MichaReiser MichaReiser changed the title from "Better error recovery for unclosed strings (including f- and t-strings)" to "Improved error recovery for unclosed strings (including f- and t-strings)" Oct 14, 2025

Development

Successfully merging this pull request may close these issues.

Preserve unterminated string tokens
Panic f-string: unexpected token TStringMiddle

2 participants