-
-
Notifications
You must be signed in to change notification settings - Fork 38
Tweak macro name representation #572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The current representation for macro names is a bit peculiar. When the parser encounters `@a`, it treats `@` as notation for the macrocall and then `reset_node!`'s (which itself may be considerd a bit of a code smell) the `a` to a special MacroName token kind that (when translated back to julia Expr) implicitly adds back the `@`. Things get even more preculiar with `@var"a"` where only the token inside the string macro gets reset. One particular consequence of this is JuliaLang/julia#58885, because our translation back to Expr does not check the RAW_STRING_FLAG (whereas the translation for K"Identifier" does). A second issue is that we currently parse `@A.b.c` and `A.b.@c` to the same SyntaxTree (of course the green tree is different). We aren't currently being super precise about the required invariants for syntax trees, but in general it would be desirable for non-trivia notation (like `@`) to be precisely recoverable from the tree, which is not the case here. This is especially annoying because there are syntax cases that are errors for one of these, but not the other (e.g. `@A.b.$` is an error, but `A.B.@$` is allowed). Now, I think the wisdom of some of those syntax choices can be debated, but that is the situation we face. So this PR tries to clean that all up a bit by: - Replacing the terminal K"MacroName" by a non-terminal K"macro_name". With this form, `@A.c` parses as `(macro_name (. A c))` while `A.@c` parses as `(. A (macro_name c))`. - (In particular the `@` notation is now always associated with the macro_name). - Emitting the dots in `@..` and `@...` as direct identifier tokens rather than having to reset them back. - Adjusting everything else accordingly. Partially written by Claude Code, though it had some trouble with the actual code changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the right approach.
There's some broken cases still in the logic around is_macrocall_on_entry
, for example, there's two macro_name
nodes in the following:
julia> parsestmt(SyntaxNode, "@a[b][c]", ignore_warnings=true)
SyntaxNode:
[ref]
[macro_name]
[macrocall]
[macro_name]
a :: Identifier
[vect]
b :: Identifier
c :: Identifier
and similarly for curlies:
julia> parsestmt(SyntaxNode, "@a{b}{c}", ignore_warnings=true)
SyntaxNode:
[curly]
[macro_name]
[macrocall]
[macro_name]
a :: Identifier
[braces]
b :: Identifier
c :: Identifier
Very optionally, there's also the opportunity to simplify macro_name_position
down to just tracking the orig_kind
of the last identifier, now that fix_macro_name_kind!()
is gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like an upgrade to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The current representation for macro names is a bit peculiar. When the parser encounters
@a
, it treats@
as notation for the macrocall and thenreset_node!
's (which itself may be considerd a bit of a code smell) thea
to a special MacroName token kind that (when translated back to julia Expr) implicitly adds back the@
. Things get even more preculiar with@var"a"
where only the token inside the string macro gets reset.One particular consequence of this is JuliaLang/julia#58885, because our translation back to Expr does not check the RAW_STRING_FLAG (whereas the translation for K"Identifier" does).
A second issue is that we currently parse
@A.b.c
andA.b.@c
to the same SyntaxTree (of course the green tree is different). We aren't currently being super precise about the required invariants for syntax trees, but in general it would be desirable for non-trivia notation (like@
) to be precisely recoverable from the tree, which is not the case here. This is especially annoying because there are syntax cases that are errors for one of these, but not the other (e.g.@A.b.$
is an error, butA.B.@$
is allowed). Now, I think the wisdom of some of those syntax choices can be debated, but that is the situation we face.So this PR tries to clean that all up a bit by:
@A.c
parses as(macro_name (. A c))
whileA.@c
parses as(. A (macro_name c))
.@
notation is now always associated with the macro_name).@..
and@...
as direct identifier tokens rather than having to reset them back.Partially written by Claude Code, though it had some trouble with the actual code changes.