Skip to content

Tweak macro name representation #572

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 10, 2025
Merged

Tweak macro name representation #572

merged 5 commits into from
Jul 10, 2025

Conversation

Keno
Copy link
Member

@Keno Keno commented Jul 9, 2025

The current representation for macro names is a bit peculiar. When the parser encounters @a, it treats @ as notation for the macrocall and then reset_node!'s (which itself may be considerd a bit of a code smell) the a to a special MacroName token kind that (when translated back to julia Expr) implicitly adds back the @. Things get even more preculiar with @var"a" where only the token inside the string macro gets reset.

One particular consequence of this is JuliaLang/julia#58885, because our translation back to Expr does not check the RAW_STRING_FLAG (whereas the translation for K"Identifier" does).

A second issue is that we currently parse @A.b.c and A.b.@c to the same SyntaxTree (of course the green tree is different). We aren't currently being super precise about the required invariants for syntax trees, but in general it would be desirable for non-trivia notation (like @) to be precisely recoverable from the tree, which is not the case here. This is especially annoying because there are syntax cases that are errors for one of these, but not the other (e.g. @A.b.$ is an error, but A.B.@$ is allowed). Now, I think the wisdom of some of those syntax choices can be debated, but that is the situation we face.

So this PR tries to clean that all up a bit by:

  • Replacing the terminal K"MacroName" by a non-terminal K"macro_name". With this form, @A.c parses as (macro_name (. A c)) while A.@c parses as (. A (macro_name c)).
  • (In particular the @ notation is now always associated with the macro_name).
  • Emitting the dots in @.. and @... as direct identifier tokens rather than having to reset them back.
  • Adjusting everything else accordingly.

Partially written by Claude Code, though it had some trouble with the actual code changes.

The current representation for macro names is a bit peculiar. When
the parser encounters `@a`, it treats `@` as notation for the
macrocall and then `reset_node!`'s (which itself may be considerd
a bit of a code smell) the `a` to a special MacroName
token kind that (when translated back to julia Expr) implicitly
adds back the `@`. Things get even more preculiar with `@var"a"`
where only the token inside the string macro gets reset.

One particular consequence of this is JuliaLang/julia#58885,
because our translation back to Expr does not check the RAW_STRING_FLAG
(whereas the translation for K"Identifier" does).

A second issue is that we currently parse `@A.b.c` and `A.b.@c` to the
same SyntaxTree (of course the green tree is different). We aren't
currently being super precise about the required invariants for syntax
trees, but in general it would be desirable for non-trivia notation
(like `@`) to be precisely recoverable from the tree, which is not
the case here. This is especially annoying because there are syntax
cases that are errors for one of these, but not the other (e.g.
`@A.b.$` is an error, but `A.B.@$` is allowed). Now, I think the wisdom
of some of those syntax choices can be debated, but that is the situation
we face.

So this PR tries to clean that all up a bit by:
- Replacing the terminal K"MacroName" by a non-terminal K"macro_name".
  With this form, `@A.c` parses as `(macro_name (. A c))` while
  `A.@c` parses as `(. A (macro_name c))`.
- (In particular the `@` notation is now always associated with the
   macro_name).
- Emitting the dots in `@..` and `@...` as direct identifier tokens
  rather than having to reset them back.
- Adjusting everything else accordingly.

Partially written by Claude Code, though it had some trouble with
the actual code changes.
Copy link
Member

@c42f c42f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the right approach.

There's some broken cases still in the logic around is_macrocall_on_entry, for example, there's two macro_name nodes in the following:

julia> parsestmt(SyntaxNode, "@a[b][c]", ignore_warnings=true)
SyntaxNode:
[ref]
  [macro_name]
    [macrocall]
      [macro_name]
        a                                :: Identifier
      [vect]
        b                                :: Identifier
  c                                      :: Identifier

and similarly for curlies:

julia> parsestmt(SyntaxNode, "@a{b}{c}", ignore_warnings=true)
SyntaxNode:
[curly]
  [macro_name]
    [macrocall]
      [macro_name]
        a                                :: Identifier
      [braces]
        b                                :: Identifier
  c                                      :: Identifier

Very optionally, there's also the opportunity to simplify macro_name_position down to just tracking the orig_kind of the last identifier, now that fix_macro_name_kind!() is gone.

Copy link
Member

@mlechu mlechu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like an upgrade to me

Copy link
Member

@c42f c42f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Keno Keno merged commit 63bee39 into main Jul 10, 2025
36 checks passed
@Keno Keno deleted the kf/macroname branch July 10, 2025 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants