Skip to content

Fix anonymous function edge cases #210

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 17, 2025

Conversation

MilesCranmer
Copy link
Contributor

This fixes a few classes of anonymous functions that could not be parsed before. @cstjean do you think you could review this? I really need this patch to address some major bugs for BorrowChecker.jl and DispatchDoctor.jl, which both depend on MacroTools.jl for function parsing.

This Fixes #209 Fixes #185

More notes

Here are some functions that could not be parsed with splitdef before this change, but now work:

  • (a::Int; b=2) -> 1
  • (args...) -> 1
  • (arg; kws...) -> 1

All of these functions hit the error

ERROR: AssertionError: Not a function definition: begin

but now they can parse fine:

julia> splitdef(quote; (a::Int; b=2) -> 1; end)
Dict{Symbol, Any} with 4 entries:
  :args        => Any[:(a::Int)]
  :body        => quote
  :kwargs      => Any[:(b = 2)]
  :whereparams => ()

julia> splitdef(quote; (args...) -> 1; end)
Dict{Symbol, Any} with 4 entries:
  :args        => Any[:(args...)]
  :body        => quote
  :kwargs      => Any[]
  :whereparams => ()

julia> splitdef(quote; (arg; kws...) -> 1; end)
Dict{Symbol, Any} with 4 entries:
  :args        => Any[:arg]
  :body        => quote
  :kwargs      => Any[:(kws...)]
  :whereparams => ()

I've verified 100% test coverage of the code diff. I've also verified all the existing tests pass.

PR Checklist

  • Tests are added
  • Documentation, if applicable

Copy link
Collaborator

@cstjean cstjean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great PR, thanks!

@cstjean
Copy link
Collaborator

cstjean commented Apr 14, 2025

I'm at work, so sorry that I cannot investigate it in depth, but

julia> ff = eval( longdef(:((a::Int; b=2) -> a + b)))
#26 (generic function with 1 method)

julia> ff(2, b=11)
13

julia> longdef(:((a::Int; b=2) -> a + b))
:(function a::Int, #= REPL[32]:1 =#, b = 2      # weird
      #= REPL[32]:1 =#
      a + b
  end)

While longdef1 returns a weird expression (at least, that prints weirdly), it is nevertheless handled correctly as a function by eval. I'd be interested to know if the weird expression is an acceptable Julia Expr (then we should file an isse that it's not printed correctly), or if it just passes accidentally. If it is correct, then the correct fix would be to change

  @assert(@capture(longdef1(fdef),
                   function (fcall_ | fcall_) body_ end),
          "Not a function definition: $fdef")

instead of longdef1. It's a wonky @capture statement anyway.

If you're in a rush, I'll be happy to oblige and merge as is, though. Would you prefer that? This PR is a clear improvement.

@MilesCranmer
Copy link
Contributor Author

I think given that these are currently not parsed correctly at all, it’s probably better to go with it to fix the bugs, and then polish things later. What do you think?

@cstjean
Copy link
Collaborator

cstjean commented Apr 14, 2025

I'm a bit torn; if the current longdef is correct, then the extra code in this PR is unnecessary and it can be fixed more simply in splitdef.

But if longdef is incorrect, then we should definitely merge this. Hopefully we get an answer on discourse.

I'll try to merge either this PR or some other fix by tomorrow.

@MilesCranmer
Copy link
Contributor Author

I found another issue in longdef1. Just fixed it. This issue is also present in master. I'm not sure why, but it seems to transform keywords into regular arguments (I think?). There should be a :kw expression, but it makes it a :(=) expression:

On master:

julia> dump(longdef(quote; (a::Int; b=2) -> a; end))
Expr
  head: Symbol function
  args: Array{Any}((2,))
    1: Expr
      head: Symbol block
      args: Array{Any}((3,))
        1: Expr
          head: Symbol ::
          args: Array{Any}((2,))
            1: Symbol a
            2: Symbol Int
        2: LineNumberNode
          line: Int64 1
          file: Symbol REPL[2]
        3: Expr
          head: Symbol =
          args: Array{Any}((2,))
            1: Symbol b
            2: Int64 2
    2: Expr
      head: Symbol block
      args: Array{Any}((2,))
        1: LineNumberNode
          line: Int64 1
          file: Symbol REPL[2]
        2: Symbol a

On this PR (now):

julia> dump(longdef(quote; (a::Int; b=2) -> a; end))
Expr
  head: Symbol function
  args: Array{Any}((2,))
    1: Expr
      head: Symbol tuple
      args: Array{Any}((2,))
        1: Expr
          head: Symbol parameters
          args: Array{Any}((1,))
            1: Expr
              head: Symbol kw
              args: Array{Any}((2,))
                1: Symbol b
                2: Int64 2
        2: Expr
          head: Symbol ::
          args: Array{Any}((2,))
            1: Symbol a
            2: Symbol Int
    2: Expr
      head: Symbol block
      args: Array{Any}((2,))
        1: LineNumberNode
          line: Int64 1
          file: Symbol REPL[1]
        2: Symbol a

We can compare this to the expected result:

julia> :(function (a::Int; b=2); a; end) |> dump
Expr
  head: Symbol function
  args: Array{Any}((2,))
    1: Expr
      head: Symbol tuple
      args: Array{Any}((2,))
        1: Expr
          head: Symbol parameters
          args: Array{Any}((1,))
            1: Expr
              head: Symbol kw
              args: Array{Any}((2,))
                1: Symbol b
                2: Int64 2
        2: Expr
          head: Symbol ::
          args: Array{Any}((2,))
            1: Symbol a
            2: Symbol Int
    2: Expr
      head: Symbol block
      args: Array{Any}((3,))
        1: LineNumberNode
          line: Int64 1
          file: Symbol REPL[3]
        2: LineNumberNode
          line: Int64 1
          file: Symbol REPL[3]
        3: Symbol a

so we see it's now closer to the true AST.

In all honesty I have no idea why the master branch output evaluates... If you copy-paste the longdef result on master into the REPL, Julia complains:

julia> longdef(quote; (a::Int; b=2) -> a; end)
:(function a::Int, #= REPL[1]:1 =#, b = 2
      #= REPL[1]:1 =#
      a
  end)

julia> function a::Int, #= REPL[1]:1 =#, b = 2
             #= REPL[1]:1 =#
             a
         end
ERROR: ParseError:
# Error @ REPL[2]:1:10
function a::Int, #= REPL[1]:1 =#, b = 2
#        ╙ ── Invalid signature in function definition
Stacktrace:
 [1] top-level scope
   @ none:1

So I really don't know why it evaluates at all.

Even if it evaluates somehow, I think it's not safe. It doesn't even print correctly. So the fact it works now might not imply it continues to work in the future, because the printing clearly isn't set up for it.

@cstjean
Copy link
Collaborator

cstjean commented Apr 17, 2025

I would tend to agree with Sukera's post, that it's a printing problem, but the expression is fundamentally correct Julia.

MacroTools is a fundamental package, and the expression returned from longdef is typically further processed, so I am worried that this PR might break some macro code downstream. I didn't find the time to push it yesterday, but I really want to see if there isn't a simpler splitdef fix that passes all the tests in this PR. Sorry for the delay!

@MilesCranmer
Copy link
Contributor Author

MilesCranmer commented Apr 17, 2025

That is a fair point. But I would push back and make the point that: if Julia itself doesn't correctly print the output of longdef for this subtle parsing edgecase (which results in bugs #209 and #185), then there is an extremely low probability that someone's code is correctly handling this.

Worst case, it does, it gets reported, and then just revert the change and make it a breaking release. But I think it's better to worry about known and reported bugs (#209 #185) than theoretical low-probability bugs due to someone possibly making use of an incorrect output that even Julia doesn't handle.

Note that this :block format generated by MacroTools seems to only be done for one keyword. Once it is two keywords, that issue goes away:

julia> longdef(:((a::Int; b=2) -> a+b+c)) |> dump
Expr
  head: Symbol function
  args: Array{Any}((2,))
    1: Expr
      head: Symbol block
      args: Array{Any}((3,))
        1: Expr
          head: Symbol ::
          args: Array{Any}((2,))
            1: Symbol a
            2: Symbol Int
        2: LineNumberNode
          line: Int64 1
          file: Symbol REPL[14]
        3: Expr
          head: Symbol =
          args: Array{Any}((2,))
            1: Symbol b
            2: Int64 2
    2: Expr
      head: Symbol block
      args: Array{Any}((2,))
        1: LineNumberNode
          line: Int64 1
          file: Symbol REPL[14]
        2: Expr
          head: Symbol call
          args: Array{Any}((4,))
            1: Symbol +
            2: Symbol a
            3: Symbol b
            4: Symbol c

julia> longdef(:((a::Int; b=2, c=3) -> a+b+c)) |> dump
Expr
  head: Symbol function
  args: Array{Any}((2,))
    1: Expr
      head: Symbol tuple
      args: Array{Any}((2,))
        1: Expr
          head: Symbol parameters
          args: Array{Any}((2,))
            1: Expr
              head: Symbol kw
              args: Array{Any}((2,))
                1: Symbol b
                2: Int64 2
            2: Expr
              head: Symbol kw
              args: Array{Any}((2,))
                1: Symbol c
                2: Int64 3
        2: Expr
          head: Symbol ::
          args: Array{Any}((2,))
            1: Symbol a
            2: Symbol Int
    2: Expr
      head: Symbol block
      args: Array{Any}((2,))
        1: LineNumberNode
          line: Int64 1
          file: Symbol REPL[13]
        2: Expr
          head: Symbol call
          args: Array{Any}((4,))
            1: Symbol +
            2: Symbol a
            3: Symbol b
            4: Symbol c

So only if someone is handling the single arg case for longdef would they have such an issue. But since there has been no such report of this weird edgecase, I'm assuming nobody does have it.

This PR normalizes the format to the :parameters block. Here's the new output for a single kwarg:

julia> longdef(:((a::Int; b=2) -> a+b+c)) |> dump
Expr
  head: Symbol function
  args: Array{Any}((2,))
    1: Expr
      head: Symbol tuple
      args: Array{Any}((2,))
        1: Expr
          head: Symbol parameters
          args: Array{Any}((1,))
            1: Expr
              head: Symbol kw
              args: Array{Any}((2,))
                1: Symbol b
                2: Int64 2
        2: Expr
          head: Symbol ::
          args: Array{Any}((2,))
            1: Symbol a
            2: Symbol Int
    2: Expr
      head: Symbol block
      args: Array{Any}((2,))
        1: LineNumberNode
          line: Int64 1
          file: Symbol REPL[1]
        2: Expr
          head: Symbol call
          args: Array{Any}((4,))
            1: Symbol +
            2: Symbol a
            3: Symbol b
            4: Symbol c

This is the form that actually matches what Julia outputs:

julia> longdef(:((a::Int; b=2) -> a+b+c))
:(function (a::Int,; b = 2)
      #= REPL[1]:1 =#
      a + b + c
  end)

julia> :(function (a::Int; b=2)
           a+b+c
       end)
:(function (a::Int,; b = 2)
      #= REPL[2]:1 =#
      #= REPL[2]:2 =#
      a + b + c
  end)

@MilesCranmer
Copy link
Contributor Author

MilesCranmer commented Apr 17, 2025

Thinking more about this, I'm assuming it's actually more often that people will assume the second :parameters form. There could be a bunch of latent bugs out there because someone only wrote their code to handle the :parameters form, and just hasn't yet encountered the rare :block case.

Especially because if you look at the AST for :(function (a::Int; b=2); a+b+c; end), it matches the :parameters form. It just seems super unlikely for someone to encounter this bug, correctly handle it AND the multiple kwarg case, but not even point it out in an issue. I mean, even MacroTools.jl splitdef doesn't handle it! That's why there are #209 and #185 in the first place. (Not to mention Julia not knowing how to print it.) I think if internal functions don't handle an edgecase, then users definitely aren't handling the edgecase.

@cstjean
Copy link
Collaborator

cstjean commented Apr 17, 2025

Note that this :block format generated by MacroTools seems to only be done for one keyword. Once it is two keywords, that issue goes away:

Thinking more about this, I'm assuming it's actually more often that people will assume the second :parameters form. There could be a bunch of latent bugs out there because someone only wrote their code to handle the :parameters form, and just hasn't yet encountered the rare :block case.

Those are good arguments, I'm sold. If longdef's use case is "I don't want to explicitly deal with the short-form AST", then it makes sense to standardize the longdef output as much as possible. Thank you for the PR!

@cstjean cstjean merged commit fa9fc14 into FluxML:master Apr 17, 2025
10 checks passed
@MilesCranmer MilesCranmer deleted the fix-anonymous-edgecases branch April 17, 2025 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect capture of anonymous function with typed arguments Splitdef errors on anonymous function with a single varargs argument
2 participants