Skip to content

Proposal: Introduce some hook to change encoding of parsed result #18

@zachallaun

Description

@zachallaun

Hi Mitch! First off, thanks for your work on Spitfire. It's shaping up really well and is already a valuable tool!

Second, to hedge a bit, I don't know whether implementing this proposal would be appropriate right now given Spitfire's current state of development, but I wanted to bring it up now regardless.

Proposal

Introduce some hook, be it a callback, behaviour, or something else, that controls the encoding of the parsed result. The default would be to emit Macro.t(), as Spitfire currently does, but the hook would allow parsing Elixir code into a bespoke AST of arbitrary format.

As a concrete example, you might imagine the following:

Spitfire.parse!("1 + 2.5", encoder: &custom_encoder/1) # not sure what actual encoder arity would be
#=>
%BinaryCall{
  op: :+, 
  meta: %{...}, 
  left: %Integer{value: 1, meta: %{...},
  right: %Float{value: 2.5, meta: %{...}
}

Context

Elixir's AST is intentionally minimal. One reason is to facilitate authoring macros. For example:

foo(x: 1, y: 2)

# parses to:
{:foo, [], [[{:x, 1}, {:y, 2}]]}

# as opposed to:
{:foo, [],
 [[{:{}, [], [:x, 1]},
   {:{}, [], [:y, 2]}]]}

This makes it trivial to use keyword lists as options in macros, but also means that code processing an AST has to handle two forms of tuple. This is only one example of where the default AST can be cumbersome, but there are many situations where complex pattern matching, guards, or even metadata inspection are required to precisely differentiate syntax.

Sourceror, for instance, has a long standing issue for an enriched AST.

Additional Considerations

  • The perhaps obvious alternative is to transform Elixir AST into whatever format you want after the fact. This has two downsides that I can think of:
    1. It is slower to parse, walk, and transform than it would be to parse and output the desired result in one shot.
    2. There is additional metadata/context during parsing that is not included in the Elixir AST but that could be valuable. As a concrete example: when parsing Foo.Bar.\nBaz (note the newline), it's not possible to determine which line Bar occurs on without inspecting the source, but with token data, it would be.
  • It could be valuable to allow this or another hook to maintain and return an accumulator as well. This might be used to collect lint violations (in additional to parse errors). Based on this comment, it seems like you're already planning to return an accumulated value in addition to the parse result.
  • I expect, if implemented, parsing in the default case would be measurably slower due to the overhead of the additional call whenever a node is being constructed. I'm not sure what an acceptable amount of performance loss is, but I acknowledge that there is a line somewhere. (My gut says something like 1.1-1.2x would be acceptable, while 2x would almost certainly not be.)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions