Proposal: Introduce some hook to change encoding of parsed result

Hi Mitch! First off, thanks for your work on Spitfire. It's shaping up really well and is already a valuable tool!

Second, to hedge a bit, I don't know whether implementing this proposal would be appropriate *right now* given Spitfire's current state of development, but I wanted to bring it up now regardless.

### Proposal

Introduce some hook, be it a callback, behaviour, or something else, that controls the encoding of the parsed result. The default would be to emit `Macro.t()`, as Spitfire currently does, but the hook would allow parsing Elixir code into a bespoke AST of arbitrary format.

As a concrete example, you might imagine the following:

```elixir
Spitfire.parse!("1 + 2.5", encoder: &custom_encoder/1) # not sure what actual encoder arity would be
#=>
%BinaryCall{
  op: :+,
  meta: %{...},
  left: %Integer{value: 1, meta: %{...}},
  right: %Float{value: 2.5, meta: %{...}}
}
```

### Context

Elixir's AST is intentionally minimal. One reason is to facilitate authoring macros. For example:

```elixir
foo(x: 1, y: 2)

# parses to:
{:foo, [], [[{:x, 1}, {:y, 2}]]}

# as opposed to:
{:foo, [],
 [[{:{}, [], [:x, 1]},
   {:{}, [], [:y, 2]}]]}
```

This makes it trivial to use keyword lists as options in macros, but it also means that code processing an AST has to handle two forms of tuple. This is only one example of where the default AST can be cumbersome; there are many situations where complex pattern matching, guards, or even metadata inspection is required to precisely differentiate syntax.
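
To make the "two forms of tuple" point concrete, here is a small sketch of what consuming code ends up writing. The `TupleForms` module and its `tuple_elements/1` function are hypothetical names used only for illustration; `Code.string_to_quoted!/1` is the standard library parser.

```elixir
defmodule TupleForms do
  # Hypothetical helper: returns the elements of a tuple node in the AST.
  # Tuples of any size other than two are encoded as a {:{}, meta, elements} call.
  def tuple_elements({:{}, _meta, elements}) when is_list(elements), do: {:ok, elements}

  # Two-element tuples are represented as themselves, with no metadata at all.
  def tuple_elements({left, right}), do: {:ok, [left, right]}

  def tuple_elements(_other), do: :error
end

two = Code.string_to_quoted!("{:x, 1}")
three = Code.string_to_quoted!("{:x, 1, 2}")

IO.inspect(TupleForms.tuple_elements(two))
IO.inspect(TupleForms.tuple_elements(three))
```

Both clauses are needed to answer one question ("what are the elements of this tuple?"), and the two-element form has nowhere to carry line or column information.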

Sourceror, for instance, has a [long-standing issue](https://github.com/doorgan/sourceror/issues/24) for an enriched AST.

### Additional Considerations

- The perhaps obvious alternative is to transform Elixir AST into whatever format you want after the fact. This has two downsides that I can think of:
  1. It is slower to parse, walk, and transform than it would be to parse and output the desired result in one shot.
  2. There is additional metadata/context during parsing that is not included in the Elixir AST but that could be valuable. As a concrete example: when parsing `Foo.Bar.\nBaz` (note the newline), it's not possible to determine which line `Bar` occurs on without inspecting the source, but with token data, it would be.
- It could be valuable to allow this or another hook to maintain and return an accumulator as well. This might be used to collect lint violations (in addition to parse errors). Based on [this comment](https://github.com/elixir-tools/spitfire/commit/26e68d4b184caf32ccee840ef0e862c911ff51ef#diff-c107b1cf64f0ffbe39616bf730655abcb09b0c9db4810c151a1e344ca3c8eaf3R442), it seems like you're already planning to return an accumulated value in addition to the parse result.
- I expect, if implemented, parsing in the default case would be measurably slower due to the overhead of the additional call whenever a node is being constructed. I'm not sure what an acceptable amount of performance loss is, but I acknowledge that there is a line somewhere. (My gut says something like 1.1-1.2x would be acceptable, while 2x would almost certainly not be.)
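
For comparison, the "transform after the fact" alternative from the first bullet might look roughly like the sketch below. Everything under `Enriched` is a hypothetical name invented for this example, it only covers numeric literals and a handful of binary operators, and it uses `Code.string_to_quoted!/1` rather than Spitfire so that it runs on its own.

```elixir
defmodule Enriched.Literal do
  # Hypothetical node for a numeric literal.
  defstruct [:value, :meta]
end

defmodule Enriched.BinaryCall do
  # Hypothetical node for a binary operator call.
  defstruct [:op, :left, :right, :meta]
end

defmodule Enriched do
  alias Enriched.{BinaryCall, Literal}

  @binary_ops [:+, :-, :*, :/]

  # Second pass over an already-parsed Elixir AST.
  def from_quoted({op, meta, [left, right]}) when op in @binary_ops do
    %BinaryCall{
      op: op,
      meta: Map.new(meta),
      left: from_quoted(left),
      right: from_quoted(right)
    }
  end

  # Literals are bare values in the default AST, so there is no
  # metadata left to recover at this point.
  def from_quoted(value) when is_number(value) do
    %Literal{value: value, meta: %{}}
  end
end

ast = Code.string_to_quoted!("1 + 2.5")
enriched = Enriched.from_quoted(ast)
IO.inspect(enriched)
```

The literal nodes end up with empty `meta` because that information is already gone by the time the second pass runs, which is the second downside listed above.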
