Tree rewriting #905
Replies: 3 comments 3 replies
-
One possible way out of this rewrite conunudrum is to only allow one Assuming we apply rewrites from the inside out, and top-down (i.e. in order
It becomes clearer (to me at least) that the expected output of the preceding example applied to
At the very least, the requirement that (#rewrite!) directives for the same capture should be grouped together seems like a sensible thing to do. This would require us to check alternations to make sure that there is no overlap, which I am not sure of the feasibility of. |
Beta Was this translation helpful? Give feedback.
-
Random thoughts:
|
Beta Was this translation helpful? Give feedback.
-
Another random thought (this is definitely the discussion for them 😅) XSLT is pretty mature. For tree-to-tree transduction, rather than writing a rewrite engine for Tree-sitter CSTs, we could just implement a Tree-sitter CST to XML convertor -- which should be pretty easy1 -- then perform the rewrites with XSLT, then serialise the resulting XML back into code, ready for Topiary to format. Footnotes
|
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
This topic has been on my mind for some time and has recently come up from external collaborators. Rather than posting this as a feature request issue, I'll post it as a discussion...for reasons that will (I hope) become clear.
The problem
Sometimes it's useful to be able to modify a node. We currently do this (sparingly) with insertions -- using
@{append,prepend}_delimiter
-- and deletions. However, this suffers from at least the following problems:Thoughts
Tree-sitter queries are a bit like "higher-dimensional regular expressions" (i.e., acting on trees, rather than strings), so maybe we can push that analogy on the familiar
s/<pattern>/<replacement>/
idiom. Since the queries give us pattern matching for free, maybe something like this would work:On the surface, this might seem acceptable, but there are deeper problems that prevent this from working as one might expect. We're in the realm of tree tranducers (specifically, what's proposed above is a tree-to-string transducer) and these aren't as well-behaved as their lower-dimensional, regular expression counterparts.
For example:
@append_space
) could not appear in a@rewrite
query as it wouldn't make sense (with the possible exception of@do_nothing
). This has consequences on query parsing and matching, which now need to keep track of two states.While the above are certainly problems, they're (probably) not insurmountable. However, there's also a genuine deal breaker: the evaluation order problem, mentioned above with inserts-and-deletes, becomes pathological. For example, consider a rewrite rule that targets tuples of the form
(A,B)
, whereA
andB
could themselves be tuples (i.e., nesting is allowed), or otherwise. Something like:Imagine this rule appears three times in a query file -- as there's no reason they can't coexist -- with different values for
REWRITE_RULE
:"(\2,\1)"
"(\1, \1)"
"(\1,(\2,\1))"
In this case, what should happen to the tuple
(1,(2,3))
?((2,2),1)
(1,1)
(1,((2,2),1))
((2,(3,2)),1)
((3,2),1)
(1,(3,2),1)
This is just a simple example! Broadly speaking, the output from a tree transducer should not exist in the same "space" as its input, to prevent this recursive/ambiguity problem.2 However, that's not the case with the Topiary formatting engine; this class of transformation is very different from what Topiary currently does.
However, there is potentially scope for implementing a deterministic top-down tree tranducer (DTOP) engine within Topiary to enable rewrites. This would have to be separate from formatting (e.g., exposed through
topiary rewrite
) and would necessitate a different set of queries, completely orthogonal to the formatting queries. This engine would have to look at the Tree-sitter tree holistically, as a tree-to-tree DTOP, rather than the piecemeal approach that the formatting engine takes. That could be to Tree-sitter what XSLT is to XML.If this isn't jumping the gun, a combined pipeline -- which first rewrites (with DTOP + rewriting queries), then formats (i.e., as now, with formatting queries) and, when checking for idempotency, formats again -- may be viable end goal...
(Special thanks to @nbacquey for his technical input on this 🙏)
Footnotes
Speaking for myself, note that the "complexity limit" is more on the query author than the Topiary engine itself. It becomes intractable to keep in mind all the possible interactions as more simple rewrites are added. ↩
Another solution would be to only allow one rewrite rule, which could only be applied at most once...but that would be so limiting as to be useless! ↩
Beta Was this translation helpful? Give feedback.
All reactions