Commit c4fe76c

GraphViz visualisation support (#326)
* Basic GraphViz visualisation support
* Different shapes for node types
* Updated README (also included changes identified by review of the upcoming release announcement blog post: tweag/www#1550 (comment))
1 parent a721772 commit c4fe76c

5 files changed: 106 additions and 36 deletions

README.md

Lines changed: 49 additions & 30 deletions
@@ -39,6 +39,9 @@ Topiary has been created with the following goals in mind:
 * Use [Tree-sitter] for parsing, to avoid writing yet another grammar
   for a formatter.
 
+* Expect idempotency. That is, formatting of already-formatted code
+  doesn't change anything.
+
 * For bundled formatting styles to meet the following constraints:
 
   * Be compatible with attested formatting styles used for that language
@@ -53,11 +56,8 @@ Topiary has been created with the following goals in mind:
     won't force you to make later, cosmetic changes when you modify your
     code.
 
-  * Be idempotent. That is, formatting of already-formatted code doesn't
-    change anything.
-
-  * Code and formatting styles must be well-tested and robust, so that the
-    formatter can be used in large projects.
+  * Be well-tested and robust, so that the formatter can be trusted in
+    large projects.
 
 * For end users -- i.e., not formatting style authors -- the formatter
   should:
@@ -165,7 +165,7 @@ Options:
 
 * `-v`, `--visualise`\
   Visualise the syntax tree, rather than format [possible values: `json`
-  (default)].
+  (default), `dot`].
 
 * `-s`, `--skip-idempotence`\
   Do not check that formatting twice gives the same output.
@@ -836,22 +836,26 @@ suggested way to work:
 
 4. Run `RUST_LOG=debug cargo test`.
 
-5. Provided it works, it should output a lot of log messages. Copy that
+   Provided it works, it should output a lot of log messages. Copy that
    output to a text editor. You are particularly interested in the CST
    output that starts with a line like this: `CST node: {Node
    compilation_unit (0, 0) - (5942, 0)} - Named: true`.
 
-6. The test run will output all the differences between the actual
+   :bulb: As an alternative to using the debugging output, the
+   `--visualise` command line option exists to output the Tree-sitter
+   syntax tree in a variety of formats.
+
+5. The test run will output all the differences between the actual
    output and the expected output, e.g. missing spaces between tokens.
    Pick a difference you would like to fix, and find the line number and
    column in the input file.
 
-7. Keep in mind that the CST output uses 0-based line and column
+   :bulb: Keep in mind that the CST output uses 0-based line and column
    numbers, so if your editor reports line 40, column 37, you probably
    want line 39, column 36.
 
-8. In the CST debug output, find the nodes in this region, such as the
-   following:
+6. In the CST debug or visualisation output, find the nodes in this
+   region, such as the following:
 
    ```
    [DEBUG atom_collection] CST node: {Node constructed_type (39, 15) - (39, 42)} - Named: true
@@ -861,35 +865,47 @@ suggested way to work:
    [DEBUG atom_collection] CST node: {Node type_constructor (39, 36) - (39, 42)} - Named: true
    ```
 
-9. This may indicate that you would like spaces after all
+7. This may indicate that you would like spaces after all
    `type_constructor_path` nodes:
 
    ```scheme
    (type_constructor_path) @append_space
    ```
 
-10. Or, more likely, you just want spaces between pairs of them:
+   Or, more likely, you just want spaces between pairs of them:
+
+   ```scheme
+   (
+     (type_constructor_path) @append_space
+     .
+     (type_constructor_path)
+   )
+   ```
+
+   Or maybe you want spaces between all children of `constructed_type`:
+
+   ```scheme
+   (constructed_type
+     (_) @append_space
+     .
+     (_)
+   )
+   ```
 
-    ```scheme
-    (
-      (type_constructor_path) @append_space
-      .
-      (type_constructor_path)
-    )
-    ```
+8. Run `cargo test` again, to see if the output is better now, and then
+   return to step 5.
 
-11. Or maybe you want spaces between all children of `constructed_type`:
+### Syntax Tree Visualisation
 
-    ```scheme
-    (constructed_type
-      (_) @append_space
-      .
-      (_)
-    )
-    ```
+To support the development of formatting queries, the Tree-sitter syntax
+tree for a given input can be produced using the `--visualise` CLI
+option.
 
-12. Run `cargo test` again, to see if the output is better now, and then
-    return to step 6.
+This currently supports JSON output, covering the same information as
+the debugging output, as well as GraphViz DOT output, which is useful
+for generating syntax diagrams. (Note that the text position
+serialisation in the visualisation output is 1-based, unlike the
+debugging output's 0-based position.)
 
 ### Terminal-Based Playground
 
@@ -921,6 +937,8 @@ of choice open in another.
   language.
 * [Neovim Treesitter Playground][nvim-treesitter]: A Tree-sitter
   playground plugin for Neovim.
+* [Difftastic]: A tool that utilises Tree-sitter to perform syntactic
+  diffing.
 
 ### Meta and Multi-Language Formatters
 
@@ -948,6 +966,7 @@ of choice open in another.
 [bash]: https://www.gnu.org/software/bash
 [ci-badge]: https://github.com/tweag/topiary/actions/workflows/ci.yml/badge.svg
 [contributing]: CONTRIBUTING.md
+[difftastic]: https://difftastic.wilfred.me.uk
 [format-all]: https://melpa.org/#/format-all
 [gofmt-slides]: https://go.dev/talks/2015/gofmt-en.slide#1
 [gofmt]: https://pkg.go.dev/cmd/gofmt
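
The README changes above note that the CST debug log uses 0-based positions while the visualisation output (like most editors) is 1-based. A minimal sketch of that conversion follows. The `Position` struct and both helper functions are hypothetical stand-ins written for this illustration; they are not taken from the Topiary source, whose own `Position` type may be shaped differently.

```rust
/// Hypothetical stand-in for a (line, column) text position. The real
/// `Position` type in the crate may use different field names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Position {
    line: usize,
    column: usize,
}

/// Convert a 1-based position (editor, `--visualise` output) to the
/// 0-based form used by the CST debug log. Returns `None` if either
/// component is 0, which is not a valid 1-based coordinate.
fn to_zero_based(p: Position) -> Option<Position> {
    Some(Position {
        line: p.line.checked_sub(1)?,
        column: p.column.checked_sub(1)?,
    })
}

/// Convert a 0-based debug-log position to the 1-based form.
fn to_one_based(p: Position) -> Position {
    Position {
        line: p.line + 1,
        column: p.column + 1,
    }
}

fn main() {
    // The README's example: the editor reports line 40, column 37.
    let editor = Position { line: 40, column: 37 };
    let debug = to_zero_based(editor).expect("1-based positions start at 1");
    println!("editor {editor:?} -> debug log {debug:?}");

    // Converting back recovers the editor position.
    assert_eq!(to_one_based(debug), editor);
}
```

Using `checked_sub` keeps an invalid 1-based coordinate of 0 from underflowing; whether that case can occur in practice depends on the tool producing the position.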

src/bin/topiary/visualise.rs

Lines changed: 7 additions & 0 deletions
@@ -5,12 +5,19 @@ use clap::ValueEnum;
 
 #[derive(ValueEnum, Clone, Copy, Debug)]
 pub enum Visualisation {
+    // JSON is first as it's the default and
+    // we want it displayed first in the help
     Json,
+
+    // All other output formats should be listed
+    // in alphabetical order
+    Dot,
 }
 
 impl From<Visualisation> for topiary::Visualisation {
     fn from(visualisation: Visualisation) -> Self {
         match visualisation {
+            Visualisation::Dot => Self::GraphViz,
             Visualisation::Json => Self::Json,
         }
     }
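
The hunk above keeps the CLI-facing enum (where the format is called `dot`) separate from the library enum (where it is called `GraphViz`) and bridges them with a `From` impl. The following std-only sketch illustrates that pattern in isolation. It is an assumption-laden stand-in: the `library` module and `CliVisualisation` name are hypothetical, and the hand-written `FromStr` impl only approximates the string parsing that clap's `ValueEnum` derive provides in the real binary.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the library crate's enum.
mod library {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Visualisation {
        GraphViz,
        Json,
    }
}

/// Hypothetical stand-in for the CLI enum; variant names follow the
/// user-facing format names rather than the library's.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CliVisualisation {
    Json,
    Dot,
}

// Assumption: in the real binary this parsing is derived by clap.
impl FromStr for CliVisualisation {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "json" => Ok(Self::Json),
            "dot" => Ok(Self::Dot),
            other => Err(format!("unknown visualisation format: {other}")),
        }
    }
}

// The bridge: an exhaustive match, so adding a CLI variant without
// mapping it to a library variant fails to compile.
impl From<CliVisualisation> for library::Visualisation {
    fn from(visualisation: CliVisualisation) -> Self {
        match visualisation {
            CliVisualisation::Dot => Self::GraphViz,
            CliVisualisation::Json => Self::Json,
        }
    }
}

fn main() {
    let cli: CliVisualisation = "dot".parse().expect("known format");
    let lib: library::Visualisation = cli.into();
    println!("{cli:?} maps to {lib:?}");
}
```

Keeping the two enums apart means the library does not need to depend on the argument parser, at the cost of one extra match arm per format.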

src/graphviz.rs

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+/// GraphViz visualisation for our SyntaxTree representation
+/// Named syntax nodes are elliptical; anonymous are rectangular
+use std::{fmt, io};
+
+use crate::{tree_sitter::SyntaxNode, FormatterResult};
+
+impl fmt::Display for SyntaxNode {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        let shape = match self.is_named {
+            true => "ellipse",
+            false => "box",
+        };
+
+        writeln!(
+            f,
+            " {} [label=\"{}\", shape={shape}];",
+            self.id,
+            self.kind.escape_default()
+        )?;
+
+        for child in &self.children {
+            writeln!(f, " {} -- {};", self.id, child.id)?;
+            write!(f, "{child}")?;
+        }
+
+        Ok(())
+    }
+}
+
+pub fn write(output: &mut dyn io::Write, root: &SyntaxNode) -> FormatterResult<()> {
+    writeln!(output, "graph {{")?;
+    write!(output, "{root}")?;
+    writeln!(output, "}}")?;
+
+    Ok(())
+}
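
To make the shape of the generated DOT easier to picture, here is a self-contained sketch of the same pre-order traversal. It is not the crate's code: the `Node` struct, `write_node`, `to_dot` and `sample_tree` are hypothetical names for this illustration, the node ids are made up (the real ones come from Tree-sitter), and the output is collected into a `String` with a two-space indent rather than written through `fmt::Display`.

```rust
use std::fmt::{self, Write as _};

/// Hypothetical stand-in for `SyntaxNode`, reduced to the fields that
/// the DOT renderer reads.
struct Node {
    id: usize,
    kind: String,
    is_named: bool,
    children: Vec<Node>,
}

/// Emit one node declaration, then each edge followed by the child's
/// own subtree (pre-order), mirroring the traversal in the diff above.
fn write_node(out: &mut String, node: &Node) -> fmt::Result {
    // Named nodes are drawn as ellipses, anonymous ones as boxes.
    let shape = if node.is_named { "ellipse" } else { "box" };

    writeln!(
        out,
        "  {} [label=\"{}\", shape={shape}];",
        node.id,
        // Escapes quotes and backslashes so the label stays a valid string.
        node.kind.escape_default()
    )?;

    for child in &node.children {
        writeln!(out, "  {} -- {};", node.id, child.id)?;
        write_node(out, child)?;
    }

    Ok(())
}

/// Wrap the node statements in an undirected `graph { ... }` block.
fn to_dot(root: &Node) -> String {
    let mut out = String::from("graph {\n");
    write_node(&mut out, root).expect("writing to a String cannot fail");
    out.push_str("}\n");
    out
}

/// A tiny made-up tree: one named root with a named and an anonymous child.
fn sample_tree() -> Node {
    let leaf = |id, kind: &str, is_named| Node {
        id,
        kind: kind.to_string(),
        is_named,
        children: Vec::new(),
    };

    Node {
        id: 1,
        kind: "source_file".to_string(),
        is_named: true,
        children: vec![leaf(2, "identifier", true), leaf(3, ";", false)],
    }
}

fn main() {
    print!("{}", to_dot(&sample_tree()));
}
```

For the sample tree this prints a `graph` block with three node statements and two `--` edges, which GraphViz tools that accept undirected graphs should be able to lay out.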

src/lib.rs

Lines changed: 4 additions & 2 deletions
@@ -18,13 +18,14 @@ use pretty_assertions::StrComparison;
 pub use crate::{
     error::{FormatterError, IoError},
     language::Language,
-    tree_sitter::Visualisation,
+    tree_sitter::{SyntaxNode, Visualisation},
 };
 use configuration::Configuration;
 
 mod atom_collection;
 mod configuration;
 mod error;
+mod graphviz;
 mod language;
 mod pretty;
 mod tree_sitter;
@@ -174,9 +175,10 @@ pub fn formatter(
 
         Operation::Visualise { output_format } => {
             let (tree, _) = tree_sitter::parse(&content, configuration.language)?;
-            let root: tree_sitter::SyntaxNode = tree.root_node().into();
+            let root: SyntaxNode = tree.root_node().into();
 
             match output_format {
+                Visualisation::GraphViz => graphviz::write(output, &root)?,
                 Visualisation::Json => serde_json::to_writer(output, &root)?,
             };
         }

src/tree_sitter.rs

Lines changed: 10 additions & 4 deletions
@@ -13,6 +13,7 @@ use crate::{
 /// Supported visualisation formats
 #[derive(Clone, Copy, Debug)]
 pub enum Visualisation {
+    GraphViz,
     Json,
 }
 
@@ -35,15 +36,18 @@ impl From<Point> for Position {
 // Simplified syntactic node struct, for the sake of serialisation.
 #[derive(Serialize)]
 pub struct SyntaxNode {
-    kind: String,
-    is_named: bool,
+    #[serde(skip_serializing)]
+    pub id: usize,
+
+    pub kind: String,
+    pub is_named: bool,
     is_extra: bool,
     is_error: bool,
     is_missing: bool,
     start: Position,
     end: Position,
 
-    children: Vec<SyntaxNode>,
+    pub children: Vec<SyntaxNode>,
 }
 
 impl From<Node<'_>> for SyntaxNode {
@@ -52,7 +56,7 @@ impl From<Node<'_>> for SyntaxNode {
         let children = node.children(&mut walker).map(SyntaxNode::from).collect();
 
         Self {
-            children,
+            id: node.id(),
 
             kind: node.kind().into(),
             is_named: node.is_named(),
@@ -61,6 +65,8 @@ impl From<Node<'_>> for SyntaxNode {
             is_missing: node.is_missing(),
             start: node.start_position().into(),
             end: node.end_position().into(),
+
+            children,
         }
     }
 }
