docs/dev/syntax.md
21 additions & 21 deletions
@@ -64,15 +64,15 @@ struct Token {
64
64
}
65
65
```
66
66
67
-
All the difference bettwen the above sketch and the real implementation are strictly due to optimizations.
67
+
All the difference between the above sketch and the real implementation are strictly due to optimizations.
68
68
69
69
Points of note:
70
70
* The tree is untyped. Each node has a "type tag", `SyntaxKind`.
71
71
* Interior and leaf nodes are distinguished on the type level.
72
72
* Trivia and non-trivia tokens are not distinguished on the type level.
73
73
* Each token carries its full text.
74
74
* The original text can be recovered by concatenating the texts of all tokens in order.
75
-
* Accessing a child of particular type (for example, parameter list of a function) generarly involves linerary traversing the children, looking for a specific `kind`.
75
+
* Accessing a child of particular type (for example, parameter list of a function) generally involves linearly traversing the children, looking for a specific `kind`.
76
76
* Modifying the tree is roughly `O(depth)`.
77
77
We don't make special efforts to guarantee that the depth is not linear, but, in practice, syntax trees are branchy and shallow.
78
78
* If a mandatory (grammar-wise) node is missing from the input, it's just missing from the tree.
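
The points above can be illustrated with a minimal sketch. This is not the real `rowan` layout: the `SyntaxKind` variants, the `Child` enum, and the helpers `child_of_kind`, `text`, `tok`, `node` and `sample_fn` are all hypothetical names chosen for illustration. It shows the linear scan for a child of a given kind and the recovery of the source text by concatenating token texts.

```rust
use std::sync::Arc;

// Hypothetical set of kinds, just enough to model `fn foo()`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SyntaxKind { FnDef, Name, ParamList, FnKw, Ident, LParen, RParen, Whitespace }

// Leaf: carries its full text. Trivia is just another kind.
#[allow(dead_code)]
#[derive(Debug)]
struct Token { kind: SyntaxKind, text: String }

// Interior node: a type tag plus children.
#[derive(Debug)]
struct Node { kind: SyntaxKind, children: Vec<Child> }

// Interior and leaf nodes are distinguished on the type level.
#[derive(Debug, Clone)]
enum Child { Node(Arc<Node>), Token(Arc<Token>) }

impl Node {
    /// Linear scan over the children for the first interior node of `kind`.
    fn child_of_kind(&self, kind: SyntaxKind) -> Option<&Arc<Node>> {
        self.children.iter().find_map(|c| match c {
            Child::Node(n) if n.kind == kind => Some(n),
            _ => None,
        })
    }

    /// Recover the source text by concatenating all token texts in order.
    fn text(&self) -> String {
        let mut out = String::new();
        self.push_text(&mut out);
        out
    }

    fn push_text(&self, out: &mut String) {
        for c in &self.children {
            match c {
                Child::Node(n) => n.push_text(out),
                Child::Token(t) => out.push_str(&t.text),
            }
        }
    }
}

fn tok(kind: SyntaxKind, text: &str) -> Child {
    Child::Token(Arc::new(Token { kind, text: text.to_string() }))
}

fn node(kind: SyntaxKind, children: Vec<Child>) -> Child {
    Child::Node(Arc::new(Node { kind, children }))
}

/// Builds a tree for the text `fn foo()`.
fn sample_fn() -> Node {
    Node {
        kind: SyntaxKind::FnDef,
        children: vec![
            tok(SyntaxKind::FnKw, "fn"),
            tok(SyntaxKind::Whitespace, " "),
            node(SyntaxKind::Name, vec![tok(SyntaxKind::Ident, "foo")]),
            node(
                SyntaxKind::ParamList,
                vec![tok(SyntaxKind::LParen, "("), tok(SyntaxKind::RParen, ")")],
            ),
        ],
    }
}
```

With this sketch, `sample_fn().child_of_kind(SyntaxKind::ParamList)` walks past the keyword, the whitespace and the name before it finds the parameter list, which is the linear cost the list above refers to.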
@@ -123,7 +123,7 @@ To more compactly store the children, we box *both* interior nodes and tokens, a
123
123
`Either<Arc<Node>, Arc<Token>>` as a single pointer with a tag in the last bit.
124
124
125
125
To avoid allocating EVERY SINGLE TOKEN on the heap, syntax trees use interning.
126
-
Because the tree is fully imutable, it's valid to structuraly share subtrees.
126
+
Because the tree is fully immutable, it's valid to structurally share subtrees.
127
127
For example, in `1 + 1`, there will be a *single* token for `1` with ref count 2; the same goes for the ` ` whitespace token.
128
128
Interior nodes are shared as well (for example in `(1 + 1) * (1 + 1)`).
129
129
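
As a rough sketch of token interning (not the actual interner): the `Interner` type, its `token` method and the string-valued kinds below are assumptions made for illustration. Equal tokens are looked up in a map and the same `Arc` is handed out again, so `1 + 1` ends up with one allocation for `1` and one for the whitespace.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[allow(dead_code)]
#[derive(Debug, PartialEq, Eq)]
struct Token { kind: &'static str, text: String }

/// Hypothetical per-file interner keyed by (kind, text).
#[derive(Default)]
struct Interner {
    tokens: HashMap<(&'static str, String), Arc<Token>>,
}

impl Interner {
    /// Returns the shared token for `(kind, text)`, allocating it only once.
    fn token(&mut self, kind: &'static str, text: &str) -> Arc<Token> {
        let entry = self
            .tokens
            .entry((kind, text.to_string()))
            .or_insert_with(|| Arc::new(Token { kind, text: text.to_string() }));
        Arc::clone(entry)
    }
}

/// Interns the five tokens of `1 + 1`.
fn tokens_of_one_plus_one(interner: &mut Interner) -> Vec<Arc<Token>> {
    [("INT", "1"), ("WS", " "), ("PLUS", "+"), ("WS", " "), ("INT", "1")]
        .iter()
        .map(|&(kind, text)| interner.token(kind, text))
        .collect()
}
```

In this sketch the map itself also holds a reference to every token, so `Arc::strong_count` reports one more than the number of uses in the tree. The key is rebuilt on every lookup, which a real interner would avoid.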
@@ -134,8 +134,8 @@ Currently, the interner is created per-file, but it will be easy to use a per-th
134
134
135
135
We use a `TextSize`, a newtyped `u32`, to store the length of the text.
136
136
137
-
We currently use `SmolStr`, an small object optimized string to store text.
138
-
This was mostly relevant *before* we implmented tree interning, to avoid allocating common keywords and identifiers. We should switch to storing text data alongside the interned tokens.
137
+
We currently use `SmolStr`, a small object optimized string to store text.
138
+
This was mostly relevant *before* we implemented tree interning, to avoid allocating common keywords and identifiers. We should switch to storing text data alongside the interned tokens.
139
139
140
140
#### Alternative designs
141
141
@@ -162,12 +162,12 @@ Explicit trivia nodes, like in `rowan`, are used by IntelliJ.
162
162
163
163
##### Accessing Children
164
164
165
-
As noted before, accesing a specific child in the node requires a linear traversal of the children (though we can skip tokens, beacuse the tag is encoded in the pointer itself).
165
+
As noted before, accessing a specific child in the node requires a linear traversal of the children (though we can skip tokens, because the tag is encoded in the pointer itself).
166
166
It is possible to recover O(1) access with another representation.
167
167
We explicitly store optional and missing (required by the grammar, but not present) nodes.
168
168
That is, we use `Option<Node>` for children.
169
169
We also remove trivia tokens from the tree.
170
-
This way, each child kind genrerally occupies a fixed position in a parent, and we can use index access to fetch it.
170
+
This way, each child kind generally occupies a fixed position in a parent, and we can use index access to fetch it.
171
171
The cost is that we now need to allocate space for all not-present optional nodes.
172
172
So, `fn foo() {}` will have slots for visibility, unsafeness, attributes, abi and return type.
173
173
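
The fixed-slot alternative can be sketched as follows. The slot order, the slot constants and the `FnNode` type are hypothetical and not taken from any existing implementation. Each child kind has a fixed index, absent children are stored as `None`, and lookup is plain indexing.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Node { kind: &'static str, text: String }

// Hypothetical fixed slot layout for a function node.
const VISIBILITY: usize = 0;
const UNSAFE: usize = 1;
const ATTRS: usize = 2;
const ABI: usize = 3;
const NAME: usize = 4;
const PARAM_LIST: usize = 5;
const RET_TYPE: usize = 6;
const BODY: usize = 7;
const FN_SLOTS: usize = 8;

struct FnNode {
    // One slot per possible child, whether or not it is present.
    slots: [Option<Node>; FN_SLOTS],
}

impl FnNode {
    /// O(1) access: no scan over the children is needed.
    fn name(&self) -> Option<&Node> { self.slots[NAME].as_ref() }
    fn ret_type(&self) -> Option<&Node> { self.slots[RET_TYPE].as_ref() }

    /// Number of slots that are allocated but hold nothing.
    fn empty_slots(&self) -> usize {
        self.slots.iter().filter(|s| s.is_none()).count()
    }
}

/// Builds the slot representation of `fn foo() {}` (trivia removed).
fn fn_foo() -> FnNode {
    let mut slots: [Option<Node>; FN_SLOTS] = std::array::from_fn(|_| None);
    slots[NAME] = Some(Node { kind: "NAME", text: "foo".to_string() });
    slots[PARAM_LIST] = Some(Node { kind: "PARAM_LIST", text: "()".to_string() });
    slots[BODY] = Some(Node { kind: "BLOCK", text: "{}".to_string() });
    // VISIBILITY, UNSAFE, ATTRS, ABI and RET_TYPE stay `None` but still take space.
    let _ = (VISIBILITY, UNSAFE, ATTRS, ABI);
    FnNode { slots }
}
```

For `fn foo() {}`, five of the eight slots in this layout are empty, which is the memory cost described above.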
@@ -193,7 +193,7 @@ Modeling this with immutable trees is possible, but annoying.
193
193
### Syntax Nodes
194
194
195
195
A function green tree is not super-convenient to use.
196
-
The biggest problem is acessing parents (there are no parent pointers!).
196
+
The biggest problem is accessing parents (there are no parent pointers!).
197
197
But there are also "identity" issues.
198
198
Let's say you want to write code which builds a list of expressions in a file: `fn collect_expressions(file: GreenNode) -> HashSet<GreenNode>`.
199
199
For the input like
@@ -207,7 +207,7 @@ fn main() {
207
207
}
208
208
```
209
209
210
-
both copies of the `x + 2` expression are representing by equal (and, with interning in mind, actualy the same) green nodes.
210
+
both copies of the `x + 2` expression are represented by equal (and, with interning in mind, actually the same) green nodes.
211
211
Green trees just can't differentiate between the two.
212
212
213
213
`SyntaxNode` adds parent pointers and identity semantics to green nodes.
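
A simplified sketch of the idea, under stated assumptions: the field names, the use of `Rc` for green nodes, and the choice to compare nodes by green pointer plus text offset are illustrative and not a description of the real `rowan` types. Two occurrences of the same shared green node become distinct syntax nodes because they sit at different offsets and each knows its parent.

```rust
use std::rc::Rc;

#[allow(dead_code)]
#[derive(Debug)]
struct GreenNode {
    kind: &'static str,
    text_len: u32,
    children: Vec<Rc<GreenNode>>,
}

/// Hypothetical "red" node: a green node plus a parent and an absolute offset.
#[derive(Clone)]
struct SyntaxNode(Rc<SyntaxData>);

struct SyntaxData {
    offset: u32,
    parent: Option<SyntaxNode>,
    green: Rc<GreenNode>,
}

impl SyntaxNode {
    fn new_root(green: Rc<GreenNode>) -> SyntaxNode {
        SyntaxNode(Rc::new(SyntaxData { offset: 0, parent: None, green }))
    }

    fn offset(&self) -> u32 { self.0.offset }
    fn green(&self) -> &Rc<GreenNode> { &self.0.green }
    fn parent(&self) -> Option<SyntaxNode> { self.0.parent.clone() }

    /// Creates child syntax nodes on demand, accumulating text offsets.
    fn children(&self) -> Vec<SyntaxNode> {
        let mut offset = self.0.offset;
        self.0
            .green
            .children
            .iter()
            .map(|g| {
                let child = SyntaxNode(Rc::new(SyntaxData {
                    offset,
                    parent: Some(self.clone()),
                    green: Rc::clone(g),
                }));
                offset += g.text_len;
                child
            })
            .collect()
    }
}

// Identity: same green node at the same position in the text.
impl PartialEq for SyntaxNode {
    fn eq(&self, other: &SyntaxNode) -> bool {
        self.0.offset == other.0.offset && Rc::ptr_eq(&self.0.green, &other.0.green)
    }
}

/// A root whose two children are the *same* green node for `x + 2`.
fn two_copies() -> SyntaxNode {
    let expr = Rc::new(GreenNode { kind: "BIN_EXPR", text_len: 5, children: vec![] });
    SyntaxNode::new_root(Rc::new(GreenNode {
        kind: "ROOT",
        text_len: 10,
        children: vec![Rc::clone(&expr), Rc::clone(&expr)],
    }))
}
```

Here the two children of `two_copies()` share one green node, yet compare as different syntax nodes and both report the root as their parent.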
@@ -285,9 +285,9 @@ They also point to the parent (and, consequently, to the root) with an owning `R
285
285
In other words, one needs *one* arc bump when initiating a traversal.
286
286
287
287
To get rid of allocations, `rowan` takes advantage of `SyntaxNode: !Sync` and uses a thread-local free list of `SyntaxNode`s.
288
-
In a typical traversal, you only directly hold a few `SyntaxNode`s at a time (and their ancesstors indirectly), so a free list proportional to the depth of the tree removes all allocations in a typical case.
288
+
In a typical traversal, you only directly hold a few `SyntaxNode`s at a time (and their ancestors indirectly), so a free list proportional to the depth of the tree removes all allocations in a typical case.
289
289
290
-
So, while traversal is not exactly incrementing a pointer, it's still prety cheep: tls + rc bump!
290
+
So, while traversal is not exactly incrementing a pointer, it's still pretty cheap: TLS + rc bump!
291
291
292
292
Traversal also yields (cheap) owned nodes, which improves ergonomics quite a bit.
293
293
@@ -308,15 +308,15 @@ struct SyntaxData {
308
308
}
309
309
```
310
310
311
-
This allows using true pointer equality for comparision of identities of `SyntaxNodes`.
312
-
rust-analyzer used to have this design as well, but since we've switch to cursors.
313
-
The main problem with memoizing the red nodes is that it more than doubles the memory requirenments for fully realized syntax trees.
311
+
This allows using true pointer equality for comparison of identities of `SyntaxNodes`.
312
+
rust-analyzer used to have this design as well, but we've since switched to cursors.
313
+
The main problem with memoizing the red nodes is that it more than doubles the memory requirements for fully realized syntax trees.
314
314
In contrast, cursors generally retain only a path to the root.
315
315
C# combats increased memory usage by using weak references.
316
316
317
317
### AST
318
318
319
-
`GreenTree`s are untyped and homogeneous, because it makes accomodating error nodes, arbitrary whitespace and comments natural, and because it makes possible to write generic tree traversals.
319
+
`GreenTree`s are untyped and homogeneous, because it makes accommodating error nodes, arbitrary whitespace and comments natural, and because it makes possible to write generic tree traversals.
320
320
However, when working with a specific node, like a function definition, one would want a strongly typed API.
321
321
322
322
This is what is provided by the AST layer. AST nodes are transparent wrappers over untyped syntax nodes:
@@ -397,7 +397,7 @@ impl HasVisbility for FnDef {
397
397
Points of note:
398
398
399
399
* Like `SyntaxNode`s, AST nodes are cheap to clone pointer-sized owned values.
400
-
* All "fields" are optional, to accomodate incomplete and/or erroneous source code.
400
+
* All "fields" are optional, to accommodate incomplete and/or erroneous source code.
401
401
* It's always possible to go from an ast node to an untyped `SyntaxNode`.
402
402
* It's possible to go in the opposite direction with a checked cast.
403
403
* `enum`s allow modeling of arbitrary intersecting subsets of AST types.
@@ -437,13 +437,13 @@ impl GreenNodeBuilder {
437
437
}
438
438
```
439
439
440
-
The parser, ultimatelly, needs to invoke the `GreenNodeBuilder`.
440
+
The parser, ultimately, needs to invoke the `GreenNodeBuilder`.
441
441
There are two principal sources of inputs for the parser:
442
442
* source text, which contains trivia tokens (whitespace and comments)
443
443
* token trees from macros, which lack trivia
444
444
445
-
Additionaly, input tokens do not correspond 1-to-1 with output tokens.
446
-
For example, two consequtive`>` tokens might be glued, by the parser, into a single `>>`.
445
+
Additionally, input tokens do not correspond 1-to-1 with output tokens.
446
+
For example, two consecutive`>` tokens might be glued, by the parser, into a single `>>`.
447
447
448
448
For these reasons, the parser crate defines callback interfaces for both input tokens and output trees.
449
449
The explicit glue layer then bridges various gaps.
@@ -491,7 +491,7 @@ Syntax errors are not stored directly in the tree.
491
491
The primary motivation for this is that a syntax tree is not necessarily produced by the parser; it may also be assembled manually from pieces (which happens all the time in refactorings).
492
492
Instead, parser reports errors to an error sink, which stores them in a `Vec`.
493
493
If possible, errors are not reported during parsing and are postponed for a separate validation step.
494
-
For example, parser accepts visibility modifiers on trait methods, but then a separate tree traversal flags all such visibilites as erroneous.
494
+
For example, parser accepts visibility modifiers on trait methods, but then a separate tree traversal flags all such visibilities as erroneous.
495
495
496
496
### Macros
497
497
@@ -501,7 +501,7 @@ Specifically, `TreeSink` constructs the tree in lockstep with draining the origi
501
501
In the process, it records which tokens of the tree correspond to which tokens of the input, by using text ranges to identify syntax tokens.
502
502
The end result is that parsing expanded code yields a syntax tree and a mapping of text ranges of the tree to original tokens.
503
503
504
-
To deal with precedence in cases like `$expr * 1`, we use special invisible parenthesis, which are explicitelly handled by the parser
504
+
To deal with precedence in cases like `$expr * 1`, we use special invisible parenthesis, which are explicitly handled by the parser