You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
By default, the `#[term]` macro will generate a `Debug` impl that is guided by the `#[grammar]` attributes on your type (see the [parsing](./parse.md) section for more details). But sometimes you want to generate custom logic. You can include a `#[customize(debug)]` declaration to allow that. Most of the type, when you do this, you will also want to [customize parsing](./parse.md#customizing-the-parse), as the `RigidTy` does:
Copy file name to clipboardExpand all lines: book/src/formality_core/parse.md
+25-6Lines changed: 25 additions & 6 deletions
Original file line number
Diff line number
Diff line change
@@ -19,9 +19,14 @@ struct MyEnum {
19
19
}
20
20
```
21
21
22
-
### Ambiguity,
22
+
### Ambiguity and precedence
23
23
24
-
When parsing an enum there will be multiple possibilities. We will attempt to parse them all. If more than one succeeds, the parse is deemed ambiguous and an error is reported. If zero succeed, we will report back a parsing error, attempting to localize the problem as best we can.
24
+
When parsing an enum there will be multiple possibilities. We will attempt to parse them all. If more than one succeeds, the parser will attempt to resolve the ambiguity. Ambiguity can be resolved in two ways:
25
+
26
+
* Explicit precedence: By default, every variant has precedence 0, but you can override this by annotating variants with `#[precedence(N)]` (where `N` is some integer). This will override the precedence for that variant. Variants with higher precedences are preferred.
27
+
* Reduction prefix: When parsing, we track the list of things we had to parse. If there are two variants at the same precedence level, but one of them had to parse strictly more things than the other and in the same way, we'll prefer the longer one. So for example if one variant parsed a `Ty` and the other parsed a `Ty Ty`, we'd take the `Ty Ty`.
28
+
29
+
Otherwise, the parser will panic and report ambiguity. The parser panics rather than returning an error because ambiguity doesn't mean that there is no way to parse the given text as the nonterminal -- rather that there are multiple ways. Errors mean that the text does not match the grammar for that nonterminal.
25
30
26
31
### Symbols
27
32
@@ -37,19 +42,33 @@ A grammar consists of a series of *symbols*. Each symbol matches some text in th
37
42
* Nonterminals can also accept modes:
38
43
*`$field` -- just parse the field's type
39
44
*`$*field` -- the field must be a `Vec<T>` -- parse any number of `T` instances. Something like `[ $*field ]` would parse `[f1 f2 f3]`, assuming `f1`, `f2`, and `f3` are valid values for `field`.
40
-
* If the next token after `$*field` is a terminal, `$*` uses it as lookahead. The grammar `[ $*field ]`, for example, would stop parsing once a `]` is observed.
41
-
* If the `$*field` appears at the end of the grammar or the next symbol is a nonterminal, then the type of `field` must define a grammar that begins with some terminal or else your parsing will likely be ambiguous or infinite.
42
45
*`$,field` -- similar to the above, but uses a comma separated list (with optional trailing comma). So `[ $,field ]` will parse something like `[f1, f2, f3]`.
43
-
* Lookahead is required for `$,field` (this could be changed).
44
46
*`$?field` -- will parse `field` and use `Default::default()` value if not present.
45
47
46
48
### Greediness
47
49
48
-
Parsing is generally greedy. So `$*x` and `$,x`, for example, consume as many entries as they can, subject to lookahead. In a case like `$*f $*g`, the parser would parse as many `f` values as it could before parsing `g`. If some entry can be parsed as either `f` or `g`, the parser will not explore all possibilities. The caveat here is lookahead: given `$*f #`, for example, parsing will stop when `#` is observed, even if `f` could parse `#`.
50
+
Parsing is generally greedy. So `$*x` and `$,x`, for example, consume as many entries as they can. Typically this works best if `x` begins with some symbol that indicates whether it is present.
49
51
50
52
### Default grammar
51
53
52
54
If no grammar is supplied, the default grammar is determined as follows:
53
55
54
56
* If a `#[cast]` or `#[variable]` annotation is present, then the default grammar is just `$v0`.
55
57
* Otherwise, the default grammar is the name of the type (for structs) or variant (for enums), followed by `()`, with the values for the fields in order. So `Mul(Expr, Expr)` would have a default grammar `mul($v0, $v1)`.
58
+
59
+
### Customizing the parse
60
+
61
+
If you prefer, you can customize the parse by annotating your term with `#[customize(parse)]`. In the Rust case, for example, the parsing of `RigidTy` is customized ([as is the debug impl](./debug.md)):
You must then supply an impl of `Parse` yourself. Because `Parse` is a trait alias, you actually want to implement `CoreParse<L>` for your language type `L`. Inside the code you will want to instantiate a `Parser` and then invoke `parse_variant` for every variant, finally invoking `finish`.
68
+
69
+
In the Rust code, the impl for `RigidTy` looks as follows:
0 commit comments