Commit 9103bd5

allow parsing any collection, not just Vec
1 parent e93be70 commit 9103bd5

3 files changed, +46 -45 lines changed

book/src/formality_core/parse.md

Lines changed: 31 additions & 33 deletions
@@ -23,8 +23,8 @@ struct MyEnum {
 
 When parsing an enum there will be multiple possibilities. We will attempt to parse them all. If more than one succeeds, the parser will attempt to resolve the ambiguity by looking for the **longest match**. However, we don't just consider the number of characters, we look for a **reduction prefix**:
 
-* When parsing, we track the list of things we had to parse. If there are two variants at the same precedence level, but one of them had to parse strictly more things than the other and in the same way, we'll prefer the longer one. So for example if one variant parsed a `Ty` and the other parsed a `Ty Ty`, we'd take the `Ty Ty`.
-* When considering whether a reduction is "significant", we take casts into account. See `ActiveVariant::mark_as_cast_variant` for a more detailed explanation and set of examples.
+- When parsing, we track the list of things we had to parse. If there are two variants at the same precedence level, but one of them had to parse strictly more things than the other and in the same way, we'll prefer the longer one. So for example if one variant parsed a `Ty` and the other parsed a `Ty Ty`, we'd take the `Ty Ty`.
+- When considering whether a reduction is "significant", we take casts into account. See `ActiveVariant::mark_as_cast_variant` for a more detailed explanation and set of examples.
 
 ### Precedence and left-recursive grammars
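
The reduction-prefix rule described in this hunk can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not the formality-core implementation: `is_strict_prefix` and the string labels standing in for reductions are hypothetical, and the real parser also takes precedence levels and casts into account.

```rust
/// Hypothetical helper: returns true if `shorter` is a strict prefix of
/// `longer`, i.e. `longer` parsed the same things in the same order and
/// then at least one more.
fn is_strict_prefix(shorter: &[&str], longer: &[&str]) -> bool {
    shorter.len() < longer.len() && longer.starts_with(shorter)
}

fn main() {
    // One variant parsed `Ty`, the other parsed `Ty Ty`.
    let one = ["Ty"];
    let two = ["Ty", "Ty"];

    // `Ty` is a strict prefix of `Ty Ty`, so the longer parse would win.
    assert!(is_strict_prefix(&one, &two));

    // The relation is not symmetric.
    assert!(!is_strict_prefix(&two, &one));

    // Parses that diverge are not prefixes of each other, so this rule
    // alone does not pick a winner between them.
    let other = ["Ty", "Lt"];
    assert!(!is_strict_prefix(&two, &other));
    assert!(!is_strict_prefix(&other, &two));

    println!("ok");
}
```

Comparing by prefix rather than by character count means a parse only wins when it strictly extends the other one.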

@@ -36,42 +36,41 @@ We support left-recursive grammars like this one from the `parse-torture-tests`:
 
 We also support ambiguous grammars. For example, you can code up arithmetic expressions like this:
 
-
 ```rust
 {{#include ../../../tests/parser-torture-tests/left_associative.rs:Expr}}
 ```
 
 When specifying the `#[precedence]` of a variant, the default is left-associativity, which can be written more explicitly as `#[precedence(L, left)]`. If you prefer, you can specify right-associativity (`#[precedence(L, right)]`) or non-associativity `#[precedence(L, none)]`. This affects how things of the same level are parsed:
 
-* `1 + 1 + 1` when left-associative is `(1 + 1) + 1`
-* `1 + 1 + 1` when right-associative is `1 + (1 + 1)`
-* `1 + 1 + 1` when none-associative is an error.
+- `1 + 1 + 1` when left-associative is `(1 + 1) + 1`
+- `1 + 1 + 1` when right-associative is `1 + (1 + 1)`
+- `1 + 1 + 1` when none-associative is an error.
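
The three associativity outcomes listed above can be reproduced with a small standalone sketch. It does not use the `#[precedence]` machinery at all: `Expr`, `Assoc`, `group`, and `show` are hypothetical names chosen for illustration, and the operands are assumed to be already tokenized.

```rust
/// Hypothetical expression tree with a single binary operator.
#[derive(Debug, PartialEq)]
enum Expr {
    Num(u32),
    Add(Box<Expr>, Box<Expr>),
}

/// Hypothetical stand-in for the `left` / `right` / `none` choices.
#[derive(Debug, Clone, Copy)]
enum Assoc {
    Left,
    Right,
    None,
}

/// Groups a chain `a + b + c + ...` according to the associativity.
fn group(operands: &[u32], assoc: Assoc) -> Result<Expr, String> {
    let mut nums = operands.iter().map(|&n| Expr::Num(n));
    match assoc {
        Assoc::Left => {
            let first = nums.next().ok_or("no operands")?;
            // Fold from the front: ((a + b) + c)
            Ok(nums.fold(first, |acc, n| Expr::Add(Box::new(acc), Box::new(n))))
        }
        Assoc::Right => {
            let last = nums.next_back().ok_or("no operands")?;
            // Fold from the back: (a + (b + c))
            Ok(nums.rfold(last, |acc, n| Expr::Add(Box::new(n), Box::new(acc))))
        }
        Assoc::None => match operands {
            [] => Err("no operands".to_string()),
            [a] => Ok(Expr::Num(*a)),
            [a, b] => Ok(Expr::Add(
                Box::new(Expr::Num(*a)),
                Box::new(Expr::Num(*b)),
            )),
            // Chaining a non-associative operator is rejected.
            _ => Err("non-associative operator cannot be chained".to_string()),
        },
    }
}

/// Renders the tree with explicit parentheses.
fn show(e: &Expr) -> String {
    match e {
        Expr::Num(n) => n.to_string(),
        Expr::Add(a, b) => format!("({} + {})", show(a), show(b)),
    }
}

fn main() {
    for assoc in [Assoc::Left, Assoc::Right, Assoc::None] {
        match group(&[1, 1, 1], assoc) {
            Ok(e) => println!("{:?}: {}", assoc, show(&e)),
            Err(msg) => println!("{:?}: error: {}", assoc, msg),
        }
    }
}
```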

 ### Symbols
 
-A grammar consists of a series of *symbols*. Each symbol matches some text in the input string. Symbols come in two varieties:
-
-* Most things are *terminals* or *tokens*: this means they just match themselves:
-    * For example, the `*` in `#[grammar($v0 * $v1)]` is a terminal, and it means to parse a `*` from the input.
-    * Delimeters are accepted but must be matched, e.g., `( /* tokens */ )` or `[ /* tokens */ ]`.
-* Things beginning with `$` are *nonterminals* -- they parse the contents of a field. The grammar for a field is generally determined from its type.
-    * If fields have names, then `$field` should name the field.
-    * For position fields (e.g., the T and U in `Mul(Expr, Expr)`), use `$v0`, `$v1`, etc.
-    * Exception: `$$` is treated as the terminal `'$'`.
-    * Nonterminals have various modes:
-        * `$field` -- just parse the field's type
-        * `$*field` -- the field must be a `Vec<T>` -- parse any number of `T` instances. Something like `[ $*field ]` would parse `[f1 f2 f3]`, assuming `f1`, `f2`, and `f3` are valid values for `field`.
-        * `$,field` -- similar to the above, but uses a comma separated list (with optional trailing comma). So `[ $,field ]` will parse something like `[f1, f2, f3]`.
-        * `$?field` -- will parse `field` and use `Default::default()` value if not present.
-        * `$<field>` -- parse `<E1, E2, E3>`, where `field: Vec<E>`
-        * `$<?field>` -- parse `<E1, E2, E3>`, where `field: Vec<E>`, but accept empty string as empty vector
-        * `$(field)` -- parse `(E1, E2, E3)`, where `field: Vec<E>`
-        * `$(?field)` -- parse `(E1, E2, E3)`, where `field: Vec<E>`, but accept empty string as empty vector
-        * `$[field]` -- parse `[E1, E2, E3]`, where `field: Vec<E>`
-        * `$[?field]` -- parse `[E1, E2, E3]`, where `field: Vec<E>`, but accept empty string as empty vector
-        * `${field}` -- parse `{E1, E2, E3}`, where `field: Vec<E>`
-        * `${?field}` -- parse `{E1, E2, E3}`, where `field: Vec<E>`, but accept empty string as empty vector
-        * `$:guard <nonterminal>` -- parses `<nonterminal>` but only if the keyword `guard` is present. For example, `$:where $,where_clauses` would parse `where WhereClause1, WhereClause2, WhereClause3` but would also accept nothing (in which case, you would get an empty vector).
+A grammar consists of a series of _symbols_. Each symbol matches some text in the input string. Symbols come in two varieties:
+
+- Most things are _terminals_ or _tokens_: this means they just match themselves:
+    - For example, the `*` in `#[grammar($v0 * $v1)]` is a terminal, and it means to parse a `*` from the input.
+    - Delimeters are accepted but must be matched, e.g., `( /* tokens */ )` or `[ /* tokens */ ]`.
+- Things beginning with `$` are _nonterminals_ -- they parse the contents of a field. The grammar for a field is generally determined from its type.
+    - If fields have names, then `$field` should name the field.
+    - For position fields (e.g., the T and U in `Mul(Expr, Expr)`), use `$v0`, `$v1`, etc.
+    - Exception: `$$` is treated as the terminal `'$'`.
+    - Nonterminals have various modes:
+        - `$field` -- just parse the field's type
+        - `$*field` -- the field must be a collection of `T` (e.g., `Vec<T>`, `Set<T>`) -- parse any number of `T` instances. Something like `[ $*field ]` would parse `[f1 f2 f3]`, assuming `f1`, `f2`, and `f3` are valid values for `field`.
+        - `$,field` -- similar to the above, but uses a comma separated list (with optional trailing comma). So `[ $,field ]` will parse something like `[f1, f2, f3]`.
+        - `$?field` -- will parse `field` and use `Default::default()` value if not present.
+        - `$<field>` -- parse `<E1, E2, E3>`, where `field` is a collection of `E`
+        - `$<?field>` -- parse `<E1, E2, E3>`, where `field` is a collection of `E`, but accept empty string as empty vector
+        - `$(field)` -- parse `(E1, E2, E3)`, where `field` is a collection of `E`
+        - `$(?field)` -- parse `(E1, E2, E3)`, where `field` is a collection of `E`, but accept empty string as empty vector
+        - `$[field]` -- parse `[E1, E2, E3]`, where `field` is a collection of `E`
+        - `$[?field]` -- parse `[E1, E2, E3]`, where `field` is a collection of `E`, but accept empty string as empty vector
+        - `${field}` -- parse `{E1, E2, E3}`, where `field` is a collection of `E`
+        - `${?field}` -- parse `{E1, E2, E3}`, where `field` is a collection of `E`, but accept empty string as empty vector
+        - `$:guard <nonterminal>` -- parses `<nonterminal>` but only if the keyword `guard` is present. For example, `$:where $,where_clauses` would parse `where WhereClause1, WhereClause2, WhereClause3` but would also accept nothing (in which case, you would get an empty vector).
 
 ### Greediness
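
The "collection of `T`" wording in the modes above rests on a pattern that is easy to try in isolation: accumulate items in a `Vec`, then `collect()` into whatever container the caller asks for. The sketch below is a rough standalone analogue, not the parser's code. `collect_items` is a hypothetical function, items are parsed with `FromStr` instead of `CoreParse`, and the standard `BTreeSet` stands in for `Set`. Only the bound on `C` is taken from this commit.

```rust
use std::collections::BTreeSet;
use std::fmt::Debug;
use std::iter::FromIterator;
use std::str::FromStr;

/// Hypothetical helper: parses a comma-separated list into any collection `C`
/// that can be built from an iterator of `T`.
fn collect_items<C, T>(text: &str) -> Result<C, String>
where
    T: FromStr,
    // Same shape of bound as the one added to the parser methods.
    C: IntoIterator<Item = T> + FromIterator<T> + Debug,
{
    let mut result = vec![];
    for piece in text.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        let item = piece
            .parse::<T>()
            .map_err(|_| format!("cannot parse `{}`", piece))?;
        result.push(item);
    }
    // The caller's annotation picks the concrete collection.
    Ok(result.into_iter().collect())
}

fn main() {
    // A Vec keeps order and duplicates.
    let v: Vec<u32> = collect_items("3, 1, 3").unwrap();
    assert_eq!(v, vec![3, 1, 3]);

    // A set deduplicates and orders the same input.
    let s: BTreeSet<u32> = collect_items("3, 1, 3").unwrap();
    assert_eq!(s, BTreeSet::from([1, 3]));

    println!("{:?} {:?}", v, s);
}
```

Because `T` is determined by `C`'s `Item` type, a type annotation on the result is enough for inference and no turbofish is needed at the call site.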

@@ -81,8 +80,8 @@ Parsing is generally greedy. So `$*x` and `$,x`, for example, consume as many en
 
 If no grammar is supplied, the default grammar is determined as follows:
 
-* If a `#[cast]` or `#[variable]` annotation is present, then the default grammar is just `$v0`.
-* Otherwise, the default grammar is the name of the type (for structs) or variant (for enums), followed by `()`, with the values for the fields in order. So `Mul(Expr, Expr)` would have a default grammar `mul($v0, $v1)`.
+- If a `#[cast]` or `#[variable]` annotation is present, then the default grammar is just `$v0`.
+- Otherwise, the default grammar is the name of the type (for structs) or variant (for enums), followed by `()`, with the values for the fields in order. So `Mul(Expr, Expr)` would have a default grammar `mul($v0, $v1)`.
 
 ### Customizing the parse

@@ -96,7 +95,6 @@ You must then supply an impl of `Parse` yourself. Because `Parse` is a trait ali
 
 In the Rust code, the impl for `RigidTy` looks as follows:
 
-
 ```rust
 {{#include ../../../crates/formality-types/src/grammar/ty/parse_impls.rs:RigidTy_impl}}
-```
+```

crates/formality-core/src/parse.rs

Lines changed: 1 addition & 5 deletions
@@ -233,11 +233,7 @@ where
 {
     fn parse<'t>(scope: &Scope<L>, text: &'t str) -> ParseResult<'t, Self> {
         Parser::single_variant(scope, text, "Set", |p| {
-            p.expect_char('{')?;
-            let v = p.comma_nonterminal()?;
-            p.expect_char('}')?;
-            let s = v.into_iter().collect();
-            Ok(s)
+            p.delimited_nonterminal('{', false, '}')
         })
     }
 }

crates/formality-core/src/parse/parser.rs

Lines changed: 14 additions & 7 deletions
@@ -757,33 +757,39 @@ where
     /// Continue parsing instances of `T` while we can.
     /// This is a greedy parse.
     #[tracing::instrument(level = "Trace", skip(self), ret)]
-    pub fn many_nonterminal<T>(&mut self) -> Result<Vec<T>, Set<ParseError<'t>>>
+    pub fn many_nonterminal<C, T>(&mut self) -> Result<C, Set<ParseError<'t>>>
     where
         T: CoreParse<L>,
+        C: IntoIterator<Item = T> + FromIterator<T> + Debug,
     {
         let mut result = vec![];
         while let Some(e) = self.opt_nonterminal()? {
             result.push(e);
         }
-        Ok(result)
+        Ok(result.into_iter().collect())
     }
 
     #[tracing::instrument(level = "Trace", skip(self), ret)]
-    pub fn delimited_nonterminal<T>(
+    pub fn delimited_nonterminal<T, C>(
         &mut self,
         open: char,
         optional: bool,
         close: char,
-    ) -> Result<Vec<T>, Set<ParseError<'t>>>
+    ) -> Result<C, Set<ParseError<'t>>>
     where
         T: CoreParse<L>,
+        C: IntoIterator<Item = T> + FromIterator<T> + Debug,
     {
         // Look for the opening delimiter.
         // If we don't find it, then this is either an empty vector (if optional) or an error (otherwise).
         match self.expect_char(open) {
             Ok(()) => {}
             Err(errs) => {
-                return if optional { Ok(vec![]) } else { Err(errs) };
+                return if optional {
+                    Ok(std::iter::empty().collect())
+                } else {
+                    Err(errs)
+                };
             }
         }
 
@@ -797,9 +803,10 @@ where
 
     /// Parse multiple instances of `T` separated by commas.
     #[track_caller]
-    pub fn comma_nonterminal<T>(&mut self) -> Result<Vec<T>, Set<ParseError<'t>>>
+    pub fn comma_nonterminal<C, T>(&mut self) -> Result<C, Set<ParseError<'t>>>
     where
         T: CoreParse<L>,
+        C: IntoIterator<Item = T> + FromIterator<T> + Debug,
     {
         let mut result = vec![];
         while let Some(e) = self.opt_nonterminal()? {
@@ -809,7 +816,7 @@ where
                 break;
             }
         }
-        Ok(result)
+        Ok(result.into_iter().collect())
     }
 
     /// Consumes a nonterminal from the input after skipping whitespace.
