
Commit 2002ced

Merge pull request #153 from nikomatsakis/parser-testing
Improve parsing even further
2 parents 96922e3 + ae6c33c commit 2002ced

95 files changed

+2729
-723
lines changed

Cargo.lock

Lines changed: 7 additions & 0 deletions

book/src/formality_core/parse.md

Lines changed: 34 additions & 6 deletions
@@ -19,14 +19,33 @@ struct MyEnum {
1919
}
2020
```
2121

22-
### Ambiguity and precedence
22+
### Ambiguity and greedy parsing
2323

24-
When parsing an enum there will be multiple possibilities. We will attempt to parse them all. If more than one succeeds, the parser will attempt to resolve the ambiguity. Ambiguity can be resolved in two ways:
24+
When parsing an enum there will be multiple possibilities. We will attempt to parse them all. If more than one succeeds, the parser will attempt to resolve the ambiguity by looking for the **longest match**. However, we don't just count characters; we look for a **reduction prefix**:
2525

26-
* Explicit precedence: By default, every variant has precedence 0, but you can override this by annotating variants with `#[precedence(N)]` (where `N` is some integer). This will override the precedence for that variant. Variants with higher precedences are preferred.
27-
* Reduction prefix: When parsing, we track the list of things we had to parse. If there are two variants at the same precedence level, but one of them had to parse strictly more things than the other and in the same way, we'll prefer the longer one. So for example if one variant parsed a `Ty` and the other parsed a `Ty Ty`, we'd take the `Ty Ty`.
26+
* When parsing, we track the list of things we had to parse. If there are two variants at the same precedence level, but one of them had to parse strictly more things than the other and in the same way, we'll prefer the longer one. So for example if one variant parsed a `Ty` and the other parsed a `Ty Ty`, we'd take the `Ty Ty`.
27+
* When considering whether a reduction is "significant", we take casts into account. See `ActiveVariant::mark_as_cast_variant` for a more detailed explanation and set of examples.
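
The reduction-prefix rule in the two bullets above can be sketched roughly as follows. This is a minimal illustration, not the parser's actual code: `Candidate`, `is_strict_reduction_prefix`, and `pick_longest` are hypothetical names, and the sketch ignores both precedence levels and the cast handling that `ActiveVariant::mark_as_cast_variant` covers.

```rust
/// Hypothetical stand-in for a successful parse: just the names of the
/// reductions performed, in order.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Candidate {
    reductions: Vec<&'static str>,
}

/// True if `shorter` performed strictly fewer reductions than `longer`,
/// and those reductions are exactly the ones `longer` started with.
fn is_strict_reduction_prefix(shorter: &[&str], longer: &[&str]) -> bool {
    shorter.len() < longer.len() && longer.starts_with(shorter)
}

/// Returns the candidate that every other candidate is a strict reduction
/// prefix of, or `None` if no such candidate exists (an ambiguity).
fn pick_longest(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().find(|winner| {
        candidates.iter().all(|other| {
            // A candidate is never compared against itself.
            std::ptr::eq(*winner, other)
                || is_strict_reduction_prefix(&other.reductions, &winner.reductions)
        })
    })
}
```

With candidates that reduced `["Ty"]` and `["Ty", "Ty"]`, `pick_longest` returns the `Ty Ty` parse; with `["Ty"]` and `["Expr"]`, neither is a prefix of the other, so it returns `None`.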
2828

29-
Otherwise, the parser will panic and report ambiguity. The parser panics rather than returning an error because ambiguity doesn't mean that there is no way to parse the given text as the nonterminal -- rather that there are multiple ways. Errors mean that the text does not match the grammar for that nonterminal.
29+
### Precedence and left-recursive grammars
30+
31+
We support left-recursive grammars like this one from the `parser-torture-tests`:
32+
33+
```rust
34+
{{#include ../../../tests/parser-torture-tests/src/path.rs:path}}
35+
```
36+
37+
We also support ambiguous grammars. For example, you can code up arithmetic expressions like this:
38+
39+
40+
```rust
41+
{{#include ../../../tests/parser-torture-tests/src/left_associative.rs:Expr}}
42+
```
43+
44+
When specifying the `#[precedence]` of a variant, the default is left-associativity, which can be written more explicitly as `#[precedence(L, left)]`, where `L` is the precedence level. If you prefer, you can specify right-associativity (`#[precedence(L, right)]`) or non-associativity (`#[precedence(L, none)]`). This affects how things of the same level are parsed:
45+
46+
* `1 + 1 + 1` when left-associative is `(1 + 1) + 1`
47+
* `1 + 1 + 1` when right-associative is `1 + (1 + 1)`
48+
* `1 + 1 + 1` when non-associative is an error.
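
To make the three groupings concrete, here is a small standalone sketch that builds the tree for a chain of `+` operands under each associativity. It does not use the `#[term]` macro or the real parser; `Assoc`, `Expr`, and `group` are hypothetical names chosen for illustration only.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Assoc {
    Left,
    Right,
    None,
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum Expr {
    Num(u32),
    Add(Box<Expr>, Box<Expr>),
}

/// Groups the operands of a chain `a + b + c ...` according to `assoc`.
fn group(operands: &[u32], assoc: Assoc) -> Result<Expr, String> {
    let mut nums = operands.iter().map(|&n| Expr::Num(n));
    match assoc {
        // Fold from the left: ((a + b) + c)
        Assoc::Left => {
            let first = nums.next().ok_or("empty expression")?;
            Ok(nums.fold(first, |acc, n| Expr::Add(Box::new(acc), Box::new(n))))
        }
        // Fold from the right: (a + (b + c))
        Assoc::Right => {
            let last = nums.next_back().ok_or("empty expression")?;
            Ok(nums.rfold(last, |acc, n| Expr::Add(Box::new(n), Box::new(acc))))
        }
        // Without associativity, only a single operator is accepted.
        Assoc::None => match operands {
            [] => Err("empty expression".to_string()),
            [a] => Ok(Expr::Num(*a)),
            [a, b] => Ok(Expr::Add(
                Box::new(Expr::Num(*a)),
                Box::new(Expr::Num(*b)),
            )),
            _ => Err("chained operators need explicit grouping".to_string()),
        },
    }
}
```

Calling `group(&[1, 1, 1], Assoc::Left)` nests the first two operands, `Assoc::Right` nests the last two, and `Assoc::None` returns an error, matching the three bullets above.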
3049

3150
### Symbols
3251

@@ -39,11 +58,20 @@ A grammar consists of a series of *symbols*. Each symbol matches some text in th
3958
* If fields have names, then `$field` should name the field.
4059
* For positional fields (e.g., the two `Expr` fields in `Mul(Expr, Expr)`), use `$v0`, `$v1`, etc.
4160
* Exception: `$$` is treated as the terminal `'$'`.
42-
* Nonterminals can also accept modes:
61+
* Nonterminals have various modes:
4362
* `$field` -- just parse the field's type
4463
* `$*field` -- the field must be a `Vec<T>` -- parse any number of `T` instances. Something like `[ $*field ]` would parse `[f1 f2 f3]`, assuming `f1`, `f2`, and `f3` are valid values for `field`.
4564
* `$,field` -- similar to the above, but uses a comma separated list (with optional trailing comma). So `[ $,field ]` will parse something like `[f1, f2, f3]`.
4665
* `$?field` -- parses `field`, using the `Default::default()` value if it is not present.
66+
* `$<field>` -- parse `<E1, E2, E3>`, where `field: Vec<E>`
67+
* `$<?field>` -- parse `<E1, E2, E3>`, where `field: Vec<E>`, but accept empty string as empty vector
68+
* `$(field)` -- parse `(E1, E2, E3)`, where `field: Vec<E>`
69+
* `$(?field)` -- parse `(E1, E2, E3)`, where `field: Vec<E>`, but accept empty string as empty vector
70+
* `$[field]` -- parse `[E1, E2, E3]`, where `field: Vec<E>`
71+
* `$[?field]` -- parse `[E1, E2, E3]`, where `field: Vec<E>`, but accept empty string as empty vector
72+
* `${field}` -- parse `{E1, E2, E3}`, where `field: Vec<E>`
73+
* `${?field}` -- parse `{E1, E2, E3}`, where `field: Vec<E>`, but accept empty string as empty vector
74+
* `$:guard <nonterminal>` -- parses `<nonterminal>` but only if the keyword `guard` is present. For example, `$:where $,where_clauses` would parse `where WhereClause1, WhereClause2, WhereClause3`
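
The delimited modes (`$<field>`, `$<?field>`, and their `()`, `[]`, `{}` counterparts) differ only in the delimiter pair and in whether a missing list is accepted. A rough string-level sketch of that behaviour follows. `parse_delimited` is a hypothetical helper, not the crate's `delimited_nonterminal`; it treats elements as plain text and does not handle nested delimiters.

```rust
/// Parses `open item, item, ... close` from the front of `text`, returning
/// the items and the unconsumed remainder.
///
/// If `optional` is true and `text` does not start with `open`, the result
/// is an empty vector and nothing is consumed (the `?` modes).
fn parse_delimited(
    text: &str,
    open: char,
    optional: bool,
    close: char,
) -> Result<(Vec<String>, &str), String> {
    let text = text.trim_start();
    let Some(body) = text.strip_prefix(open) else {
        return if optional {
            Ok((Vec::new(), text))
        } else {
            Err(format!("expected `{open}`"))
        };
    };
    let Some(end) = body.find(close) else {
        return Err(format!("expected `{close}`"));
    };
    // Dropping empty pieces tolerates a trailing comma.
    let items = body[..end]
        .split(',')
        .map(str::trim)
        .filter(|item| !item.is_empty())
        .map(String::from)
        .collect();
    Ok((items, &body[end + close.len_utf8()..]))
}
```

For example, `parse_delimited("<A, B, C> rest", '<', false, '>')` yields the three items and `" rest"`, while `parse_delimited("rest", '<', true, '>')` yields an empty vector and leaves `"rest"` untouched.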
4775

4876
### Greediness
4977

crates/formality-core/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ tracing-tree = { version = "0.2" }
2020
formality-macros = { version = "0.1.0", path = "../formality-macros" }
2121
anyhow = "1.0.75"
2222
contracts = "0.6.3"
23+
final_fn = "0.1.0"
2324

2425
[dev-dependencies]
2526
expect-test = "1.4.1"

crates/formality-core/src/binder.rs

Lines changed: 8 additions & 6 deletions
@@ -261,14 +261,16 @@ where
261261
T: std::fmt::Debug,
262262
{
263263
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
264-
write!(f, "<")?;
265-
for (kind, i) in self.kinds.iter().zip(0..) {
266-
if i > 0 {
267-
write!(f, ", ")?;
264+
if !self.kinds.is_empty() {
265+
write!(f, "{}", L::BINDING_OPEN)?;
266+
for (kind, i) in self.kinds.iter().zip(0..) {
267+
if i > 0 {
268+
write!(f, ", ")?;
269+
}
270+
write!(f, "{:?}", kind)?;
268271
}
269-
write!(f, "{:?}", kind)?;
272+
write!(f, "{} ", L::BINDING_CLOSE)?;
270273
}
271-
write!(f, "> ")?;
272274
write!(f, "{:?}", &self.term)?;
273275
Ok(())
274276
}
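
The effect of this hunk is that a binder with no bound variables now prints as just its term, and the delimiters come from the language rather than being hard-coded. A self-contained sketch of the new behaviour, assuming a simplified `Binder` with a hypothetical `Kind` enum and module-level constants standing in for `L::BINDING_OPEN` and `L::BINDING_CLOSE`:

```rust
use std::fmt;

// Stand-ins for `L::BINDING_OPEN` / `L::BINDING_CLOSE` (assumed values).
const BINDING_OPEN: char = '<';
const BINDING_CLOSE: char = '>';

/// Hypothetical variable kinds, used only for this illustration.
#[derive(Debug)]
enum Kind {
    Ty,
    Lt,
}

/// Simplified binder: the kinds of the bound variables plus the bound term.
struct Binder<T> {
    kinds: Vec<Kind>,
    term: T,
}

impl<T: fmt::Debug> fmt::Debug for Binder<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Only print the delimiters when something is actually bound.
        if !self.kinds.is_empty() {
            write!(f, "{}", BINDING_OPEN)?;
            for (i, kind) in self.kinds.iter().enumerate() {
                if i > 0 {
                    write!(f, ", ")?;
                }
                write!(f, "{:?}", kind)?;
            }
            write!(f, "{} ", BINDING_CLOSE)?;
        }
        write!(f, "{:?}", self.term)
    }
}
```

Under these assumptions, a binder over `[Ty, Lt]` with term `22` formats as `<Ty, Lt> 22`, and one with no kinds formats as `22`.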

crates/formality-core/src/language.rs

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,7 @@
11
use crate::cast::UpcastFrom;
22
use crate::term::CoreTerm;
33
use crate::variable::{CoreBoundVar, CoreExistentialVar, CoreUniversalVar, CoreVariable};
4+
use crate::DowncastTo;
45
use std::fmt::Debug;
56
use std::hash::Hash;
67

@@ -20,7 +21,8 @@ pub trait Language: 'static + Copy + Ord + Hash + Debug + Default {
2021
+ UpcastFrom<CoreVariable<Self>>
2122
+ UpcastFrom<CoreUniversalVar<Self>>
2223
+ UpcastFrom<CoreExistentialVar<Self>>
23-
+ UpcastFrom<CoreBoundVar<Self>>;
24+
+ UpcastFrom<CoreBoundVar<Self>>
25+
+ DowncastTo<CoreVariable<Self>>;
2426

2527
/// The token (typically `<`) used to open binders.
2628
const BINDING_OPEN: char;

crates/formality-core/src/lib.rs

Lines changed: 32 additions & 18 deletions
@@ -32,6 +32,7 @@ pub mod language;
3232
pub mod parse;
3333
pub mod substitution;
3434
pub mod term;
35+
pub mod util;
3536
pub mod variable;
3637
pub mod visit;
3738

@@ -95,18 +96,27 @@ macro_rules! declare_language {
9596
) => {
9697
$(#[$the_lang_m:meta])*
9798
$the_lang_v mod $the_lang {
98-
use super::*;
99+
use $crate::language::Language;
99100

100101
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Debug, Default)]
101102
pub struct FormalityLang;
102103

103-
impl $crate::language::Language for FormalityLang {
104-
const NAME: &'static str = $name;
105-
type Kind = $kind;
106-
type Parameter = $param;
107-
const BINDING_OPEN: char = $binding_open;
108-
const BINDING_CLOSE: char = $binding_close;
109-
const KEYWORDS: &'static [&'static str] = &[$($kw),*];
104+
// This module may seem weird -- it permits us to import `super::*`
105+
// so that all the types in `$kind` and `$param` are valid without
106+
// importing `super::*` into the entire module. This not only makes
107+
// things a bit nicer, since those imports are not needed and could
108+
// cause weird behavior, it avoids a cycle when users
109+
// do `pub use $lang::grammar::*`.
110+
mod __hygiene {
111+
use super::super::*;
112+
impl $crate::language::Language for super::FormalityLang {
113+
const NAME: &'static str = $name;
114+
type Kind = $kind;
115+
type Parameter = $param;
116+
const BINDING_OPEN: char = $binding_open;
117+
const BINDING_CLOSE: char = $binding_close;
118+
const KEYWORDS: &'static [&'static str] = &[$($kw),*];
119+
}
110120
}
111121

112122
$crate::trait_alias! {
@@ -125,15 +135,19 @@ macro_rules! declare_language {
125135
pub trait Term = $crate::term::CoreTerm<FormalityLang>
126136
}
127137

128-
pub type Variable = $crate::variable::CoreVariable<FormalityLang>;
129-
pub type ExistentialVar = $crate::variable::CoreExistentialVar<FormalityLang>;
130-
pub type UniversalVar = $crate::variable::CoreUniversalVar<FormalityLang>;
131-
pub type BoundVar = $crate::variable::CoreBoundVar<FormalityLang>;
132-
pub type DebruijnIndex = $crate::variable::DebruijnIndex;
133-
pub type VarIndex = $crate::variable::VarIndex;
134-
pub type Binder<T> = $crate::binder::CoreBinder<FormalityLang, T>;
135-
pub type Substitution = $crate::substitution::CoreSubstitution<FormalityLang>;
136-
pub type VarSubstitution = $crate::substitution::CoreVarSubstitution<FormalityLang>;
138+
/// Grammar items to be included in this language.
139+
pub mod grammar {
140+
use super::FormalityLang;
141+
pub type Variable = $crate::variable::CoreVariable<FormalityLang>;
142+
pub type ExistentialVar = $crate::variable::CoreExistentialVar<FormalityLang>;
143+
pub type UniversalVar = $crate::variable::CoreUniversalVar<FormalityLang>;
144+
pub type BoundVar = $crate::variable::CoreBoundVar<FormalityLang>;
145+
pub type DebruijnIndex = $crate::variable::DebruijnIndex;
146+
pub type VarIndex = $crate::variable::VarIndex;
147+
pub type Binder<T> = $crate::binder::CoreBinder<FormalityLang, T>;
148+
pub type Substitution = $crate::substitution::CoreSubstitution<FormalityLang>;
149+
pub type VarSubstitution = $crate::substitution::CoreVarSubstitution<FormalityLang>;
150+
}
137151

138152
/// Parses `text` as a term with no bindings in scope.
139153
#[track_caller]
@@ -161,7 +175,7 @@ macro_rules! declare_language {
161175
pub fn term_with<T, B>(bindings: impl IntoIterator<Item = B>, text: &str) -> $crate::Fallible<T>
162176
where
163177
T: Parse,
164-
B: $crate::Upcast<(String, $param)>,
178+
B: $crate::Upcast<(String, <FormalityLang as Language>::Parameter)>,
165179
{
166180
$crate::parse::core_term_with::<FormalityLang, T, B>(bindings, text)
167181
}
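
The `__hygiene` module introduced in this file can be easier to follow outside the macro. The sketch below reproduces the pattern by hand with made-up items: `outer`, `Meters`, and `Describe` are hypothetical, and only `FormalityLang` and `__hygiene` are names taken from the diff. The glob import of the parent scope is confined to a private inner module, so the `impl` can name `Meters` unqualified without `lang` itself importing everything.

```rust
mod outer {
    /// A type the trait impl needs to name, as `$kind` / `$param` do in the macro.
    pub struct Meters(pub u32);

    /// Hypothetical stand-in for the `Language` trait.
    pub trait Describe {
        const NAME: &'static str;
        type Unit;
    }

    pub mod lang {
        pub struct FormalityLang;

        // The glob import lives only inside this private module, so `lang`
        // does not pull in everything from `outer`.
        mod __hygiene {
            use super::super::*;

            impl Describe for super::FormalityLang {
                const NAME: &'static str = "demo";
                // `Meters` resolves through the glob import above.
                type Unit = Meters;
            }
        }
    }
}
```

Trait impls are visible wherever the trait and type are, so `<outer::lang::FormalityLang as outer::Describe>::NAME` works even though `__hygiene` is private.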

crates/formality-core/src/parse.rs

Lines changed: 18 additions & 19 deletions
@@ -14,7 +14,7 @@ use std::fmt::Debug;
1414
/// Trait for parsing a [`Term<L>`](`crate::term::Term`) as input.
1515
/// Typically this is auto-generated with the `#[term]` procedural macro,
1616
/// but you can implement it by hand if you want a very customized parse.
17-
pub trait CoreParse<L: Language>: Sized + Debug + Clone + Eq {
17+
pub trait CoreParse<L: Language>: Sized + Debug + Clone + Eq + 'static {
1818
/// Parse a single instance of this type, returning an error if no such
1919
/// instance is present.
2020
///
@@ -24,7 +24,7 @@ pub trait CoreParse<L: Language>: Sized + Debug + Clone + Eq {
2424
}
2525

2626
mod parser;
27-
pub use parser::{skip_whitespace, ActiveVariant, Parser};
27+
pub use parser::{skip_whitespace, ActiveVariant, Parser, Precedence};
2828

2929
/// Parses `text` as a term with the given bindings in scope.
3030
///
@@ -58,7 +58,7 @@ where
5858
}
5959

6060
/// Record from a successful parse.
61-
#[derive(Debug, Clone)]
61+
#[derive(Debug, Clone, PartialEq, Eq)]
6262
pub struct SuccessfulParse<'t, T> {
6363
/// The new point in the input, after we've consumed whatever text we have.
6464
text: &'t str,
@@ -76,21 +76,15 @@ pub struct SuccessfulParse<'t, T> {
7676
/// reduction.
7777
reductions: Vec<&'static str>,
7878

79+
/// The precedence of this parse, which is derived from the value given
80+
/// to `parse_variant`.
81+
precedence: Precedence,
82+
7983
/// The value produced.
8084
value: T,
8185
}
8286

8387
impl<'t, T> SuccessfulParse<'t, T> {
84-
#[track_caller]
85-
pub fn new(text: &'t str, reductions: Vec<&'static str>, value: T) -> Self {
86-
// assert!(!reductions.is_empty());
87-
Self {
88-
text,
89-
reductions,
90-
value,
91-
}
92-
}
93-
9488
/// Extract the value parsed and the remaining text,
9589
/// ignoring the reductions.
9690
pub fn finish(self) -> (T, &'t str) {
@@ -103,6 +97,7 @@ impl<'t, T> SuccessfulParse<'t, T> {
10397
SuccessfulParse {
10498
text: self.text,
10599
reductions: self.reductions,
100+
precedence: self.precedence,
106101
value: op(self.value),
107102
}
108103
}
@@ -117,6 +112,7 @@ where
117112
SuccessfulParse {
118113
text: term.text,
119114
reductions: term.reductions,
115+
precedence: term.precedence,
120116
value: term.value.upcast(),
121117
}
122118
}
@@ -172,7 +168,7 @@ pub type ParseResult<'t, T> = Result<SuccessfulParse<'t, T>, Set<ParseError<'t>>
172168
pub type TokenResult<'t, T> = Result<(T, &'t str), Set<ParseError<'t>>>;
173169

174170
/// Tracks the variables in scope at this point in parsing.
175-
#[derive(Clone, Debug)]
171+
#[derive(Clone, Debug, Default)]
176172
pub struct Scope<L: Language> {
177173
bindings: Vec<(String, CoreParameter<L>)>,
178174
}
@@ -222,10 +218,7 @@ where
222218
{
223219
fn parse<'t>(scope: &Scope<L>, text: &'t str) -> ParseResult<'t, Self> {
224220
Parser::single_variant(scope, text, "Vec", |p| {
225-
p.expect_char('[')?;
226-
let v = p.comma_nonterminal()?;
227-
p.expect_char(']')?;
228-
Ok(v)
221+
p.delimited_nonterminal('[', false, ']')
229222
})
230223
}
231224
}
@@ -277,7 +270,13 @@ where
277270
{
278271
fn parse<'t>(scope: &Scope<L>, text: &'t str) -> ParseResult<'t, Self> {
279272
Parser::single_variant(scope, text, "Binder", |p| {
280-
p.expect_char(L::BINDING_OPEN)?;
273+
match p.expect_char(L::BINDING_OPEN) {
274+
Ok(()) => {}
275+
Err(_) => {
276+
return Ok(CoreBinder::dummy(p.nonterminal()?));
277+
}
278+
}
279+
281280
let bindings: Vec<Binding<L>> = p.comma_nonterminal()?;
282281
p.expect_char(L::BINDING_CLOSE)?;
283282
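
The fallback added in this hunk means a term with no opening delimiter is parsed as a binder that binds nothing (via `CoreBinder::dummy` in the diff). A rough string-level sketch of that control flow follows. This `Binder` struct and `parse_binder` function are hypothetical simplifications that assume `<` and `>` as delimiters and treat the body as plain text.

```rust
/// Simplified binder: bound variable names plus the body text.
#[derive(Debug, PartialEq, Eq)]
struct Binder {
    names: Vec<String>,
    body: String,
}

fn parse_binder(text: &str) -> Result<Binder, String> {
    let text = text.trim();
    let Some(rest) = text.strip_prefix('<') else {
        // No opening delimiter: the whole input is the bound term and
        // nothing is bound, mirroring the "dummy binder" fallback.
        return Ok(Binder {
            names: Vec::new(),
            body: text.to_string(),
        });
    };
    // Once the opening delimiter is seen, the closing one is mandatory.
    let (names, body) = rest
        .split_once('>')
        .ok_or_else(|| "expected `>` to close the binder".to_string())?;
    let names = names
        .split(',')
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .map(String::from)
        .collect();
    Ok(Binder {
        names,
        body: body.trim().to_string(),
    })
}
```

Here `parse_binder("<T, U> foo")` binds `T` and `U` around `foo`, `parse_binder("foo")` succeeds with no bound names, and `parse_binder("<T foo")` is an error because the binder is never closed.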
