
Commit 540f2d3

langkit/parsers.py: support Cut in Opt subparser
Improve error recovery when parsing incomplete code by allowing Cut parsers in Opt ones. TN: S201-022
1 parent de0d147 commit 540f2d3


8 files changed, +620 -17 lines changed


langkit/parsers.py

Lines changed: 68 additions & 2 deletions
@@ -615,18 +615,42 @@ def traverse_nobacktrack(
         variable if necessary, indicating which parsers should not backtrack.
         """
         if isinstance(self, Cut):
+            # Do not create a new variable for consecutive Cuts
+
             if nobt:
                 self.no_backtrack = nobt
             else:
                 self.no_backtrack = VarDef('nobt', T.Bool, reinit=True)
 
         for c in self.children:
             nobt = c.traverse_nobacktrack(self.no_backtrack)
-            # Or parsers are a stop point for Cut
 
-            if nobt and not isinstance(self, Or):
+            # Or and Opt parsers are stop points for Cut:
+            #
+            # * For Or(A, B, ...) parsers, the effect of a Cut in A/B/... must
+            #   stop when parsing returns from A/B/..., so we do not want the
+            #   no_backtrack variable to be propagated from A to B, etc. and
+            #   from A/B/... to the Or parser itself.
+            #
+            # * For Parser(A, Opt(B), Opt(C), ...) parsers, the effect of a Cut
+            #   in A/... must stop when parsing B/C, so we do not want the
+            #   no_backtrack variable to be propagated from A/... to B/C. On
+            #   the other hand, a Cut in B should be propagated to the Parser
+            #   itself, which includes A/... parsers, but not C (i.e. the
+            #   effect of a Cut in B or C must not affect C or B, respectively,
+            #   but only their parent parser Parser).
+
+            if nobt and not isinstance(self, Or) and not isinstance(c, Opt):
                 self.no_backtrack = nobt
 
+            # If c is an Opt parser that contains a Cut, the no_backtrack value
+            # of c will be propagated to self: create a no_backtrack variable
+            # in self to hold the propagated value if no Cut has been defined
+            # at this point in self yet.
+
+            if nobt and not self.no_backtrack and isinstance(c, Opt):
+                self.no_backtrack = VarDef('nobt', T.Bool, reinit=True)
+
         return self.no_backtrack
 
     def create_vars_after(self, start_pos: VarDef) -> None:
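To check which parsers end up sharing a no_backtrack variable under the rules stated in these comments, the traversal can be replayed on a toy parser tree. This is a minimal sketch, not langkit code: `P`, `Row`, `Tok` and `Var` (a stand-in for `VarDef`) are hypothetical names, only variable identity is tracked, and no parser code is generated.

```python
import itertools
from typing import Optional

# Hypothetical stand-in for langkit's VarDef: each Var is one distinct
# no_backtrack variable; only object identity matters in this sketch.
_counter = itertools.count(1)


class Var:
    def __init__(self) -> None:
        self.name = f"nobt{next(_counter)}"

    def __repr__(self) -> str:
        return self.name


class P:
    """Toy parser node replaying the traverse_nobacktrack logic."""

    def __init__(self, *children: "P") -> None:
        self.children = list(children)
        self.no_backtrack: Optional[Var] = None

    def traverse_nobacktrack(
        self, nobt: Optional[Var] = None
    ) -> Optional[Var]:
        if isinstance(self, Cut):
            # Consecutive Cuts reuse the variable handed down by the parent
            self.no_backtrack = nobt if nobt else Var()

        for c in self.children:
            nobt = c.traverse_nobacktrack(self.no_backtrack)

            # Or and Opt are stop points: do not adopt the child's variable
            if nobt and not isinstance(self, Or) and not isinstance(c, Opt):
                self.no_backtrack = nobt

            # An Opt that contains a Cut gets a fresh variable in its parent
            if nobt and not self.no_backtrack and isinstance(c, Opt):
                self.no_backtrack = Var()

        return self.no_backtrack


class Cut(P):
    pass


class Opt(P):
    pass


class Or(P):
    pass


class Row(P):
    """Plain sequence, e.g. Body(...) or the contents of an Opt."""


class Tok(P):
    """Leaf such as "scope", "begin" or identifier."""


# Body(Opt("scope", Cut(), identifier), "begin", Cut(), stmts_list, "end")
cut_in_opt, cut_in_body = Cut(), Cut()
opt = Opt(Row(Tok(), cut_in_opt, Tok()))
body = Row(opt, Tok(), cut_in_body, Tok(), Tok())
body.traverse_nobacktrack()

print(opt.no_backtrack is body.no_backtrack)          # False
print(cut_in_body.no_backtrack is body.no_backtrack)  # True

# A Cut inside one alternative never reaches the Or itself
alt = Or(Row(Tok(), Cut(), Tok()), Row(Tok()))
alt_var = alt.traverse_nobacktrack()
print(alt_var)  # None
```

In this model the Opt and the enclosing Body hold distinct variables, the Cut placed directly in Body reuses Body's variable, and an Or never adopts a variable from its alternatives.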
@@ -2385,6 +2409,48 @@ class Cut(Parser):
         function Foo is -- This function decl will be parsed correctly
             print("lol")
         end
+
+    With better error recovery in mind, a ``Cut`` parser is also allowed in
+    an ``Opt`` parser in order to prevent backtracking even when the ``Opt``
+    parser fails. To illustrate how to use the ``Cut`` parser in an ``Opt``
+    one, consider the following rule, which does not use ``Cut`` yet::
+
+        body=Body(Opt("scope", identifier), "begin", stmts_list, "end")
+
+    In this case, if we try to parse the input ``"scope begin [stmts] end"``,
+    it will fail because of the missing ``identifier`` field: the ``Opt``
+    parser will backtrack and an error will be reported on the ``scope``
+    keyword. Nevertheless, this can be improved thanks to a ``Cut``::
+
+        body=Body(Opt("scope", Cut(), identifier), "begin", stmts_list, "end")
+
+    Now, the parser will not backtrack and will produce an incomplete node,
+    taking the ``Opt`` part into account. The error will now concern the
+    ``identifier`` field being absent instead of complaining about the
+    ``scope`` keyword. This also means that on the simple input ``"scope"``,
+    the parser will not backtrack and will produce an incomplete ``Body`` node.
+
+    Note that the ``Cut`` parser only applies to the ``Opt`` parser it is
+    defined in; therefore, the parser will backtrack on the following input:
+    ``"begin end"``. Here, the parser will fail because of the missing
+    ``stmts_list`` field. Several ``Cut`` parsers can be used to improve error
+    recovery in that case. Rewriting the rule as::
+
+        body=Body(Opt("scope", Cut(), identifier),
+                  "begin", Cut(), stmts_list, "end")
+
+    will allow the parser to properly parse the incomplete input, reporting the
+    missing ``stmts_list`` field. Conversely, if no ``Cut`` is defined in the
+    ``Opt`` parser::
+
+        body=Body(Opt("scope", identifier), "begin", Cut(), stmts_list, "end")
+
+    The ``Cut`` in the ``Body`` parser has no effect on the ``Opt`` part, which
+    means that the input ``"scope begin end"`` will produce a parsing error
+    and nothing will be recovered from the ``Opt`` parser: since ``identifier``
+    is absent, the ``Opt`` parser will fail and backtrack, the ``scope``
+    keyword will be reported as an error, and ``begin end`` will be
+    incompletely parsed (no backtracking because of the ``Cut``).
     """
 
     def discard(self) -> bool:
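The recovery behavior described in this docstring can be illustrated with a simplified, self-contained model. This is not langkit's generated parser: `parse_opt`, `matches`, `IDENT` and the keyword set are hypothetical. The model assumes that after a failure past the Cut, parsing resumes at the failure position with a diagnostic, which is what the Opt template in this commit does with `Parser.Last_Fail.Pos`.

```python
from typing import List, Optional, Tuple

KEYWORDS = {"scope", "begin", "end"}
IDENT = "<identifier>"  # hypothetical token class: any non-keyword word


def matches(token: str, expected: str) -> bool:
    if expected == IDENT:
        return token not in KEYWORDS
    return token == expected


def parse_opt(
    tokens: List[str], pos: int, expected: List[str], cut_before: Optional[int]
) -> Tuple[List[str], int, List[str]]:
    """Toy model of Opt(expected[0], ..., Cut(), expected[cut_before], ...).

    cut_before=None models an Opt without Cut. Returns the matched tokens,
    the position where the enclosing parser resumes and the diagnostics.
    """
    matched: List[str] = []
    i = pos
    for k, exp in enumerate(expected):
        if i < len(tokens) and matches(tokens[i], exp):
            matched.append(tokens[i])
            i += 1
        elif cut_before is not None and k >= cut_before:
            # The Cut was crossed: no backtracking. Keep the partial match
            # and resume at the failure position with a diagnostic.
            return matched, i, [f"missing {exp}"]
        else:
            # No Cut crossed yet: backtrack, the Opt matches nothing.
            return [], pos, []
    return matched, i, []


tokens = ["scope", "begin", "x", "end"]
for cut in (None, 1):
    matched, resume, diags = parse_opt(tokens, 0, ["scope", IDENT], cut)
    # Print what the enclosing Body would see next
    print(cut, matched, tokens[resume], diags)
```

Without the Cut, the enclosing `Body` resumes at `scope` and fails there; with it, `Body` resumes at `begin` and the only diagnostic is the missing identifier.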

langkit/templates/parsers/opt_code_ada.mako

Lines changed: 68 additions & 15 deletions
@@ -12,11 +12,49 @@ if parser._booleanize:
     alt_true, alt_false = base._alternatives
 %>
 
+<%def name="no_backtrack_failure()">
+   ## Code to execute for error recovery inside the opt parser: set parser
+   ## position to the last failure position and emit a diagnostic.
+   if ${parser.no_backtrack} then
+      ${subparser.pos_var} := Parser.Last_Fail.Pos;
+
+      Append (Parser.Diagnostics,
+              Sloc_Range (Parser.TDH.all,
+                          Get_Token (Parser.TDH.all, ${subparser.pos_var})),
+              To_Text ("Cannot parse <${parser.name}>"));
+
+      Add_Last_Fail_Diagnostic (Parser);
+   end if;
+</%def>
+
+<%def name="init_empty_list()">
+   ${subparser.res_var} :=
+     ${parser_type.parser_allocator} (Parser.Mem_Pool);
+   Initialize
+     (Self => ${parser.res_var},
+      Kind => ${parser_type.ada_kind_name},
+      Unit => Parser.Unit,
+      Token_Start_Index => ${parser.start_pos} - 1,
+      Token_End_Index => No_Token_Index);
+   Initialize_List
+     (Self => ${subparser.res_var},
+      Parser => Parser,
+      Count => 0);
+</%def>
+
+<%def name="discard_res_var()">
+   ${subparser.res_var} := ${parser_type.storage_nullexpr};
+</%def>
+
+<%def name="reset_pos_var()">
+   ${subparser.pos_var} := ${parser.start_pos};
+</%def>
+
 ${subparser.generate_code()}
 
 if ${subparser.pos_var} = No_Token_Index then
    ## The subparser failed to match the input: produce result for the empty
-   ## sequence.
+   ## or incomplete sequence.
 
    % if parser._booleanize:
       % if base.is_bool_type:
@@ -31,20 +69,29 @@ if ${subparser.pos_var} = No_Token_Index then
             Token_End_Index => No_Token_Index);
       % endif
    % elif parser_type and parser_type.is_list_type:
-      ${subparser.res_var} :=
-        ${parser_type.parser_allocator} (Parser.Mem_Pool);
-      Initialize
-        (Self => ${parser.res_var},
-         Kind => ${parser_type.ada_kind_name},
-         Unit => Parser.Unit,
-         Token_Start_Index => ${parser.start_pos} - 1,
-         Token_End_Index => No_Token_Index);
-      Initialize_List
-        (Self => ${subparser.res_var},
-         Parser => Parser,
-         Count => 0);
+      % if parser.no_backtrack:
+         ${no_backtrack_failure()}
+
+         ## Init an empty list if the subparser failed
+         if ${subparser.res_var} = ${parser_type.storage_nullexpr} then
+            ${init_empty_list()}
+         end if;
+      % else:
+         ## Backtrack case: discard subparser result (init an empty list)
+         ${init_empty_list()}
+      % endif
    % elif parser_type:
-      ${subparser.res_var} := ${parser_type.storage_nullexpr};
+      % if parser.no_backtrack:
+         ${no_backtrack_failure()}
+
+         ## Backtrack case: discard subparser result
+         if not ${parser.no_backtrack} then
+            ${discard_res_var()}
+         end if;
+      % else:
+         ## Backtrack case: discard subparser result
+         ${discard_res_var()}
+      % endif
    % endif
 
    % if parser._is_error:
@@ -56,7 +103,13 @@ if ${subparser.pos_var} = No_Token_Index then
               To_Text ("Missing '${subparser.error_repr}'"));
    % endif
 
-   ${subparser.pos_var} := ${parser.start_pos};
+   % if parser.no_backtrack:
+      if not ${parser.no_backtrack} then
+         ${reset_pos_var()}
+      end if;
+   % else:
+      ${reset_pos_var()}
+   % endif
 
    % if parser._booleanize:
 else
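The failure branches of this template can be summarized as a small Python function. It is a hand transliteration under stated assumptions, not generated code: it ignores the `_booleanize` and `_is_error` paths, models `storage_nullexpr` as `None` and a list node as a Python list, and `opt_on_failure` and its parameters are hypothetical names.

```python
from typing import List, Optional, Tuple, Union

# None plays the role of storage_nullexpr; a list stands for a list node
Result = Optional[Union[str, List[str]]]


def opt_on_failure(
    res: Result,
    nobt: Optional[bool],
    is_list: bool,
    start_pos: int,
    last_fail_pos: int,
) -> Tuple[Result, int, List[str]]:
    """Outcome of the Opt template once the subparser has failed.

    nobt is None when the Opt has no no_backtrack variable (the
    "% if parser.no_backtrack:" test is false), else its runtime value.
    """
    diagnostics: List[str] = []
    crossed_cut = bool(nobt)

    if is_list:
        # With a variable, only a null result becomes an empty list;
        # without one, the result is always replaced by an empty list.
        if nobt is None or res is None:
            res = []                    # init_empty_list()
    elif not crossed_cut:
        res = None                      # discard_res_var()

    if crossed_cut:
        # no_backtrack_failure(): resume at the last failure position
        # (the template also adds the last-fail diagnostic)
        pos = last_fail_pos
        diagnostics.append("Cannot parse <opt>")
    else:
        pos = start_pos                 # reset_pos_var()
    return res, pos, diagnostics


print(opt_on_failure("node", True, False, 3, 7))   # ('node', 7, ['Cannot parse <opt>'])
print(opt_on_failure("node", False, False, 3, 7))  # (None, 3, [])
```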

langkit/templates/parsers/row_code_ada.mako

Lines changed: 6 additions & 0 deletions
@@ -11,6 +11,12 @@ ${parser.pos_var} := ${parser.start_pos};
 ## Parse the element
 ${subparser.generate_code()}
 
+## Propagate no_backtrack information. If a subparser sets its no_backtrack
+## variable, it should propagate the result to its parent.
+% if subparser.no_backtrack and parser.no_backtrack:
+${parser.no_backtrack} := ${subparser.no_backtrack};
+% endif
+
 % if parser.progress_var:
 ${parser.progress_var} := ${num};
 % endif
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+import lexer_example
+
+@with_lexer(foo_lexer)
+grammar foo_grammar {
+    @main_rule stmt_rule <- list*(or(def | var | dot | comma))
+    id <- Id(@identifier)
+    def <- Def(
+        "def"
+        / id ?pick("(" / id ")") ?pick("{" / id "}")
+    )
+    var <- Var(
+        "var" / id ?pick("(" / list+(id, ",") ")")
+    )
+    dot <- Dot(
+        "." id ?pick("(" / id ")") ?pick("{" / id "}")
+    )
+    comma <- Comma(?pick("(" / id ")") "," id id)
+}
+
+@abstract class FooNode implements Node[FooNode] {
+}
+
+class Comma : FooNode {
+    @parse_field id1: Id
+    @parse_field id2: Id
+    @parse_field id3: Id
+}
+
+class Def : FooNode {
+    @parse_field id1: Id
+    @parse_field id2: Id
+    @parse_field id3: Id
+}
+
+class Dot : FooNode {
+    @parse_field id1: Id
+    @parse_field id2: Id
+    @parse_field id3: Id
+}
+
+class Id : FooNode implements TokenNode {
+}
+
+class Var : FooNode {
+    @parse_field id: Id
+    @parse_field ids: ASTList[FooNode, Id]
+}
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+import libfoolang
+
+
+inputs = [
+    ('complete case 1', "def a"),
+    ('complete case 2', "def a (b)"),
+    ('complete case 3', "def a (b) {c}"),
+    ('complete case 4', "var a"),
+    ('complete case 5', "var a (b)"),
+    ('complete case 6', "var a (b, c, d)"),
+    ('complete case 7', ". a (b)"),
+    ('complete case 8', ". a (b) {c}"),
+    ('complete case 9', ", a b"),
+    ('complete case 10', "(a) , b c"),
+    # The def and var rules check that incomplete results are produced in
+    # the presence of several Cut parsers.
+    ('incomplete case 1', "def"),
+    ('incomplete case 2', "def a (b"),
+    ('incomplete case 3', "def a (b) {c"),
+    ('incomplete case 4', "def a ("),
+    ('incomplete case 5', "def a (b) {"),
+    ('incomplete case 6', "def a ( {"),
+    ('incomplete case 7', "def a (b {c"),
+    ('incomplete case 8', "var"),
+    ('incomplete case 9', "var a ("),
+    ('incomplete case 10', "var a ()"),
+    ('incomplete case 11', "var a (b, c, d"),
+    # The dot rule checks that an incomplete result is produced if only the
+    # optional part can set the no_backtrack variable.
+    ('incomplete case 12', ". a (b"),
+    ('incomplete case 13', ". a (b) {"),
+    ('incomplete case 14', ". a ( {"),
+    # The comma rule is similar to the dot one but the optional part is at the
+    # beginning of the rule.
+    ('incomplete case 15', ", b"),
+    ('incomplete case 16', "(a) , b"),
+    ('incomplete case 17', "(a , b"),
+]
+
+ctx = libfoolang.AnalysisContext()
+
+for name, text in inputs:
+    print(f"=== {name}: {text} ===")
+    print()
+    u = ctx.get_from_buffer("buffer", buffer=text)
+
+    for d in u.diagnostics:
+        print(d)
+    u.root.dump()
+    print()
