Commit a3ca119

Add error reporting when encountering unexpected (left over) code points at
the end of an otherwise-successful parse, when doing non-prefix parsing.
1 parent 0715311 commit a3ca119

3 files changed: 85 additions, 42 deletions

doc/tutorial.qbk

Lines changed: 33 additions & 27 deletions
@@ -3399,31 +3399,36 @@ _w_eh_ (see _p_api_). If you do not set one, _default_eh_ will be used.
 [heading How diagnostics are generated]
 
 _Parser_ only generates error messages like the ones in this page at failed
-expectation points, like `a > b`, where you have successfully parsed `a`, but
-then cannot successfully parse `b`. This may seem limited to you. It's
-actually the best that we can do.
+expectation points (like `a > b`, where you have successfully parsed `a`, but
+then cannot successfully parse `b`), and at an unexpected end of input. This
+may seem limited to you. It's actually the best that we can do.
 
 In order for error handling to happen other than at expectation points, we
 have to know that there is no further processing that might take place. This
 is true because _Parser_ has `P1 | P2 | ... | Pn` parsers ("`or_parser`s").
 If any one of these parsers `Pi` fails to match, it is not allowed to fail the
 parse _emdash_ the next one (`Pi+1`) might match. If we get to the end of the
 alternatives of the or_parser and `Pn` fails, we still cannot fail the
-top-level parse, because the `or_parser` might be a subparser within a parent
-`or_parser`.
-
-Ok, so what might we do? Perhaps we could at least indicate when we ran into
-end-of-input. But we cannot, for exactly the same reason already stated. For
-any parser `P`, reaching end-of-input is a failure for `P`, but not
-necessarily for the whole parse.
-
-Perhaps we could record the farthest point ever reached during the parse, and
-report that at the top level, if the top level parser fails. That would be
-little help without knowing which parser was active when we reached that
-point. This would require some sort of repeated memory allocation, since in
-_Parser_ the progress point of the parser is stored exclusively on the stack
-_emdash_ by the time we fail the top-level parse, all those far-reaching stack
-frames are long gone. Not the best.
+top-level parse, because this `or_parser` might be a subparser within a parent
+`or_parser`. The only exception to this is when: we have finished the
+top-level parse; the top-level parse is *not* a prefix parse; and there is
+still a part of the input range that is left over. In that case, there is an
+implicit expectation that the end of the parse and the end of input are the
+same location, and this implicit expectation has just been violated.
+
+Note that we cannot fail the top-level parse when we run into end-of-input.
+We cannot for exactly the same reason already stated. For any parser `P`,
+reaching end-of-input is a failure for `P`, but not necessarily for the whole
+parse.
+
+Ok, so what other kinds of error reporting might we do? Perhaps we could
+record the farthest point ever reached during the parse, and report that at
+the top level, if the top level parser fails. That would be little help
+without knowing which parser was active when we reached that point. This
+would require some sort of repeated memory allocation, since in _Parser_ the
+progress point of the parser is stored exclusively on the stack _emdash_ by
+the time we fail the top-level parse, all those far-reaching stack frames are
+long gone. Not the best.
 
 Worse still, knowing how far you got in the parse and which parser was active
 is not very useful. Consider this.
@@ -3440,15 +3445,16 @@ Was the error in the input putting the `'a'` at the beginning or putting the
 failed, and never mention `c_b`, you are potentially just steering them in the
 wrong direction.
 
-All error messages must come from failed expectation points. Consider parsing
-JSON. If you open a list with `'['`, you know that you're parsing a list, and
-if the list is ill-formed, you'll get an error message saying so. If you open
-an object with `'{'`, the same thing is possible _emdash_ when missing the
-matching `'}'`, you can tell the user, "That's not an object", and this is
-useful feedback. The same thing with a partially parsed number, etc. If the
-JSON parser does not build in expectations like matched braces and brackets,
-how can _Parser_ know that a missing `'}'` is really a problem, and that no
-later parser will match the input even without the `'}'`?
+All error messages must come from failed expectation points (or unexpected end
+of input). Consider parsing JSON. If you open a list with `'['`, you know
+that you're parsing a list, and if the list is ill-formed, you'll get an error
+message saying so. If you open an object with `'{'`, the same thing is
+possible _emdash_ when missing the matching `'}'`, you can tell the user,
+"That's not an object", and this is useful feedback. The same thing with a
+partially parsed number, etc. If the JSON parser does not build in
+expectations like matched braces and brackets, how can _Parser_ know that a
+missing `'}'` is really a problem, and that no later parser will match the
+input even without the `'}'`?
 
 [important The bottom line is that you should build expectation points into
 your parsers using `operator>` as much as possible.]
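
The distinction the tutorial text draws between a failed alternative (a soft failure, no diagnostic) and a failed expectation point (a hard error) can be illustrated with a small standalone sketch. This is not Boost.Parser code: `match`, `expect`, `list_or_y`, and `expectation_failure` are hypothetical names invented for this illustration, and the grammar `('[' > 'x' > ']') | 'y'` is only a toy.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical error type for this sketch; not part of Boost.Parser.
struct expectation_failure : std::runtime_error
{
    std::size_t pos; // offset in the input where the expectation failed
    expectation_failure(std::size_t p, std::string const & msg)
        : std::runtime_error(msg), pos(p)
    {}
};

// Soft match: on failure, returns false and leaves `pos` untouched, so a
// sibling alternative can still be tried.
inline bool match(std::string_view in, std::size_t & pos, char c)
{
    if (pos < in.size() && in[pos] == c) {
        ++pos;
        return true;
    }
    return false;
}

// Hard match: models the right-hand side of `a > b`. Once `a` has matched,
// failing to match `b` is an error that can be reported with a location.
inline void expect(std::string_view in, std::size_t & pos, char c)
{
    if (!match(in, pos, c))
        throw expectation_failure(pos, std::string("expected '") + c + "'");
}

// Toy grammar: ('[' > 'x' > ']') | 'y'
inline bool list_or_y(std::string_view in, std::size_t & pos)
{
    if (match(in, pos, '[')) {
        expect(in, pos, 'x'); // after '[', only a list is acceptable
        expect(in, pos, ']');
        return true;
    }
    return match(in, pos, 'y'); // plain failure here says nothing useful
}
```

With input `"z"` the function simply returns `false`, since neither alternative started to match and there is nothing specific to report. With input `"[x"` it throws at offset 2, because the opening `'['` committed the parse to the list alternative.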

include/boost/parser/parser.hpp

Lines changed: 43 additions & 15 deletions
@@ -2715,20 +2715,28 @@ namespace boost { namespace parser {
             }
         }
 
-        template<typename I, typename S, typename T>
-        std::optional<T>
-        if_full_parse(I & first, S last, std::optional<T> retval)
-        {
-            if (first != last)
-                retval = std::nullopt;
-            return retval;
-        }
-        template<typename I, typename S>
-        bool if_full_parse(I & first, S last, bool retval)
-        {
-            if (first != last)
-                retval = false;
-            return retval;
+        template<typename I, typename S, typename ErrorHandler, typename T>
+        T if_full_parse(
+            I initial_first,
+            I & first,
+            S last,
+            ErrorHandler const & error_handler,
+            T retval)
+        {
+            if (first != last) {
+                if (retval && error_handler(
+                                  initial_first,
+                                  last,
+                                  parse_error<I>(first, "end of input")) ==
+                                  error_handler_result::rethrow) {
+                    throw;
+                }
+                if constexpr (std::is_same_v<T, bool>)
+                    retval = false;
+                else
+                    retval = std::nullopt;
+            }
+            return std::move(retval);
         }
 
         // The notion of compatibility is that, given a parser with the
@@ -8817,9 +8825,12 @@ namespace boost { namespace parser {
         auto r_ = detail::make_input_subrange(r);
         auto first = r_.begin();
         auto const last = r_.end();
+        auto const initial_first = first;
         return reset = detail::if_full_parse(
+            initial_first,
             first,
             last,
+            parser.error_handler_,
             parser::prefix_parse(first, last, parser, attr, trace_mode));
     }
 
@@ -8922,8 +8933,13 @@ namespace boost { namespace parser {
         auto r_ = detail::make_input_subrange(r);
         auto first = r_.begin();
         auto const last = r_.end();
+        auto const initial_first = first;
         return detail::if_full_parse(
-            first, last, parser::prefix_parse(first, last, parser, trace_mode));
+            initial_first,
+            first,
+            last,
+            parser.error_handler_,
+            parser::prefix_parse(first, last, parser, trace_mode));
     }
 
     /** Parses `[first, last)` using `parser`, skipping all input recognized
@@ -9058,9 +9074,12 @@ namespace boost { namespace parser {
         auto r_ = detail::make_input_subrange(r);
         auto first = r_.begin();
         auto const last = r_.end();
+        auto const initial_first = first;
         return reset = detail::if_full_parse(
+            initial_first,
             first,
             last,
+            parser.error_handler_,
             parser::prefix_parse(
                 first, last, parser, skip, attr, trace_mode));
     }
@@ -9169,9 +9188,12 @@ namespace boost { namespace parser {
         auto r_ = detail::make_input_subrange(r);
         auto first = r_.begin();
         auto const last = r_.end();
+        auto const initial_first = first;
         return detail::if_full_parse(
+            initial_first,
             first,
             last,
+            parser.error_handler_,
             parser::prefix_parse(first, last, parser, skip, trace_mode));
     }
 
@@ -9287,9 +9309,12 @@ namespace boost { namespace parser {
         auto r_ = detail::make_input_subrange(r);
         auto first = r_.begin();
         auto const last = r_.end();
+        auto const initial_first = first;
         return detail::if_full_parse(
+            initial_first,
             first,
             last,
+            parser.error_handler_,
             parser::callback_prefix_parse(first, last, parser, callbacks));
     }
 
@@ -9423,9 +9448,12 @@ namespace boost { namespace parser {
         auto r_ = detail::make_input_subrange(r);
         auto first = r_.begin();
         auto const last = r_.end();
+        auto const initial_first = first;
         return detail::if_full_parse(
+            initial_first,
             first,
             last,
+            parser.error_handler_,
             parser::callback_prefix_parse(
                 first, last, parser, skip, callbacks, trace_mode));
     }
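
The shape of the new `if_full_parse` (one template serving both the `bool` and the `std::optional` return types, and reporting leftover input only when the prefix parse itself succeeded) can be tried out in isolation with the standalone sketch below. It is a simplified approximation, not the library's code: `full_parse_check` and `handler_result` are hypothetical names, and the handler takes only a position and a message. One deliberate difference: in standard C++ a bare `throw;` executed while no exception is being handled calls `std::terminate`, so this sketch throws a concrete `std::runtime_error` instead. Whether that matches the commit's intent is an assumption.

```cpp
#include <optional>
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for the library's error_handler_result enum.
enum class handler_result { fail, rethrow };

// `retval` is the result of a prefix parse that stopped at `first`. If input
// is left over, the implicit "end of input" expectation has been violated.
template<typename I, typename S, typename Handler, typename T>
T full_parse_check(I first, S last, Handler const & handler, T retval)
{
    if (first != last) {
        // Report only when the prefix parse succeeded; if it failed, the
        // leftover input is not the cause of the failure.
        if (retval &&
            handler(first, "end of input") == handler_result::rethrow) {
            // Assumption: throw a concrete exception rather than `throw;`.
            throw std::runtime_error("expected end of input");
        }
        // Turn the successful prefix result into a failure of type T.
        if constexpr (std::is_same_v<T, bool>)
            retval = false;
        else
            retval = std::nullopt;
    }
    return retval;
}
```

Dispatching with `if constexpr` keeps a single function body for both result types, which is what lets the error-handler call be written once rather than once per overload.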

test/parser.cpp

Lines changed: 9 additions & 0 deletions
@@ -292,6 +292,15 @@ int main()
             }
         BOOST_TEST(parse(str, parser_1));
         BOOST_TEST(!parse(str, parser_2));
+        {
+            BOOST_TEST(!parse(str, char_));
+            std::ostringstream err, warn;
+            stream_error_handler eh("", err, warn);
+            BOOST_TEST(!parse(str, with_error_handler(char_, eh)));
+            BOOST_TEST(
+                err.str() ==
+                "1:1: error: Expected end of input here:\nab\n ^\n");
+        }
     }
     {
         std::string str = "ab";
