@@ -3399,31 +3399,36 @@ _w_eh_ (see _p_api_). If you do not set one, _default_eh_ will be used.
 [heading How diagnostics are generated]

 _Parser_ only generates error messages like the ones in this page at failed
-expectation points, like `a > b`, where you have successfully parsed `a`, but
-then cannot successfully parse `b`. This may seem limited to you . It's
-actually the best that we can do.
+expectation points (like `a > b`, where you have successfully parsed `a`, but
+then cannot successfully parse `b`), and when a non-prefix parse stops short
+of the end of input. This may seem limited to you, but it's the best we can do.

 In order for error handling to happen other than at expectation points, we
 have to know that there is no further processing that might take place. This
 is true because _Parser_ has `P1 | P2 | ... | Pn` parsers ("`or_parser`s").
 If any one of these parsers `Pi` fails to match, it is not allowed to fail the
 parse _emdash_ the next one (`Pi+1`) might match. If we get to the end of the
 alternatives of the or_parser and `Pn` fails, we still cannot fail the
-top-level parse, because the `or_parser` might be a subparser within a parent
-`or_parser`.
-
-Ok, so what might we do? Perhaps we could at least indicate when we ran into
-end-of-input. But we cannot, for exactly the same reason already stated. For
-any parser `P`, reaching end-of-input is a failure for `P`, but not
-necessarily for the whole parse.
-
-Perhaps we could record the farthest point ever reached during the parse, and
-report that at the top level, if the top level parser fails. That would be
-little help without knowing which parser was active when we reached that
-point. This would require some sort of repeated memory allocation, since in
-_Parser_ the progress point of the parser is stored exclusively on the stack
-_emdash_ by the time we fail the top-level parse, all those far-reaching stack
-frames are long gone. Not the best.
+top-level parse, because this `or_parser` might be a subparser within a parent
+`or_parser`. The only exception is when all of the following hold: we have
+finished the top-level parse; the top-level parse is *not* a prefix parse; and
+part of the input range is still left over. In that case, there is an
+implicit expectation that the end of the parse and the end of input are the
+same location, and this implicit expectation has just been violated.
+
+Note that we cannot fail the top-level parse merely because we ran into
+end-of-input, for exactly the same reason already stated. For any parser `P`,
+reaching end-of-input is a failure for `P`, but not necessarily for the whole
+parse.
+
+OK, so what other kinds of error reporting might we do? Perhaps we could
+record the farthest point ever reached during the parse, and report that at
+the top level, if the top-level parser fails. That would be of little help
+without knowing which parser was active when we reached that point, and
+tracking that would require some sort of repeated memory allocation, since in
+_Parser_ the progress point of the parser is stored exclusively on the stack
+_emdash_ by the time we fail the top-level parse, all those far-reaching stack
+frames are long gone. Not the best.

 Worse still, knowing how far you got in the parse and which parser was active
 is not very useful. Consider this.
@@ -3440,15 +3445,16 @@ Was the error in the input putting the `'a'` at the beginning or putting the
 failed, and never mention `c_b`, you are potentially just steering them in the
 wrong direction.

-All error messages must come from failed expectation points. Consider parsing
-JSON. If you open a list with `'['`, you know that you're parsing a list, and
-if the list is ill-formed, you'll get an error message saying so. If you open
-an object with `'{'`, the same thing is possible _emdash_ when missing the
-matching `'}'`, you can tell the user, "That's not an object", and this is
-useful feedback. The same thing with a partially parsed number, etc. If the
-JSON parser does not build in expectations like matched braces and brackets,
-how can _Parser_ know that a missing `'}'` is really a problem, and that no
-later parser will match the input even without the `'}'`?
+All error messages must come from failed expectation points (or from input
+left over after a non-prefix parse). Consider parsing JSON. If you open a
+list with `'['`, you know that you're parsing a list, and if the list is
+ill-formed, you'll get an error message saying so. If you open an object with
+`'{'`, the same thing is possible _emdash_ when missing the matching `'}'`,
+you can tell the user, "That's not an object", and this is useful feedback.
+The same goes for a partially parsed number, etc. If the JSON parser does not
+build in expectations like matched braces and brackets, how can _Parser_ know
+that a missing `'}'` is really a problem, and that no later parser will match
+the input even without the `'}'`?

 [important The bottom line is that you should build expectation points into
 your parsers using `operator>` as much as possible.]
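A small, self-contained C++ sketch can make the argument above concrete. It is a toy model for illustration only, not Boost.Parser's implementation or API. The names `Parser`, `lit`, `seq`, `expect`, `alt`, `expectation_failure`, and `parse` are all hypothetical.

In this model, ordinary failures (including running out of input) are silent `std::nullopt` results that an enclosing alternative may recover from. Only a violated expectation throws. The top-level `parse` adds the single implicit expectation that a non-prefix parse consumes all of its input.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Toy model only; this is NOT Boost.Parser's implementation or API.
// A parser gets the whole input and a start position. It returns the new
// position on success, or std::nullopt on an ordinary (recoverable) failure.
using Parser =
    std::function<std::optional<std::size_t>(std::string_view, std::size_t)>;

// Thrown only when an expectation is violated; it carries the diagnostic.
struct expectation_failure : std::runtime_error {
    std::size_t where;
    expectation_failure(std::string const & msg, std::size_t pos)
        : std::runtime_error(msg), where(pos) {}
};

// Matches one character. Hitting end of input is just an ordinary failure.
Parser lit(char c) {
    return [c](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (pos < in.size() && in[pos] == c)
            return pos + 1;
        return std::nullopt;
    };
}

// Models `a >> b`: failure is silent, so an enclosing alternative can retry.
Parser seq(Parser a, Parser b) {
    return [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (auto p = a(in, pos))
            return b(in, *p);
        return std::nullopt;
    };
}

// Models `a > b`: once `a` has matched, `b` must match, or the parse is over.
Parser expect(Parser a, Parser b, std::string what) {
    return [a, b, what](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        auto p = a(in, pos);
        if (!p)
            return std::nullopt;  // `a` failing is still an ordinary failure
        if (auto q = b(in, *p))
            return q;
        throw expectation_failure("expected " + what, *p);
    };
}

// Models `a | b`: a failed alternative cannot report anything; `b` might match.
Parser alt(Parser a, Parser b) {
    return [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (auto p = a(in, pos))
            return p;
        return b(in, pos);
    };
}

// Top-level, non-prefix parse. Besides expectation failures, the only other
// diagnostic is the implicit expectation that the parse consumes all input.
std::string parse(std::string_view in, Parser const & p) {
    try {
        auto end = p(in, 0);
        if (!end)
            return "failed (no diagnostic available)";
        if (*end != in.size())
            return "error at " + std::to_string(*end) + ": expected end of input";
        return "ok";
    } catch (expectation_failure const & e) {
        return "error at " + std::to_string(e.where) + ": " + e.what();
    }
}
```

Consider a grammar built with `expect`, such as `checked = alt(expect(lit('['), lit(']'), "']'"), expect(lit('{'), lit('}'), "'}'"))`. Parsing `"["` with it yields `error at 1: expected ']'`, and parsing `"{}x"` yields `error at 2: expected end of input`.

The same grammar built with `seq` can only report that `"["` failed. In that version, the `'['` alternative running out of input looks exactly like the `'{'` alternative never applying.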