Commit 78bc141

Add doc example of unexpected combining sequence parsers.
Example is based on #215.
1 parent b253d9c commit 78bc141

2 files changed: +73 -0 lines changed

doc/parser.qbk

Lines changed: 2 additions & 0 deletions
@@ -110,6 +110,7 @@
 
 
 [def _std_str_ `std::string`]
+[def _std_strs_ `std::string`s]
 [def _std_vec_char_ `std::vector<char>`]
 [def _std_vec_char32_ `std::vector<char32_t>`]
 
@@ -242,6 +243,7 @@
 [def _more_about_rules_ [link boost_parser.tutorial.more_about_rules More About Rules]]
 [def _unicode_ [link boost_parser.tutorial.unicode_support Unicode Support]]
 [def _concepts_ [link boost_parser.concepts Concepts]]
+[def _seq_parser_example_ [link boost_parser.tutorial.attribute_generation.a_sequence_parser_attribute_example A sequence parser attribute example]]
 [def _ex_json_ [link boost_parser.extended_examples.parsing_json Parsing JSON]]
 [def _ex_cb_json_ [link boost_parser.extended_examples.parsing_json_with_callbacks Parsing JSON With Callbacks]]
 [def _rationale_ [link boost_parser.rationale Rationale]]

doc/tutorial.qbk

Lines changed: 71 additions & 0 deletions
@@ -1616,6 +1616,71 @@ attribute becomes `T`.
 [container_concept]
 ]
 
+[heading A sequence parser attribute example]
+
+Note that the application of `OP` is done in the style of a left-fold, and
+is therefore greedy. This can lead to some non-obvious results. For example,
+consider this program. Thanks to Duncan Paterson for this very nice example!
+
+    #include <boost/parser/parser.hpp>
+    #include <print>
+
+    namespace bp = boost::parser;
+    int main() {
+        const auto id_set_action = [](auto &ctx) {
+            const auto& [left, right] = _attr(ctx);
+            std::println("{} = {}", left, right);
+        };
+
+        const auto id_parser = bp::char_('a', 'z') > *bp::char_('a', 'z');
+
+        const auto id_set = (id_parser >> '=' >> id_parser)[id_set_action];
+        bp::parse("left=right", id_set);
+        return 0;
+    }
+
+Perhaps surprisingly, this program prints `leftr = ight`! Why is this? This
+happens because `id_parser` seems to impose structure, but does not. `id_set`
+is exactly equivalent to this (comments added to clarify which parts are which
+below).
+
+    const auto id_set = (
+        /*A*/ bp::char_('a', 'z') > /*B*/ *bp::char_('a', 'z') >>
+        /*C*/ '=' >>
+        /*D*/ bp::char_('a', 'z') > /*E*/ *bp::char_('a', 'z')
+    )[id_set_action];
+
+As _Parser_ applies `OP` to this sequence parser, the individual steps are:
+`A` and `B` get merged into a single _std_str_; `C` is ignored, since it
+produces no attribute; and `D` gets merged into the _std_str_ formed earlier
+by `A` and `B`; finally, we have `E`. `E` does not combine with `D`, as `D`
+was already consumed. `E` also does not combine with the _std_str_ we formed
+from `A`, `B`, and `D`, since we don't combine adjacent containers. In the
+end, we have a 2-tuple of _std_strs_, in which the first element contains all
+the characters parsed by `A`, `B`, and `D`, and in which the second element
+contains all the characters parsed by `E`.
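
The fold narrated in this paragraph can be mimicked with a small toy model. This is only a sketch of the steps described in the example, not how Boost.Parser is implemented: `Kind`, `Step`, `Slot`, `fold_sequence`, and `model_left_equals_right` are hypothetical names, and only three behaviours are modelled (a parser with no attribute is dropped, a `char` and an adjacent container of `char` merge, and two adjacent containers stay separate).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model; none of these names exist in Boost.Parser.
enum class Kind { None, Char, Container };

struct Step {  // one subparser and the text it matched
    Kind kind;
    std::string matched;
};

struct Slot {  // one element of the resulting tuple
    Kind kind;
    std::string text;
};

// Left fold over the subparsers, combining greedily with the previous slot.
std::vector<Slot> fold_sequence(const std::vector<Step>& steps) {
    std::vector<Slot> slots;
    for (const Step& s : steps) {
        if (s.kind == Kind::None) continue;  // no attribute: dropped
        if (!slots.empty()) {
            Slot& last = slots.back();
            assert(last.kind != Kind::None);  // None is never stored
            // Different kinds here means one char and one container of
            // char, in either order: merge them into one container.
            if (last.kind != s.kind) {
                last.kind = Kind::Container;
                last.text += s.matched;
                continue;
            }
        }
        // First slot, or equal kinds. The text only states that adjacent
        // containers do not combine; this model keeps equal kinds apart.
        slots.push_back(Slot{s.kind, s.matched});
    }
    return slots;
}

// The steps A..E for the input "left=right".
std::vector<Slot> model_left_equals_right() {
    return fold_sequence({
        {Kind::Char, "l"},          // A
        {Kind::Container, "eft"},   // B
        {Kind::None, "="},          // C
        {Kind::Char, "r"},          // D
        {Kind::Container, "ight"},  // E
    });
}
```

Under this model, `model_left_equals_right()` yields two slots holding `leftr` and `ight`, which matches what the program prints.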
+
+That's clearly not what we wanted here, though. How do we get a top-level
+parser that would print `left = right`? We use a _r_. The parser used inside
+a _r_ can never combine with any parser(s) outside the _r_. Instances of a
+rule are inherently separate from all parsers with which they are used,
+whether those parsers are _rs_ or non-_r_ parsers. So, consider a _r_
+equivalent to the previous `id_parser` above.
+
+    namespace bp = boost::parser;
+    bp::rule<struct id_parser_tag, std::string> id_parser = "identifier";
+    auto const id_parser_def = bp::char_('a', 'z') > *bp::char_('a', 'z');
+    BOOST_PARSER_DEFINE_RULES(id_parser);
+
+Later, we can use it just as we used the previous non-rule version.
+
+    const auto id_set = (id_parser >> '=' >> id_parser)[id_set_action];
+
+This produces the results you might expect, since only the `bp::char_('a',
+'z') > *bp::char_('a', 'z')` parser inside the `id_parser` _r_ is ever
+eligible for combining via `OP`.
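
The effect of the rule boundary can also be pictured with a toy model. Again, this is a hedged sketch and not Boost.Parser code: `RuleMatch`, `rule_attribute`, and `sequence_of_rules` are hypothetical names. It assumes only what the text states: combining happens among the parsers inside a rule, and each rule then contributes exactly one element to the enclosing sequence.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one use of a rule: the pieces matched by
// the parsers inside it, e.g. {"l", "eft"} for char_ > *char_.
using RuleMatch = std::vector<std::string>;

// Inside the rule, the char and the container of char merge into a
// single string attribute.
std::string rule_attribute(const RuleMatch& pieces) {
    assert(!pieces.empty());  // the identifier needs at least one char
    std::string attr;
    for (const std::string& p : pieces) attr += p;
    return attr;
}

// In the enclosing sequence, each rule yields one opaque element that
// is never merged with its neighbours.
std::vector<std::string> sequence_of_rules(const std::vector<RuleMatch>& rules) {
    std::vector<std::string> elements;
    for (const RuleMatch& r : rules) elements.push_back(rule_attribute(r));
    return elements;
}
```

For the input `left=right`, calling `sequence_of_rules({{"l", "eft"}, {"r", "ight"}})` gives the two elements `left` and `right`.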
+
+
 [heading Alternative parser attribute rules]
 
 The rules for alternative parsers are much simpler. For an alternative parser
@@ -2237,6 +2302,8 @@ common use cases for _rs_. Use a _r_ if you want to:
 * fix the attribute type produced by a parser to something other than the
   default;
 
+* control the attributes generated by adjacent sequence parsers;
+
 * create a parser that produces useful diagnostic text;
 
 * create a recursive rule (more on this below);
@@ -2377,6 +2444,10 @@ action if:
 
 The notion of "compatible" is defined in _p_api_.
 
+[heading Controlling the attributes generated]
+
+See the _seq_parser_example_ in the _attr_gen_ section for details.
+
 [heading Creating a parser for better diagnostics]
 
 Each _r_ has associated diagnostic text that _Parser_ can use for failures of
