Commit 9cb7de9

Remove support for direct use of null-terminated strings with the parser APIs.
Fixes #175. Fixes #190.
1 parent 274e4e3 commit 9cb7de9

19 files changed, with 455 additions and 516 deletions.

doc/tables.qbk

Lines changed: 2 additions & 4 deletions
@@ -42,11 +42,9 @@ itself be used as a parser; it must be called. In the table below:
4242

4343
[note The definition of `parsable_range_like` is:
4444

45-
[parsable_range_like_concept]
45+
[parsable_range_concept]
4646

47-
It is intended to be a range-like thing; a null-terminated sequence of
48-
characters is considered range-like, given that a pointer `T *` to a
49-
null-terminated string is isomorphic with `subrange<T *, _null_sent_>`.]
47+
]
5048

5149
[note Some of the parsers in this table consume no input. All parsers consume
5250
the input they match unless otherwise stated in the table below.]

doc/tutorial.qbk

Lines changed: 10 additions & 34 deletions
@@ -1807,8 +1807,7 @@ common:
18071807
* They each return a value contextually convertible to `bool`.
18081808

18091809
* They each take at least a range to parse and a parser. The "range to parse"
1810-
may be an iterator/sentinel pair or an single range-like object. Note that
1811-
"range-like" includes null-terminated string pointers.
1810+
may be an iterator/sentinel pair or an single range object.
18121811

18131812
* They each require forward iterability of the range to parse.
18141813

@@ -1823,7 +1822,7 @@ common:
18231822
the last location within the input that `p` matched. The *whole* input was
18241823
matched if and only if `first == last` after the call to _p_.
18251824

1826-
* When you call any of the range-like overloads of _p_, for example `_p_np_(r,
1825+
* When you call any of the range overloads of _p_, for example `_p_np_(r,
18271826
p, _ws_)`, _p_ only indicates success if *all* of `r` was matched by `p`.
18281827

18291828
[note `wchar_t` is an accepted value type for the input. Please note that
@@ -1834,7 +1833,7 @@ this is interpreted as UTF-16 on MSVC, and UTF-32 everywhere else.]
18341833
There are eight overloads of _p_ and _pp_ combined, because there are three
18351834
either/or options in how you call them.
18361835

1837-
[heading Iterator/sentinel versus range-like]
1836+
[heading Iterator/sentinel versus range]
18381837

18391838
You can call _pp_ with an iterator and sentinel that delimit a range of
18401839
character values. For example:
@@ -1868,32 +1867,11 @@ allows calls like `_p_np_("str", p)` to work naturally.
18681867
auto result_2 = bp::parse(U"str", p, bp::ws);
18691868

18701869
char const * str_3 = "str";
1871-
auto result_3 = bp::parse(str_3 | boost::parser::as_utf16, p, bp::ws);
1872-
1873-
You can also call _p_ with a pointer to a null-terminated string of character
1874-
values. _p_ considers pointers to null-terminated strings to be ranges,
1875-
since, for any pointer `T *` to a null-terminated string, `T *` is isomorphic
1876-
with `subrange<T *, _null_sent_>`.
1877-
1878-
namespace bp = boost::parser;
1879-
auto const p = /* some parser ... */;
1880-
1881-
char const * str_1 = /* ... */ ;
1882-
auto result_1 = bp::parse(str_1, p, bp::ws);
1883-
char8_t const * str_2 = /* ... */ ;
1884-
auto result_2 = bp::parse(str_2, p, bp::ws);
1885-
char16_t const * str_3 = /* ... */ ;
1886-
auto result_3 = bp::parse(str_3, p, bp::ws);
1887-
char32_t const * str_4 = /* ... */ ;
1888-
auto result_4 = bp::parse(str_4, p, bp::ws);
1889-
1890-
int const array[] = { 's', 't', 'r', 0 };
1891-
int const * array_ptr = array;
1892-
auto result_5 = bp::parse(array_ptr, p, bp::ws);
1870+
auto result_3 = bp::parse(bp::null_term(str_3) | bp::as_utf16, p, bp::ws);
18931871

18941872
Since there is no way to indicate that `p` matches the input, but only a
1895-
prefix of the input was matched, the range-like (non-iterator/sentinel)
1896-
overloads of _p_ indicate failure if the entire input is not matched.
1873+
prefix of the input was matched, the range (non-iterator/sentinel) overloads
1874+
of _p_ indicate failure if the entire input is not matched.
18971875

18981876
[heading With or without an attribute out-parameter]
18991877

@@ -3049,10 +3027,9 @@ code paths, as they are written generically. The only difference is that the
30493027
Unicode code path parses the input as a range of code points, and the
30503028
non-Unicode path does not. In effect, this means that, in the Unicode code
30513029
path, when you call `_p_np_(r, p)` for some input range `r` and some parser
3052-
`p`, the parse happens as if you called `_p_np_(r | boost::parser::as_utf32, p)`
3053-
instead. (Of course, it does not matter if `r` is a null-terminated pointer,
3054-
a proper range, or an iterator/sentinel pair; those all work fine with
3055-
`boost::parser::as_utf32`.)
3030+
`p`, the parse happens as if you called `_p_np_(r | boost::parser::as_utf32,
3031+
p)` instead. (Of course, it does not matter if `r` is a proper range, or an
3032+
iterator/sentinel pair; those both work fine with `boost::parser::as_utf32`.)
30563033

30573034
Matching "characters" within _Parser_'s parsers is assumed to be a code point
30583035
match. In the Unicode path there is a code point from the input that is
@@ -3187,8 +3164,7 @@ the parser.
31873164

31883165
The other adaptors `as_utf8` and `as_utf16` are also provided for
31893166
completeness, if you want to use them. They each can transcode any sequence
3190-
of character types. A null-terminated string is considered a sequence of
3191-
character type.
3167+
of character types.
31923168

31933169
[important The `as_utfN` adaptors are optional, so they don't come with
31943170
`parser.hpp`. To get access to them, `#include
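
Note on the tutorial change: the updated example wraps the pointer explicitly, as in `bp::parse(bp::null_term(str_3) | bp::as_utf16, p, bp::ws)`, instead of passing `str_3` directly. The sketch below illustrates the underlying idea, that a pointer paired with a null sentinel forms an ordinary range, using only the standard library. Every name in it (`null_sentinel_sketch`, `null_term_range_sketch`, `null_term_sketch`, `length_of`) is hypothetical and chosen for illustration; none of it is Boost.Parser's actual implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins, NOT Boost.Parser's real types: a sentinel that
// compares equal to a pointer once that pointer reaches the terminator.
struct null_sentinel_sketch {};

template<typename T>
bool operator==(T const * p, null_sentinel_sketch) { return *p == T(); }
template<typename T>
bool operator!=(T const * p, null_sentinel_sketch) { return !(*p == T()); }

// A pointer paired with the sentinel forms an explicit range. This is the
// role that wrapping a pointer plays at call sites after this commit.
template<typename T>
struct null_term_range_sketch
{
    T const * first_;
    T const * begin() const { return first_; }
    null_sentinel_sketch end() const { return null_sentinel_sketch{}; }
};

template<typename T>
null_term_range_sketch<T> null_term_sketch(T const * p)
{
    return null_term_range_sketch<T>{p};
}

// Any algorithm written against begin()/end() works on the wrapped pointer
// without a special case for raw pointers.
template<typename Range>
std::size_t length_of(Range const & r)
{
    std::size_t n = 0;
    for (auto it = r.begin(); it != r.end(); ++it)
        ++n;
    return n;
}
```

A call such as `length_of(null_term_sketch("str"))` walks the three characters before the terminator. Making the wrapping explicit at the call site is what lets the library drop its pointer-specific branches, as the later hunks in this commit do.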

include/boost/parser/detail/stl_interfaces/view_adaptor.hpp

Lines changed: 29 additions & 3 deletions
@@ -58,6 +58,30 @@ namespace boost::parser::detail { namespace stl_interfaces {
5858
constexpr bool is_invocable_v =
5959
is_detected_v<invocable_expr, F, Args...>;
6060

61+
// This ensures that captures don't decay from arrays to pointers.
62+
// The decay is fine to do for NTBSs, but not arrays like {'a', 'b'}.
63+
// This is done here since it's too late to see that we were passed an
64+
// array where we need it, much later. Consider a call to replace()
65+
// for instance -- we'd want to know in the replace_impl function that
66+
// we were passed an array, but by then it's too late. We are
67+
// *thoroughly* unlikely to be passed anything but an array of
68+
// characters, so I'm not checking here that the array is not ints or
69+
// whatever before chopping off the null terminator.
70+
template<size_t N, typename CharT>
71+
auto array_to_range(CharT (&arr)[N])
72+
{
73+
auto const first = std::begin(arr);
74+
auto last = std::end(arr);
75+
if (N && !arr[N - 1])
76+
--last;
77+
return BOOST_PARSER_SUBRANGE(first, last);
78+
}
79+
template<typename T>
80+
decltype(auto) array_to_range(T && x)
81+
{
82+
return (T &&)x;
83+
}
84+
6185
template<typename Func, typename... CapturedArgs>
6286
struct bind_back_t
6387
{
@@ -69,7 +93,8 @@ namespace boost::parser::detail { namespace stl_interfaces {
6993

7094
template<typename F, typename... Args>
7195
explicit constexpr bind_back_t(int, F && f, Args &&... args) :
72-
f_((F &&) f), bound_args_((Args &&) args...)
96+
f_((F &&) f),
97+
bound_args_((Args &&) args...)
7398
{
7499
static_assert(sizeof...(Args) == sizeof...(CapturedArgs), "");
75100
}
@@ -125,8 +150,9 @@ namespace boost::parser::detail { namespace stl_interfaces {
125150
template<typename Func, typename... Args>
126151
constexpr auto bind_back(Func && f, Args &&... args)
127152
{
128-
return detail::bind_back_result<Func, Args...>(
129-
0, (Func &&) f, (Args &&) args...);
153+
return detail::bind_back_result<
154+
Func, detail::remove_cvref_t<decltype(detail::array_to_range(std::declval<Args>()))>...>(
155+
0, (Func &&) f, detail::array_to_range((Args &&) args)...);
130156
}
131157

132158
#if BOOST_PARSER_DEFINE_CUSTOM_RANGE_ADAPTOR_CLOSURE || \
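
Note on `array_to_range()`: the new helper keeps a bound array argument from decaying to a pointer and drops a single trailing null, so that a string literal and an unterminated array such as `{'a', 'b'}` both yield the same two elements. The following standalone sketch shows that trimming rule with a plain pointer pair in place of `BOOST_PARSER_SUBRANGE`. The names `array_bounds_sketch` and `array_length_sketch` are hypothetical and are not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper mirroring the idea of array_to_range(); it returns a
// plain pointer pair instead of the library's BOOST_PARSER_SUBRANGE.
template<std::size_t N, typename CharT>
std::pair<CharT *, CharT *> array_bounds_sketch(CharT (&arr)[N])
{
    CharT * first = arr;
    CharT * last = arr + N;
    // Drop one trailing null, if present, so "ab" yields two elements.
    if (N && !arr[N - 1])
        --last;
    return {first, last};
}

// Number of elements left after the optional trailing null is removed.
template<std::size_t N, typename CharT>
std::size_t array_length_sketch(CharT (&arr)[N])
{
    auto const bounds = array_bounds_sketch(arr);
    return static_cast<std::size_t>(bounds.second - bounds.first);
}
```

Only the last element is inspected, so an embedded null earlier in the array is preserved. As the comment in the diff notes, the check is not restricted to character arrays; the same holds for this sketch.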

include/boost/parser/detail/text/concepts.hpp

Lines changed: 0 additions & 30 deletions
@@ -51,10 +51,6 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
5151
utf8_code_unit<T> || utf16_code_unit<T> || utf32_code_unit<T>;
5252

5353

54-
template<typename T, format F>
55-
concept code_unit_pointer =
56-
std::is_pointer_v<T> && code_unit<std::iter_value_t<T>, F>;
57-
5854
template<typename T, format F>
5955
concept code_unit_range = std::ranges::input_range<T> &&
6056
code_unit<std::ranges::range_value_t<T>, F>;
@@ -66,17 +62,13 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
6662
template<typename T>
6763
concept utf8_iter = code_unit_iter<T, format::utf8>;
6864
template<typename T>
69-
concept utf8_pointer = code_unit_pointer<T, format::utf8>;
70-
template<typename T>
7165
concept utf8_range = code_unit_range<T, format::utf8>;
7266
template<typename T>
7367
concept contiguous_utf8_range = contiguous_code_unit_range<T, format::utf8>;
7468

7569
template<typename T>
7670
concept utf16_iter = code_unit_iter<T, format::utf16>;
7771
template<typename T>
78-
concept utf16_pointer = code_unit_pointer<T, format::utf16>;
79-
template<typename T>
8072
concept utf16_range = code_unit_range<T, format::utf16>;
8173
template<typename T>
8274
concept contiguous_utf16_range =
@@ -85,8 +77,6 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
8577
template<typename T>
8678
concept utf32_iter = code_unit_iter<T, format::utf32>;
8779
template<typename T>
88-
concept utf32_pointer = code_unit_pointer<T, format::utf32>;
89-
template<typename T>
9080
concept utf32_range = code_unit_range<T, format::utf32>;
9181
template<typename T>
9282
concept contiguous_utf32_range =
@@ -102,9 +92,6 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
10292
template<typename T>
10393
concept utf_iter = utf8_iter<T> || utf16_iter<T> || utf32_iter<T>;
10494
template<typename T>
105-
concept utf_pointer =
106-
utf8_pointer<T> || utf16_pointer<T> || utf32_pointer<T>;
107-
template<typename T>
10895
concept utf_range = utf8_range<T> || utf16_range<T> || utf32_range<T>;
10996

11097

@@ -182,23 +169,6 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
182169
{ t(msg) } -> std::same_as<char32_t>;
183170
// clang-format on
184171
};
185-
186-
template<typename T>
187-
// clang-format off
188-
concept utf_range_like =
189-
utf_range<std::remove_reference_t<T>> ||
190-
utf_pointer<std::remove_reference_t<T>>;
191-
// clang-format on
192-
193-
template<typename T>
194-
concept utf8_range_like = utf8_code_unit<std::iter_value_t<T>> ||
195-
utf8_pointer<std::remove_reference_t<T>>;
196-
template<typename T>
197-
concept utf16_range_like = utf16_code_unit<std::iter_value_t<T>> ||
198-
utf16_pointer<std::remove_reference_t<T>>;
199-
template<typename T>
200-
concept utf32_range_like = utf32_code_unit<std::iter_value_t<T>> ||
201-
utf32_pointer<std::remove_reference_t<T>>;
202172
//]
203173

204174
// Clang 13 defines __cpp_lib_concepts but not std::indirectly copyable.

include/boost/parser/detail/text/transcode_algorithm.hpp

Lines changed: 12 additions & 52 deletions
@@ -19,34 +19,6 @@
1919

2020
namespace boost::parser::detail { namespace text {
2121

22-
template<typename Range>
23-
struct utf_range_like_iterator
24-
{
25-
using type = decltype(std::declval<Range>().begin());
26-
};
27-
28-
template<typename T>
29-
struct utf_range_like_iterator<T *>
30-
{
31-
using type = T *;
32-
};
33-
34-
template<std::size_t N, typename T>
35-
struct utf_range_like_iterator<T[N]>
36-
{
37-
using type = T *;
38-
};
39-
40-
template<std::size_t N, typename T>
41-
struct utf_range_like_iterator<T (&)[N]>
42-
{
43-
using type = T *;
44-
};
45-
46-
template<typename Range>
47-
using utf_range_like_iterator_t =
48-
typename utf_range_like_iterator<Range>::type;
49-
5022
/** An alias for `in_out_result` returned by algorithms that perform a
5123
transcoding copy. */
5224
template<typename Iter, typename OutIter>
@@ -652,7 +624,7 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
652624
}
653625

654626
template<typename Range, typename OutIter>
655-
transcode_result<utf_range_like_iterator_t<Range>, OutIter>
627+
transcode_result<detail::iterator_t<Range>, OutIter>
656628
transcode_to_utf8(Range && r, OutIter out)
657629
{
658630
return dtl::transcode_to_8_dispatch<false, Range, OutIter>::call(
@@ -670,7 +642,7 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
670642
}
671643

672644
template<typename Range, typename OutIter>
673-
transcode_result<utf_range_like_iterator_t<Range>, OutIter>
645+
transcode_result<detail::iterator_t<Range>, OutIter>
674646
transcode_to_utf16(Range && r, OutIter out)
675647
{
676648
return dtl::transcode_to_16_dispatch<false, Range, OutIter>::call(
@@ -688,7 +660,7 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
688660
}
689661

690662
template<typename Range, typename OutIter>
691-
transcode_result<utf_range_like_iterator_t<Range>, OutIter>
663+
transcode_result<detail::iterator_t<Range>, OutIter>
692664
transcode_to_utf32(Range && r, OutIter out)
693665
{
694666
return dtl::transcode_to_32_dispatch<false, Range, OutIter>::call(
@@ -719,16 +691,12 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
719691
}
720692

721693
template<typename R, std::output_iterator<uint32_t> O>
722-
requires(utf16_range_like<R> || utf32_range_like<R>)
694+
requires(utf16_range<R> || utf32_range<R>)
723695
transcode_result<dtl::uc_result_iterator<R>, O> transcode_to_utf8(
724696
R && r, O out)
725697
{
726-
if constexpr (std::is_pointer_v<std::remove_reference_t<R>>) {
727-
return text::transcode_to_utf8(r, null_sentinel, out);
728-
} else {
729-
return text::transcode_to_utf8(
730-
std::ranges::begin(r), std::ranges::end(r), out);
731-
}
698+
return text::transcode_to_utf8(
699+
std::ranges::begin(r), std::ranges::end(r), out);
732700
}
733701

734702

@@ -750,16 +718,12 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
750718
}
751719

752720
template<typename R, std::output_iterator<uint32_t> O>
753-
requires(utf8_range_like<R> || utf32_range_like<R>)
721+
requires(utf8_range<R> || utf32_range<R>)
754722
transcode_result<dtl::uc_result_iterator<R>, O> transcode_to_utf16(
755723
R && r, O out)
756724
{
757-
if constexpr (std::is_pointer_v<std::remove_reference_t<R>>) {
758-
return text::transcode_to_utf16(r, null_sentinel, out);
759-
} else {
760-
return text::transcode_to_utf16(
761-
std::ranges::begin(r), std::ranges::end(r), out);
762-
}
725+
return text::transcode_to_utf16(
726+
std::ranges::begin(r), std::ranges::end(r), out);
763727
}
764728

765729

@@ -781,16 +745,12 @@ namespace boost::parser::detail { namespace text { BOOST_PARSER_DETAIL_TEXT_NAME
781745
}
782746

783747
template<typename R, std::output_iterator<uint32_t> O>
784-
requires(utf8_range_like<R> || utf16_range_like<R>)
748+
requires(utf8_range<R> || utf16_range<R>)
785749
transcode_result<dtl::uc_result_iterator<R>, O> transcode_to_utf32(
786750
R && r, O out)
787751
{
788-
if constexpr (std::is_pointer_v<std::remove_reference_t<R>>) {
789-
return text::transcode_to_utf32(r, null_sentinel, out);
790-
} else {
791-
return text::transcode_to_utf32(
792-
std::ranges::begin(r), std::ranges::end(r), out);
793-
}
752+
return text::transcode_to_utf32(
753+
std::ranges::begin(r), std::ranges::end(r), out);
794754
}
795755

796756
}}}

include/boost/parser/detail/text/transcode_view.hpp

Lines changed: 2 additions & 10 deletions
@@ -389,8 +389,7 @@ namespace boost::parser::detail { namespace text {
389389
template<class R>
390390
requires (std::ranges::viewable_range<R> &&
391391
std::ranges::input_range<R> &&
392-
std::convertible_to<std::ranges::range_reference_t<R>, format_to_type_t<Format>>) ||
393-
utf_pointer<std::remove_cvref_t<R>>
392+
std::convertible_to<std::ranges::range_reference_t<R>, format_to_type_t<Format>>)
394393
#else
395394
template<class R>
396395
#endif
@@ -403,9 +402,6 @@ namespace boost::parser::detail { namespace text {
403402
#else
404403
return 42; // Never gonna happen.
405404
#endif
406-
} else if constexpr (std::is_pointer_v<T>) {
407-
return View(
408-
BOOST_PARSER_DETAIL_TEXT_SUBRANGE(r, null_sentinel));
409405
} else {
410406
return View(std::forward<R>(r));
411407
}
@@ -725,8 +721,7 @@ namespace boost::parser::detail { namespace text {
725721
template<class R>
726722
requires is_utf_view<std::remove_cvref_t<R>> ||
727723
(std::ranges::viewable_range<R> &&
728-
can_utf_view<unpacked_range<R>, View>) ||
729-
utf_pointer<std::remove_cvref_t<R>>
724+
can_utf_view<unpacked_range<R>, View>)
730725
#else
731726
template<typename R>
732727
#endif
@@ -743,9 +738,6 @@ namespace boost::parser::detail { namespace text {
743738
return View(std::forward<R>(r).base());
744739
} else if constexpr (detail::is_charn_view<T>) {
745740
return View(std::forward<R>(r));
746-
} else if constexpr (std::is_pointer_v<T>) {
747-
return View(
748-
BOOST_PARSER_DETAIL_TEXT_SUBRANGE(r, null_sentinel));
749741
} else {
750742
return View(detail::unpack_range(std::forward<R>(r)));
751743
}
