Skip to content

Commit eba46df

Browse files
committed
Split Searcher and Consumer again.
1 parent 370bd60 commit eba46df

File tree

1 file changed

+91
-53
lines changed

1 file changed

+91
-53
lines changed

text/0000-pattern-3.md

Lines changed: 91 additions & 53 deletions
Original file line numberDiff line numberDiff line change
@@ -253,27 +253,23 @@ to recover the indices in the middle (`5 == 3 + 2` and `7 == 3 + 4`).
253253

254254
### Searcher
255255

256-
A searcher has two required methods `.search()` and `.consume()`,
257-
and an optional method `.trim_start()`.
256+
A searcher only provides a single method: `.search()`. It takes a span as input,
257+
and returns the first sub-range where the given pattern is found.
258258

259259
```rust
260260
pub unsafe trait Searcher<A: Hay + ?Sized> {
261261
fn search(&mut self, span: Span<&A>) -> Option<Range<A::Index>>;
262-
fn consume(&mut self, span: Span<&A>) -> Option<A::Index>;
263-
fn trim_start(&mut self, hay: &A) -> A::Index { ... }
264262
}
265263

266264
pub unsafe trait ReverseSearcher<A: Hay + ?Sized>: Searcher<A> {
267265
fn rsearch(&mut self, span: Span<&A>) -> Option<Range<A::Index>>;
268-
fn rconsume(&mut self, span: Span<&A>) -> Option<A::Index>;
269-
fn trim_end(&mut self, hay: &A) -> A::Index { ... }
270266
}
271267

272268
pub unsafe trait DoubleEndedSearcher<A: Hay + ?Sized>: ReverseSearcher<A> {}
273269
```
274270

275-
`.search()` and `.consume()` are safe because there is no safe ways to construct a `Span<&A>`
276-
with invalid ranges. Implementations of these methods often start with:
271+
The `.search()` function is safe because there is no safe ways to construct a `Span<&A>`
272+
with invalid ranges. Implementations of `.search()` often start with:
277273

278274
```rust
279275
fn search(&mut self, span: Span<&A>) -> Option<Range<A::Index>> {
@@ -284,9 +280,27 @@ with invalid ranges. Implementations of these methods often start with:
284280

285281
The trait is unsafe to implement because it needs to guarantee the returned range is valid.
286282

287-
The `.search()` method will look for the first slice matching the searcher's pattern in the span,
283+
### Consumer
284+
285+
A consumer provides the `.consume()` method to implement `starts_with()` and `trim_start()`. It
286+
takes a span as input, and if the beginning matches the pattern, returns the end index of the match.
287+
288+
```rust
289+
pub unsafe trait Consumer<A: Hay + ?Sized> {
290+
fn consume(&mut self, span: Span<&A>) -> Option<A::Index>;
291+
}
292+
293+
pub unsafe trait ReverseConsumer<A: Hay + ?Sized>: Consumer<A> {
294+
fn rconsume(&mut self, span: Span<&A>) -> Option<A::Index>;
295+
}
296+
297+
pub unsafe trait DoubleEndedConsumer<A: Hay + ?Sized>: ReverseConsumer<A> {}
298+
```
299+
300+
Comparing searcher and consumer, the `.search()` method will look for the first slice
301+
matching the searcher's pattern in the span,
288302
and returns the range where the slice is found (relative to the hay's start index).
289-
The `.consume()` method will is similar, but anchored to the start of the span.
303+
The `.consume()` method is similar, but anchored to the start of the span.
290304

291305
```rust
292306
let span = unsafe { Span::from_parts("CDEFG", 3..8) };
@@ -310,9 +324,10 @@ A pattern is simply a "factory" of a searcher and consumer.
310324
```rust
311325
trait Pattern<H: Haystack>: Sized {
312326
type Searcher: Searcher<H::Target>;
327+
type Consumer: Consumer<H::Target>;
313328

314329
fn into_searcher(self) -> Self::Searcher;
315-
fn into_consumer(self) -> Self::Searcher { self.into_searcher() }
330+
fn into_consumer(self) -> Self::Consumer;
316331
}
317332
```
318333

@@ -322,27 +337,55 @@ mutable state when implementing some more sophisticated string searching algorit
322337

323338
The relation between `Pattern` and `Searcher` is thus like `IntoIterator` and `Iterator`.
324339

325-
There is a required method `.into_searcher()` as well as an optional method `.into_consumer()`.
340+
There are two required methods `.into_searcher()` and `.into_consumer()`.
326341
In some patterns (e.g. substring search), checking if a prefix match will require much less
327-
pre-computation than checking if any substring match. Therefore, if an algorithm can declare that
328-
it will only call `.consume()`, the searcher could use a more efficient structure.
342+
pre-computation than checking if any substring match.
343+
Therefore, a consumer could use a more efficient structure with this specialized purpose.
329344

330345
```rust
331346
impl<H: Haystack<Target = str>> Pattern<H> for &'p str {
332347
type Searcher = SliceSearcher<'p, [u8]>;
348+
type Consumer = NaiveSearcher<'p, [u8]>;
333349
#[inline]
334350
fn into_searcher(self) -> Self::Searcher {
335351
// create a searcher based on Two-Way algorithm.
336-
SliceSearcher::new_searcher(self)
352+
SliceSearcher::new(self)
337353
}
338354
#[inline]
339-
fn into_consumer(self) -> Self::Searcher {
355+
fn into_consumer(self) -> Self::Consumer {
340356
// create a searcher based on naive search (which requires no pre-computation)
341-
SliceSearcher::new_consumer(self)
357+
NaiveSearcher::new(self)
342358
}
343359
}
344360
```
345361

362+
Note that, unlike `IntoIterator`, the standard library is unable to provide a blanket impl:
363+
364+
```rust
365+
impl<H, S> Pattern<H> for S
366+
where
367+
H: Haystack,
368+
S: Searcher<H::Target> + Consumer<H::Target>,
369+
{
370+
type Searcher = Self;
371+
type Consumer = Self;
372+
fn into_searcher(self) -> Self { self }
373+
fn into_consumer(self) -> Self { self }
374+
}
375+
```
376+
377+
This is because there is already an existing Pattern impl:
378+
379+
```rust
380+
impl<'h, F> Pattern<&'h str> for F
381+
where
382+
F: FnMut(char) -> bool,
383+
{ ... }
384+
```
385+
386+
and a type can implement all of `(FnMut(char) -> bool) + Searcher<str> + Consumer<str>`,
387+
causing impl conflict.
388+
346389
### Algorithms
347390

348391
Standard algorithms are provided as *functions* in the `core::pattern::ext` module.
@@ -360,7 +403,7 @@ where
360403
pub fn ends_with<H, P>(haystack: H, pattern: P) -> bool
361404
where
362405
H: Haystack,
363-
P: Pattern<H, Searcher: ReverseSearcher<H::Target>>;
406+
P: Pattern<H, Consumer: ReverseConsumer<H::Target>>;
364407
```
365408

366409
**Trim**
@@ -374,12 +417,12 @@ where
374417
pub fn trim_end<H, P>(haystack: H, pattern: P) -> H
375418
where
376419
H: Haystack,
377-
P: Pattern<H, Searcher: ReverseSearcher<H::Target>>;
420+
P: Pattern<H, Consumer: ReverseConsumer<H::Target>>;
378421

379422
pub fn trim<H, P>(haystack: H, pattern: P) -> H
380423
where
381424
H: Haystack,
382-
P: Pattern<H, Searcher: DoubleEndedSearcher<H::Target>>;
425+
P: Pattern<H, Consumer: DoubleEndedConsumer<H::Target>>;
383426
```
384427

385428
**Matches**
@@ -665,7 +708,7 @@ The main performance improvement comes from `trim()`. In v1.0, `trim()` depends
665708
the `Searcher::next_reject()` method, which requires initializing a searcher and compute
666709
the critical constants for the Two-Way search algorithm. Search algorithms mostly concern about
667710
quickly skip through mismatches, but the purpose of `.next_reject()` is to find mismatches, so a
668-
searcher would be a job mismatch for `trim()`. This justifies the `.into_consumer()` method in v3.0.
711+
searcher would be a job mismatch for `trim()`. This justifies the `Consumer` trait in v3.0.
669712

670713
<details><summary>Summary of benchmark</summary>
671714

@@ -717,7 +760,7 @@ searcher would be a job mismatch for `trim()`. This justifies the `.into_consume
717760

718761
[suffix table]: https://docs.rs/suffix/1.0.0/suffix/struct.SuffixTable.html#method.positions
719762

720-
2. Patterns are still moved when converting to a Searcher.
763+
2. Patterns are still moved when converting to a Searcher or Consumer.
721764
Taking the entire ownership of the pattern might prevent some use cases... ?
722765

723766
* Stabilization of this RFC is blocked by [RFC 1672] \(disjointness based on associated types)
@@ -1087,26 +1130,6 @@ trait Consumer<A: Hay + ?Sized> {
10871130
Both `starts_with()` and `trim()` can be efficiently implemented in terms of `.consume()`,
10881131
though for some patterns a specialized `trim()` can be even faster, so we keep this default method.
10891132

1090-
During the RFC, after we have actually tried the API on third party code, we found that having
1091-
`Searcher` and `Consumer` as two distinct traits seldom have any advantages as most of the time they
1092-
are the same type anyway. Therefore, we *merge* the consumer methods into the `Searcher` trait,
1093-
while still keeping `Pattern::into_consumer()` so we could still choose the less expensive algorithm
1094-
at runtime.
1095-
1096-
```rust
1097-
// v3.0-alpha.8
1098-
trait Pattern<H: Haystack> {
1099-
type Searcher: Searcher<H::Target>;
1100-
fn into_searcher(self) -> Self::Searcher;
1101-
fn into_consumer(self) -> Self::Searcher { self.into_searcher() }
1102-
}
1103-
trait Searcher<A: Hay + ?Sized> {
1104-
fn search(&mut self, hay: Span<&A>) -> Option<Range<A::Index>>;
1105-
fn consume(&mut self, hay: Span<&A>) -> Option<A::Index>;
1106-
fn trim_start(&mut self, hay: &A) -> A::Index { /* default impl */ }
1107-
}
1108-
```
1109-
11101133
## Miscellaneous decisions
11111134

11121135
### `usize` as index instead of pointers
@@ -1183,13 +1206,12 @@ And thus the more general `Borrow` trait offers no advantage over `Deref`.
11831206

11841207
### Searcher makes Hay an input type instead of associated type
11851208

1186-
The `Searcher` trait makes the hay as input type.
1209+
The `Searcher` and `Consumer` traits makes the hay as input type.
11871210
This makes any algorithm relying on a `ReverseSearcher` need to spell out the hay as well.
11881211

11891212
```rust
11901213
trait Searcher<A: Hay + ?Sized> {
11911214
fn search(&mut self, span: Span<&A>) -> Option<Range<A::Index>>;
1192-
...
11931215
}
11941216

11951217
fn rfind<H, P>(haystack: H, pattern: P) -> Option<H::Target::Index>
@@ -1205,7 +1227,6 @@ An alternative is to make Hay an associated type:
12051227
trait Searcher {
12061228
type Hay: Hay + ?Sized;
12071229
fn search(&mut self, span: Span<&Self::Hay>) -> Option<Range<Self::Hay::Index>>;
1208-
...
12091230
}
12101231

12111232
fn rfind<H, P>(haystack: H, pattern: P) -> Option<H::Target::Index>
@@ -1251,12 +1272,17 @@ With specialization, this dilemma can be easily fixed: we will fallback to an al
12511272
which only requires `T: PartialEq` (e.g. [`galil-seiferas`] or even naive search),
12521273
and use the faster Two-Way algorithm when `T: Ord`.
12531274

1254-
### Not having default implementations for `Searcher::{search, consume}`
1275+
### Not having default implementations for `search` and `consume`
12551276

1256-
In the `Searcher` trait, `.search()` and `.consume()` can be implemented in terms of each other:
1277+
In the `Searcher` and `Consumer` traits, `.search()` and `.consume()` can be implemented
1278+
in terms of each other:
12571279

12581280
```rust
1259-
trait Searcher<A: Hay + ?Sized> {
1281+
impl<A, C> Searcher<A> for C
1282+
where
1283+
A: Hay + ?Sized,
1284+
C: Consumer<A>,
1285+
{
12601286
fn search(&mut self, span: Span<&A>) -> Option<Range<A::Index>> {
12611287
// we can implement `search` in terms of `consume`
12621288
let (hay, range) = span.into_parts();
@@ -1272,7 +1298,13 @@ trait Searcher<A: Hay + ?Sized> {
12721298
}
12731299
}
12741300
}
1301+
}
12751302

1303+
impl<A, S> Consumer<A> for S
1304+
where
1305+
A: Hay + ?Sized,
1306+
S: Searcher<A>,
1307+
{
12761308
fn consume(&mut self, span: Span<&A>) -> Option<A::Index> {
12771309
// we can implement `consume` in terms of `search`
12781310
let start = span.original_range().start;
@@ -1283,8 +1315,6 @@ trait Searcher<A: Hay + ?Sized> {
12831315
None
12841316
}
12851317
}
1286-
1287-
...
12881318
}
12891319
```
12901320

@@ -1308,12 +1338,19 @@ where they should have full control of the details, we keep them as required met
13081338
`.next_match()` since it needs to take a span as input and thus no longer iterator-like.
13091339
It is renamed to `.search()` as a shorter verb and also consistent with the trait name.
13101340

1311-
* **Searcher::consume()**. The name is almost randomly chosen as there's no good name for
1341+
* **Consumer::consume()**. The name is almost randomly chosen as there's no good name for
13121342
this operation. This name is taken from the same function in the [`re2` library][re2-consume].
13131343

1344+
* `Consumer` is totally different from `Searcher`. Calling it `PrefixSearcher` or
1345+
`AnchoredSearcher` would imply a non-existing sub-classing relationship.
1346+
13141347
* We would also like a name which is only a single word.
13151348

1316-
* "match" (using name from Python) is incompatible with the existing `.matches()` method.
1349+
* We want the name *not* start with the letter **S**
1350+
so we could easily distinguish between this and `Searcher` when quick-scanning the code,
1351+
in particular when `ReverseXxxer` is involved.
1352+
1353+
* "Matcher" (using name from Python) is incompatible with the existing `.matches()` method.
13171354
Besides, the meaning of "match" is very ambiguous among other libraries.
13181355

13191356
<details><summary>Names from other languages and libraries</summary>
@@ -1636,7 +1673,8 @@ Unlike this RFC, the `Extract` class is much simpler.
16361673
the core type `&A` only, we could keep `SharedHaystack` unstable longer
16371674
(a separate track from the main Pattern API) until this question is resolved.
16381675

1639-
* With a benefit of type checking, we may still want to split `Consumer` from `Searcher`.
1676+
* With a benefit of simplified API,
1677+
we may want to merge `Consumer` and `Searcher` into a single trait.
16401678

16411679
[RFC 528]: https://github.com/rust-lang/rfcs/pull/528
16421680
[RFC 1309]: https://github.com/rust-lang/rfcs/pull/1309

0 commit comments

Comments
 (0)