@@ -253,27 +253,23 @@ to recover the indices in the middle (`5 == 3 + 2` and `7 == 3 + 4`).
 
 ### Searcher
 
-A searcher has two required methods `.search()` and `.consume()`,
-and an optional method `.trim_start()`.
+A searcher only provides a single method: `.search()`. It takes a span as input,
+and returns the first sub-range where the given pattern is found.
 
 ```rust
 pub unsafe trait Searcher<A: Hay + ?Sized> {
     fn search(&mut self, span: Span<&A>) -> Option<Range<A::Index>>;
-    fn consume(&mut self, span: Span<&A>) -> Option<A::Index>;
-    fn trim_start(&mut self, hay: &A) -> A::Index { ... }
 }
 
 pub unsafe trait ReverseSearcher<A: Hay + ?Sized>: Searcher<A> {
     fn rsearch(&mut self, span: Span<&A>) -> Option<Range<A::Index>>;
-    fn rconsume(&mut self, span: Span<&A>) -> Option<A::Index>;
-    fn trim_end(&mut self, hay: &A) -> A::Index { ... }
 }
 
 pub unsafe trait DoubleEndedSearcher<A: Hay + ?Sized>: ReverseSearcher<A> {}
 ```
 
-`.search()` and `.consume()` are safe because there is no safe ways to construct a `Span<&A>`
-with invalid ranges. Implementations of these methods often start with:
+The `.search()` function is safe because there is no safe way to construct a `Span<&A>`
+with invalid ranges. Implementations of `.search()` often start with:
 
 ```rust
 fn search(&mut self, span: Span<&A>) -> Option<Range<A::Index>> {
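A small, self-contained sketch can illustrate the narrowed `Searcher` contract. It does not use the RFC's `Hay` and `Span` types. As a simplifying assumption, a span is modelled as a `&str` hay plus a byte `Range<usize>` that lies on char boundaries. `SimpleSearcher` and `NaiveSubstrSearcher` are hypothetical names. Following the new text, `search` returns the first sub-range where the pattern is found, with indices relative to the whole hay rather than to the span.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for the RFC's `Searcher<str>`:
/// the span is modelled as the whole hay plus a byte range into it.
trait SimpleSearcher {
    /// Returns the first match inside `span`, as indices into `hay`.
    fn search(&mut self, hay: &str, span: Range<usize>) -> Option<Range<usize>>;
}

/// Hypothetical naive substring searcher (no pre-computation).
struct NaiveSubstrSearcher<'p> {
    needle: &'p str,
}

impl SimpleSearcher for NaiveSubstrSearcher<'_> {
    fn search(&mut self, hay: &str, span: Range<usize>) -> Option<Range<usize>> {
        // Look only inside the span, then translate the offset back
        // into hay-relative indices.
        let offset = hay[span.clone()].find(self.needle)?;
        let start = span.start + offset;
        Some(start..start + self.needle.len())
    }
}

fn main() {
    let hay = "ABCDEFGHIJ";
    let mut searcher = NaiveSubstrSearcher { needle: "EF" };
    // Span 3..8 is "DEFGH"; "EF" starts at index 4 of the whole hay.
    println!("{:?}", searcher.search(hay, 3..8)); // Some(4..6)
    // Span 5..8 is "FGH", which does not contain "EF".
    println!("{:?}", searcher.search(hay, 5..8)); // None
}
```

The RFC relies on `Span<&A>` being valid by construction, and that is what makes `.search()` safe to call. This sketch has no such guarantee, so it simply panics when slicing with an out-of-bounds or non-boundary range.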
@@ -284,9 +280,27 @@ with invalid ranges. Implementations of these methods often start with:
 
 The trait is unsafe to implement because it needs to guarantee the returned range is valid.
 
-The `.search()` method will look for the first slice matching the searcher's pattern in the span,
+### Consumer
+
+A consumer provides the `.consume()` method to implement `starts_with()` and `trim_start()`. It
+takes a span as input, and if the beginning matches the pattern, returns the end index of the match.
+
+```rust
+pub unsafe trait Consumer<A: Hay + ?Sized> {
+    fn consume(&mut self, span: Span<&A>) -> Option<A::Index>;
+}
+
+pub unsafe trait ReverseConsumer<A: Hay + ?Sized>: Consumer<A> {
+    fn rconsume(&mut self, span: Span<&A>) -> Option<A::Index>;
+}
+
+pub unsafe trait DoubleEndedConsumer<A: Hay + ?Sized>: ReverseConsumer<A> {}
+```
+
+Comparing searcher and consumer, the `.search()` method will look for the first slice
+matching the searcher's pattern in the span,
 and returns the range where the slice is found (relative to the hay's start index).
-The `.consume()` method will is similar, but anchored to the start of the span.
+The `.consume()` method is similar, but anchored to the start of the span.
 
 ```rust
 let span = unsafe { Span::from_parts("CDEFG", 3..8) };
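The searcher/consumer split is easiest to see by running both methods on one span. This sketch uses a simplified model in which a span is a `&str` hay plus a byte `Range<usize>`. `SimpleSearcher`, `SimpleConsumer` and `Substr` are hypothetical stand-ins, not the RFC's types. `search` may find a match anywhere in the span. `consume` only succeeds when the span begins with the pattern, and then returns the end index.

```rust
use std::ops::Range;

// Hypothetical simplified forms of the two traits: a span is a `&str` hay
// plus a byte range into it (assumed to lie on char boundaries).
trait SimpleSearcher {
    fn search(&mut self, hay: &str, span: Range<usize>) -> Option<Range<usize>>;
}

trait SimpleConsumer {
    /// If the span *begins* with a match, returns the hay-relative end index.
    fn consume(&mut self, hay: &str, span: Range<usize>) -> Option<usize>;
}

/// Hypothetical substring pattern implementing both traits naively.
struct Substr<'p>(&'p str);

impl SimpleSearcher for Substr<'_> {
    fn search(&mut self, hay: &str, span: Range<usize>) -> Option<Range<usize>> {
        // Unanchored: the match may start anywhere inside the span.
        let offset = hay[span.clone()].find(self.0)?;
        let start = span.start + offset;
        Some(start..start + self.0.len())
    }
}

impl SimpleConsumer for Substr<'_> {
    fn consume(&mut self, hay: &str, span: Range<usize>) -> Option<usize> {
        // Anchored: the match must start exactly at `span.start`.
        if hay[span.clone()].starts_with(self.0) {
            Some(span.start + self.0.len())
        } else {
            None
        }
    }
}

fn main() {
    let hay = "ABCDEFGHIJ";
    // The span 3..8 covers "DEFGH".
    println!("{:?}", Substr("DE").search(hay, 3..8)); // Some(3..5)
    println!("{:?}", Substr("DE").consume(hay, 3..8)); // Some(5)
    println!("{:?}", Substr("EF").search(hay, 3..8)); // Some(4..6)
    println!("{:?}", Substr("EF").consume(hay, 3..8)); // None: not at the start
}
```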
@@ -310,9 +324,10 @@ A pattern is simply a "factory" of a searcher and consumer.
 ```rust
 trait Pattern<H: Haystack>: Sized {
     type Searcher: Searcher<H::Target>;
+    type Consumer: Consumer<H::Target>;
 
     fn into_searcher(self) -> Self::Searcher;
-    fn into_consumer(self) -> Self::Searcher { self.into_searcher() }
+    fn into_consumer(self) -> Self::Consumer;
 }
 ```
 
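With two associated types, `Pattern` can hand out unrelated searcher and consumer types. The sketch below shows this for a `char` pattern in a simplified model, where a span is a `&str` hay plus a byte `Range<usize>`. Every trait and type in it (`SimplePattern`, `CharSearcher`, `CharConsumer`) is hypothetical. The free functions `find` and `starts_with` only mimic the shape of the RFC's algorithms.

```rust
use std::ops::Range;

// Hypothetical simplified traits (byte-range spans over a `&str` hay).
trait SimpleSearcher {
    fn search(&mut self, hay: &str, span: Range<usize>) -> Option<Range<usize>>;
}
trait SimpleConsumer {
    fn consume(&mut self, hay: &str, span: Range<usize>) -> Option<usize>;
}

/// Simplified `Pattern`: a factory producing two independently chosen types.
trait SimplePattern: Sized {
    type Searcher: SimpleSearcher;
    type Consumer: SimpleConsumer;
    fn into_searcher(self) -> Self::Searcher;
    fn into_consumer(self) -> Self::Consumer;
}

struct CharSearcher(char);
struct CharConsumer(char);

impl SimpleSearcher for CharSearcher {
    fn search(&mut self, hay: &str, span: Range<usize>) -> Option<Range<usize>> {
        let offset = hay[span.clone()].find(self.0)?;
        let start = span.start + offset;
        Some(start..start + self.0.len_utf8())
    }
}

impl SimpleConsumer for CharConsumer {
    fn consume(&mut self, hay: &str, span: Range<usize>) -> Option<usize> {
        // Only the first char of the span is inspected.
        let first = hay[span.clone()].chars().next()?;
        if first == self.0 {
            Some(span.start + first.len_utf8())
        } else {
            None
        }
    }
}

impl SimplePattern for char {
    type Searcher = CharSearcher;
    type Consumer = CharConsumer;
    fn into_searcher(self) -> CharSearcher {
        CharSearcher(self)
    }
    fn into_consumer(self) -> CharConsumer {
        CharConsumer(self)
    }
}

// Each algorithm asks the pattern only for the half it needs.
fn find<P: SimplePattern>(hay: &str, pattern: P) -> Option<usize> {
    pattern.into_searcher().search(hay, 0..hay.len()).map(|r| r.start)
}

fn starts_with<P: SimplePattern>(hay: &str, pattern: P) -> bool {
    pattern.into_consumer().consume(hay, 0..hay.len()).is_some()
}

fn main() {
    println!("{:?}", find("héllo", 'l')); // Some(3): 'é' takes two bytes
    println!("{}", starts_with("héllo", 'h')); // true
    println!("{}", starts_with("héllo", 'é')); // false
}
```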
@@ -322,27 +337,55 @@ mutable state when implementing some more sophisticated string searching algorit
 
 The relation between `Pattern` and `Searcher` is thus like `IntoIterator` and `Iterator`.
 
-There is a required method `.into_searcher()` as well as an optional method `.into_consumer()`.
+There are two required methods `.into_searcher()` and `.into_consumer()`.
 
 In some patterns (e.g. substring search), checking if a prefix match will require much less
-pre-computation than checking if any substring match. Therefore, if an algorithm can declare that
-it will only call `.consume()`, the searcher could use a more efficient structure.
+pre-computation than checking if any substring match.
+Therefore, a consumer could use a more efficient structure with this specialized purpose.
 
 ```rust
 impl<H: Haystack<Target = str>> Pattern<H> for &'p str {
     type Searcher = SliceSearcher<'p, [u8]>;
+    type Consumer = NaiveSearcher<'p, [u8]>;
     #[inline]
     fn into_searcher(self) -> Self::Searcher {
         // create a searcher based on Two-Way algorithm.
-        SliceSearcher::new_searcher(self)
+        SliceSearcher::new(self)
     }
     #[inline]
-    fn into_consumer(self) -> Self::Searcher {
+    fn into_consumer(self) -> Self::Consumer {
         // create a searcher based on naive search (which requires no pre-computation)
-        SliceSearcher::new_consumer(self)
+        NaiveSearcher::new(self)
     }
 }
 ```
 
+Note that, unlike `IntoIterator`, the standard library is unable to provide a blanket impl:
+
+```rust
+impl<H, S> Pattern<H> for S
+where
+    H: Haystack,
+    S: Searcher<H::Target> + Consumer<H::Target>,
+{
+    type Searcher = Self;
+    type Consumer = Self;
+    fn into_searcher(self) -> Self { self }
+    fn into_consumer(self) -> Self { self }
+}
+```
+
+This is because there is already an existing Pattern impl:
+
+```rust
+impl<'h, F> Pattern<&'h str> for F
+where
+    F: FnMut(char) -> bool,
+{ ... }
+```
+
+and a type can implement all of `(FnMut(char) -> bool) + Searcher<str> + Consumer<str>`,
+causing an impl conflict.
+
 ### Algorithms
 
 Standard algorithms are provided as *functions* in the `core::pattern::ext` module.
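The cost argument above can be made concrete: a prefix check needs much less set-up than a general substring search. The sketch below uses Knuth-Morris-Pratt as the searcher only because it is short. The RFC's example uses Two-Way, whose pre-computation is different and more involved. `KmpSearcher` and `NaiveConsumer` are hypothetical names, and spans are simplified to a byte slice plus a `Range<usize>`. The searcher builds an O(needle) failure table up front, while the consumer needs nothing beyond one anchored comparison.

```rust
use std::ops::Range;

/// Searcher side: Knuth-Morris-Pratt (used here instead of Two-Way only
/// for brevity). Construction costs O(needle.len()) time and memory.
struct KmpSearcher<'p> {
    needle: &'p [u8],
    // failure[i]: length of the longest proper prefix of needle[..=i]
    // that is also a suffix of needle[..=i].
    failure: Vec<usize>,
}

impl<'p> KmpSearcher<'p> {
    fn new(needle: &'p [u8]) -> Self {
        let mut failure = vec![0; needle.len()];
        let mut k = 0;
        for i in 1..needle.len() {
            while k > 0 && needle[i] != needle[k] {
                k = failure[k - 1];
            }
            if needle[i] == needle[k] {
                k += 1;
            }
            failure[i] = k;
        }
        KmpSearcher { needle, failure }
    }

    /// First match inside `span`, as a hay-relative range.
    fn search(&mut self, hay: &[u8], span: Range<usize>) -> Option<Range<usize>> {
        if self.needle.is_empty() {
            return Some(span.start..span.start);
        }
        let mut k = 0; // number of needle bytes matched so far
        for i in span {
            while k > 0 && hay[i] != self.needle[k] {
                k = self.failure[k - 1];
            }
            if hay[i] == self.needle[k] {
                k += 1;
            }
            if k == self.needle.len() {
                return Some(i + 1 - k..i + 1);
            }
        }
        None
    }
}

/// Consumer side: an anchored prefix test needs no pre-computation at all.
struct NaiveConsumer<'p> {
    needle: &'p [u8],
}

impl NaiveConsumer<'_> {
    fn consume(&mut self, hay: &[u8], span: Range<usize>) -> Option<usize> {
        if hay[span.clone()].starts_with(self.needle) {
            Some(span.start + self.needle.len())
        } else {
            None
        }
    }
}

fn main() {
    let hay = b"xxababab";
    let mut searcher = KmpSearcher::new(b"abab");
    println!("{:?}", searcher.search(hay, 0..hay.len())); // Some(2..6)
    let mut consumer = NaiveConsumer { needle: b"xxab" };
    println!("{:?}", consumer.consume(hay, 0..hay.len())); // Some(4)
}
```

Both types can sit behind the same pattern value, and that is exactly what a separate `type Consumer` in `Pattern` allows.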
@@ -360,7 +403,7 @@ where
 pub fn ends_with<H, P>(haystack: H, pattern: P) -> bool
 where
     H: Haystack,
-    P: Pattern<H, Searcher: ReverseSearcher<H::Target>>;
+    P: Pattern<H, Consumer: ReverseConsumer<H::Target>>;
 ```
 
 **Trim**
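After this change, `ends_with` only needs the reverse counterpart of a consumer. The minimal sketch below shows how `starts_with` and `ends_with` each reduce to a single anchored call. It rests on simplifying assumptions: a span is a `&str` hay plus a byte `Range<usize>`, and `SimpleConsumer`, `SimpleReverseConsumer` and `StrConsumer` are hypothetical types.

```rust
use std::ops::Range;

// Hypothetical simplified traits: spans are byte ranges into a `&str` hay.
trait SimpleConsumer {
    /// Match anchored at `span.start`; returns the end index.
    fn consume(&mut self, hay: &str, span: Range<usize>) -> Option<usize>;
}

trait SimpleReverseConsumer: SimpleConsumer {
    /// Match anchored at `span.end`; returns the start index.
    fn rconsume(&mut self, hay: &str, span: Range<usize>) -> Option<usize>;
}

/// Hypothetical consumer for a fixed substring.
struct StrConsumer<'p>(&'p str);

impl SimpleConsumer for StrConsumer<'_> {
    fn consume(&mut self, hay: &str, span: Range<usize>) -> Option<usize> {
        let needle = self.0;
        hay[span.clone()]
            .starts_with(needle)
            .then(|| span.start + needle.len())
    }
}

impl SimpleReverseConsumer for StrConsumer<'_> {
    fn rconsume(&mut self, hay: &str, span: Range<usize>) -> Option<usize> {
        let needle = self.0;
        hay[span.clone()]
            .ends_with(needle)
            .then(|| span.end - needle.len())
    }
}

fn starts_with<C: SimpleConsumer>(hay: &str, mut consumer: C) -> bool {
    consumer.consume(hay, 0..hay.len()).is_some()
}

// Mirrors the new bound: `ends_with` only needs a reverse *consumer*.
fn ends_with<C: SimpleReverseConsumer>(hay: &str, mut consumer: C) -> bool {
    consumer.rconsume(hay, 0..hay.len()).is_some()
}

fn main() {
    println!("{}", starts_with("hello", StrConsumer("he"))); // true
    println!("{}", ends_with("hello", StrConsumer("lo"))); // true
    println!("{}", ends_with("hello", StrConsumer("he"))); // false
}
```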
@@ -374,12 +417,12 @@ where
 pub fn trim_end<H, P>(haystack: H, pattern: P) -> H
 where
     H: Haystack,
-    P: Pattern<H, Searcher: ReverseSearcher<H::Target>>;
+    P: Pattern<H, Consumer: ReverseConsumer<H::Target>>;
 
 pub fn trim<H, P>(haystack: H, pattern: P) -> H
 where
     H: Haystack,
-    P: Pattern<H, Searcher: DoubleEndedSearcher<H::Target>>;
+    P: Pattern<H, Consumer: DoubleEndedConsumer<H::Target>>;
 ```
 
 **Matches**
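Trimming can be expressed purely with consumers: consume repeatedly from the front, then repeatedly from the back. The sketch below assumes a simplified model, where a span is a `&str` hay plus a byte `Range<usize>`, and uses the hypothetical types `CharPred` and `SimpleDoubleEndedConsumer`. The stop-on-empty-match check is not taken from the RFC. It is a guard added here so that a pattern able to match zero characters cannot loop forever.

```rust
use std::ops::Range;

// Hypothetical simplified consumer traits over byte-range spans of a `&str`.
trait SimpleConsumer {
    fn consume(&mut self, hay: &str, span: Range<usize>) -> Option<usize>;
}
trait SimpleReverseConsumer: SimpleConsumer {
    fn rconsume(&mut self, hay: &str, span: Range<usize>) -> Option<usize>;
}
/// Marker, as in the RFC: forward and backward consumption agree.
trait SimpleDoubleEndedConsumer: SimpleReverseConsumer {}

/// Hypothetical consumer of one char satisfying a predicate.
struct CharPred<F>(F);

impl<F: FnMut(char) -> bool> SimpleConsumer for CharPred<F> {
    fn consume(&mut self, hay: &str, span: Range<usize>) -> Option<usize> {
        let c = hay[span.clone()].chars().next()?;
        (self.0)(c).then(|| span.start + c.len_utf8())
    }
}
impl<F: FnMut(char) -> bool> SimpleReverseConsumer for CharPred<F> {
    fn rconsume(&mut self, hay: &str, span: Range<usize>) -> Option<usize> {
        let c = hay[span.clone()].chars().next_back()?;
        (self.0)(c).then(|| span.end - c.len_utf8())
    }
}
impl<F: FnMut(char) -> bool> SimpleDoubleEndedConsumer for CharPred<F> {}

fn trim<'h, C: SimpleDoubleEndedConsumer>(hay: &'h str, consumer: &mut C) -> &'h str {
    // trim_start: keep consuming from the front while matches make progress.
    let mut start = 0;
    while let Some(end) = consumer.consume(hay, start..hay.len()) {
        if end == start {
            break; // an empty match would otherwise loop forever
        }
        start = end;
    }
    // trim_end: the same from the back, never crossing `start`.
    let mut end = hay.len();
    while let Some(new_end) = consumer.rconsume(hay, start..end) {
        if new_end == end {
            break;
        }
        end = new_end;
    }
    &hay[start..end]
}

fn main() {
    let mut spaces = CharPred(char::is_whitespace);
    println!("{:?}", trim("  hi there \n", &mut spaces)); // "hi there"
}
```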
@@ -665,7 +708,7 @@ The main performance improvement comes from `trim()`. In v1.0, `trim()` depends
 the `Searcher::next_reject()` method, which requires initializing a searcher and compute
 the critical constants for the Two-Way search algorithm. Search algorithms mostly concern about
 quickly skip through mismatches, but the purpose of `.next_reject()` is to find mismatches, so a
-searcher would be a job mismatch for `trim()`. This justifies the `.into_consumer()` method in v3.0.
+searcher would be a job mismatch for `trim()`. This justifies the `Consumer` trait in v3.0.
 
 <details><summary>Summary of benchmark</summary>
 
@@ -717,7 +760,7 @@ searcher would be a job mismatch for `trim()`. This justifies the `.into_consume
 
 [suffix table]: https://docs.rs/suffix/1.0.0/suffix/struct.SuffixTable.html#method.positions
 
-2. Patterns are still moved when converting to a Searcher.
+2. Patterns are still moved when converting to a Searcher or Consumer.
    Taking the entire ownership of the pattern might prevent some use cases... ?
 
 * Stabilization of this RFC is blocked by [RFC 1672] \(disjointness based on associated types)
@@ -1087,26 +1130,6 @@ trait Consumer<A: Hay + ?Sized> {
 Both `starts_with()` and `trim()` can be efficiently implemented in terms of `.consume()`,
 though for some patterns a specialized `trim()` can be even faster, so we keep this default method.
 
-During the RFC, after we have actually tried the API on third party code, we found that having
-`Searcher` and `Consumer` as two distinct traits seldom have any advantages as most of the time they
-are the same type anyway. Therefore, we *merge* the consumer methods into the `Searcher` trait,
-while still keeping `Pattern::into_consumer()` so we could still choose the less expensive algorithm
-at runtime.
-
-```rust
-// v3.0-alpha.8
-trait Pattern<H: Haystack> {
-    type Searcher: Searcher<H::Target>;
-    fn into_searcher(self) -> Self::Searcher;
-    fn into_consumer(self) -> Self::Searcher { self.into_searcher() }
-}
-trait Searcher<A: Hay + ?Sized> {
-    fn search(&mut self, hay: Span<&A>) -> Option<Range<A::Index>>;
-    fn consume(&mut self, hay: Span<&A>) -> Option<A::Index>;
-    fn trim_start(&mut self, hay: &A) -> A::Index { /* default impl */ }
-}
-```
-
 ## Miscellaneous decisions
 
 ### `usize` as index instead of pointers
@@ -1183,13 +1206,12 @@ And thus the more general `Borrow` trait offers no advantage over `Deref`.
 
 ### Searcher makes Hay an input type instead of associated type
 
-The `Searcher` trait makes the hay as input type.
+The `Searcher` and `Consumer` traits make the hay an input type.
 This makes any algorithm relying on a `ReverseSearcher` need to spell out the hay as well.
 
 ```rust
 trait Searcher<A: Hay + ?Sized> {
     fn search(&mut self, span: Span<&A>) -> Option<Range<A::Index>>;
-    ...
 }
 
 fn rfind<H, P>(haystack: H, pattern: P) -> Option<H::Target::Index>
@@ -1205,7 +1227,6 @@ An alternative is to make Hay an associated type:
 trait Searcher {
     type Hay: Hay + ?Sized;
     fn search(&mut self, span: Span<&Self::Hay>) -> Option<Range<Self::Hay::Index>>;
-    ...
 }
 
 fn rfind<H, P>(haystack: H, pattern: P) -> Option<H::Target::Index>
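The two trait shapes compared here differ mainly in how generic code writes its bounds. The sketch below contrasts them using two hypothetical traits over simplified byte-range spans: `SearcherIn`, which takes the hay as an input type parameter, and `SearcherAssoc`, which takes it as an associated type. It also shows a general property of Rust traits, which is not a claim made in the RFC. With the input-type form, a single type (here the hypothetical `AsciiByte`) can implement the trait for several hays. The associated-type form allows only one `Hay` per type.

```rust
use std::ops::Range;

/// Input-type style: the hay is a trait parameter.
trait SearcherIn<A: ?Sized> {
    fn rsearch(&mut self, hay: &A, span: Range<usize>) -> Option<Range<usize>>;
}

/// Associated-type style: each searcher type fixes a single hay.
trait SearcherAssoc {
    type Hay: ?Sized;
    fn rsearch(&mut self, hay: &Self::Hay, span: Range<usize>) -> Option<Range<usize>>;
}

/// Hypothetical searcher for one ASCII byte, scanning from the back.
struct AsciiByte(u8);

impl SearcherIn<[u8]> for AsciiByte {
    fn rsearch(&mut self, hay: &[u8], span: Range<usize>) -> Option<Range<usize>> {
        let target = self.0;
        let pos = hay[span.clone()].iter().rposition(|&b| b == target)?;
        Some(span.start + pos..span.start + pos + 1)
    }
}

// Input-type style: the same searcher can also serve `str` hays.
// (An ASCII byte never occurs inside a multi-byte UTF-8 sequence.)
impl SearcherIn<str> for AsciiByte {
    fn rsearch(&mut self, hay: &str, span: Range<usize>) -> Option<Range<usize>> {
        SearcherIn::<[u8]>::rsearch(self, hay.as_bytes(), span)
    }
}

// Associated-type style: exactly one `Hay` for this type.
impl SearcherAssoc for AsciiByte {
    type Hay = [u8];
    fn rsearch(&mut self, hay: &[u8], span: Range<usize>) -> Option<Range<usize>> {
        SearcherIn::<[u8]>::rsearch(self, hay, span)
    }
}

// Input type: the bound names the hay as a trait parameter...
fn rfind_in<S: SearcherIn<str>>(hay: &str, mut searcher: S) -> Option<usize> {
    searcher.rsearch(hay, 0..hay.len()).map(|r| r.start)
}

// ...associated type: the bound constrains `Hay` instead.
fn rfind_assoc<S: SearcherAssoc<Hay = [u8]>>(hay: &[u8], mut searcher: S) -> Option<usize> {
    searcher.rsearch(hay, 0..hay.len()).map(|r| r.start)
}

fn main() {
    println!("{:?}", rfind_in("a/b/c", AsciiByte(b'/'))); // Some(3)
    println!("{:?}", rfind_assoc(b"a/b/c", AsciiByte(b'/'))); // Some(3)
}
```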
@@ -1251,12 +1272,17 @@ With specialization, this dilemma can be easily fixed: we will fallback to an al
 which only requires `T: PartialEq` (e.g. [`galil-seiferas`] or even naive search),
 and use the faster Two-Way algorithm when `T: Ord`.
 
-### Not having default implementations for `Searcher::{search, consume}`
+### Not having default implementations for `search` and `consume`
 
-In the `Searcher` trait, `.search()` and `.consume()` can be implemented in terms of each other:
+In the `Searcher` and `Consumer` traits, `.search()` and `.consume()` can be implemented
+in terms of each other:
 
 ```rust
-trait Searcher<A: Hay + ?Sized> {
+impl<A, C> Searcher<A> for C
+where
+    A: Hay + ?Sized,
+    C: Consumer<A>,
+{
     fn search(&mut self, span: Span<&A>) -> Option<Range<A::Index>> {
         // we can implement `search` in terms of `consume`
         let (hay, range) = span.into_parts();
@@ -1272,7 +1298,13 @@ trait Searcher<A: Hay + ?Sized> {
             }
         }
     }
+}
 
+impl<A, S> Consumer<A> for S
+where
+    A: Hay + ?Sized,
+    S: Searcher<A>,
+{
     fn consume(&mut self, span: Span<&A>) -> Option<A::Index> {
         // we can implement `consume` in terms of `search`
         let start = span.original_range().start;
@@ -1283,8 +1315,6 @@ trait Searcher<A: Hay + ?Sized> {
             None
         }
     }
-
-    ...
 }
 ```
 
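Both directions of this mutual implementation can be tried out as plain generic functions, which avoids any question of trait coherence. This sketch rests on simplifying assumptions: a span is a `&str` hay plus a byte `Range<usize>`, and `SimpleSearcher`, `SimpleConsumer`, `Prefix`, `Substr`, `search_via_consume` and `consume_via_search` are all hypothetical names.

```rust
use std::ops::Range;

// Hypothetical simplified traits over byte-range spans of a `&str` hay.
trait SimpleSearcher {
    fn search(&mut self, hay: &str, span: Range<usize>) -> Option<Range<usize>>;
}
trait SimpleConsumer {
    fn consume(&mut self, hay: &str, span: Range<usize>) -> Option<usize>;
}

/// `search` derived from `consume`: retry the anchored match at every char
/// boundary of the span, left to right.
fn search_via_consume<C: SimpleConsumer>(
    consumer: &mut C,
    hay: &str,
    span: Range<usize>,
) -> Option<Range<usize>> {
    let mut start = span.start;
    loop {
        if let Some(end) = consumer.consume(hay, start..span.end) {
            return Some(start..end);
        }
        if start == span.end {
            return None;
        }
        // Advance by one character (not one byte) to stay on a boundary.
        start += hay[start..].chars().next().map_or(1, char::len_utf8);
    }
}

/// `consume` derived from `search`: keep a match only if it begins at
/// `span.start`. This may scan the whole span before rejecting.
fn consume_via_search<S: SimpleSearcher>(
    searcher: &mut S,
    hay: &str,
    span: Range<usize>,
) -> Option<usize> {
    let start = span.start;
    let range = searcher.search(hay, span)?;
    if range.start == start {
        Some(range.end)
    } else {
        None
    }
}

/// Hypothetical prefix consumer.
struct Prefix<'p>(&'p str);
impl SimpleConsumer for Prefix<'_> {
    fn consume(&mut self, hay: &str, span: Range<usize>) -> Option<usize> {
        let needle = self.0;
        hay[span.clone()].starts_with(needle).then(|| span.start + needle.len())
    }
}

/// Hypothetical substring searcher.
struct Substr<'p>(&'p str);
impl SimpleSearcher for Substr<'_> {
    fn search(&mut self, hay: &str, span: Range<usize>) -> Option<Range<usize>> {
        let offset = hay[span.clone()].find(self.0)?;
        Some(span.start + offset..span.start + offset + self.0.len())
    }
}

fn main() {
    println!("{:?}", search_via_consume(&mut Prefix("é"), "aéb", 0..4)); // Some(1..3)
    println!("{:?}", consume_via_search(&mut Substr("ab"), "abcdef", 0..6)); // Some(2)
}
```

Both derivations are correct, but they are not free. `search_via_consume` retries the anchored match at every position, and `consume_via_search` may scan the whole span before rejecting. That cost is one reason a dedicated implementation of each method can pay off.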
@@ -1308,12 +1338,19 @@ where they should have full control of the details, we keep them as required met
     `.next_match()` since it needs to take a span as input and thus no longer iterator-like.
     It is renamed to `.search()` as a shorter verb and also consistent with the trait name.
 
-* **Searcher::consume()**. The name is almost randomly chosen as there's no good name for
+* **Consumer::consume()**. The name is almost randomly chosen as there's no good name for
     this operation. This name is taken from the same function in the [`re2` library][re2-consume].
 
+    * `Consumer` is totally different from `Searcher`. Calling it `PrefixSearcher` or
+      `AnchoredSearcher` would imply a non-existent sub-classing relationship.
+
     * We would also like a name which is only a single word.
 
-    * "match" (using name from Python) is incompatible with the existing `.matches()` method.
+    * We want the name *not* to start with the letter **S**
+      so we could easily distinguish between this and `Searcher` when quick-scanning the code,
+      in particular when `ReverseXxxer` is involved.
+
+    * "Matcher" (using name from Python) is incompatible with the existing `.matches()` method.
       Besides, the meaning of "match" is very ambiguous among other libraries.
 
 <details><summary>Names from other languages and libraries</summary>
@@ -1636,7 +1673,8 @@ Unlike this RFC, the `Extract` class is much simpler.
     the core type `&A` only, we could keep `SharedHaystack` unstable longer
     (a separate track from the main Pattern API) until this question is resolved.
 
-* With a benefit of type checking, we may still want to split `Consumer` from `Searcher`.
+* For the benefit of a simplified API,
+    we may want to merge `Consumer` and `Searcher` into a single trait.
 
 [RFC 528]: https://github.com/rust-lang/rfcs/pull/528
 [RFC 1309]: https://github.com/rust-lang/rfcs/pull/1309