@@ -24,6 +24,7 @@ Generalize the pattern API to support `&str`, `&mut str`, `&[T]`, `&mut [T]`, `V
24
24
- [ Principles] ( #principles )
25
25
- [ Design rationales] ( #design-rationales )
26
26
- [ Miscellaneous decisions] ( #miscellaneous-decisions )
27
+ - [ Alternatives] ( #alternatives )
27
28
- [ Prior art] ( #prior-art )
28
29
- [ Previous attempts] ( #previous-attempts )
29
30
- [ Haskell] ( #haskell )
@@ -245,21 +246,27 @@ to recover the indices in the middle (`5 == 3 + 2` and `7 == 3 + 4`).
245
246
246
247
### Searcher
247
248
248
- A searcher only provides a single method: ` .search() ` . It takes a span as input ,
249
- and returns the first sub-range where the given pattern is found .
249
+ A searcher has two required methods ` .search() ` and ` .consume() ` ,
250
+ and an optional method ` .trim_start() ` .
250
251
251
252
``` rust
252
253
pub unsafe trait Searcher <A : Hay + ? Sized > {
253
254
fn search (& mut self , span : Span <& A >) -> Option <Range <A :: Index >>;
255
+ fn consume (& mut self , span : Span <& A >) -> Option <A :: Index >;
256
+ fn trim_start (& mut self , hay : & A ) -> A :: Index { ... }
254
257
}
258
+
255
259
pub unsafe trait ReverseSearcher <A : Hay + ? Sized >: Searcher <A > {
256
260
fn rsearch (& mut self , span : Span <& A >) -> Option <Range <A :: Index >>;
261
+ fn rconsume (& mut self , span : Span <& A >) -> Option <A :: Index >;
262
+ fn trim_end (& mut self , hay : & A ) -> A :: Index { ... }
257
263
}
264
+
258
265
pub unsafe trait DoubleEndedSearcher <A : Hay + ? Sized >: ReverseSearcher <A > {}
259
266
```
260
267
261
- The ` .search() ` function is safe because there is no safe ways to construct a ` Span<&A> `
262
- with invalid ranges. Implementations of ` .search() ` often start with:
268
+ ` .search() ` and ` .consume() ` are safe because there is no safe ways to construct a ` Span<&A> `
269
+ with invalid ranges. Implementations of these methods often start with:
263
270
264
271
``` rust
265
272
fn search (& mut self , span : SharedSpan <& A >) -> Option <Range <A :: Index >> {
@@ -270,36 +277,62 @@ with invalid ranges. Implementations of `.search()` often start with:
270
277
271
278
The trait is unsafe to implement because it needs to guarantee the returned range is valid.
272
279
273
- ### Consumer
274
-
275
- A consumer provides the ` .consume() ` method to implement ` starts_with() ` and ` trim_start() ` . It
276
- takes a span as input, and if the beginning matches the pattern, returns the end index of the match.
277
-
278
- The trait also provides a ` .trim_start() ` method in case a faster specialization exists.
280
+ The ` .search() ` method will look for the first slice matching the searcher's pattern in the span,
281
+ and returns the range where the slice is found (relative to the hay's start index).
282
+ The ` .consume() ` method will is similar, but anchored to the start of the span.
279
283
280
284
``` rust
281
- pub unsafe trait Consumer <A : Hay + ? Sized > {
282
- fn consume (& mut self , span : Span <& A >) -> Option <A :: Index >;
283
- fn trim_start (& mut self , hay : & A ) -> A :: Index { ... }
284
- }
285
- pub unsafe trait ReverseConsumer <A : Hay + ? Sized >: Consumer <A > {
286
- fn rconsume (& mut self , span : Span <& A >) -> Option <A :: Index >;
287
- fn trim_end (& mut self , hay : & A ) -> A :: Index { ... }
288
- }
289
- pub unsafe trait DoubleEndedConsumer <A : Hay + ? Sized >: ReverseConsumer <A > {}
285
+ let span = unsafe { Span :: from_parts (" CDEFG" , 3 .. 8 ) };
286
+ // we can find "CD" at the start of the span.
287
+ assert_eq! (" CD" . into_searcher (). search (span . clone ()), Some (3 .. 5 ));
288
+ assert_eq! (" CD" . into_searcher (). consume (span . clone ()), Some (5 ));
289
+ // we can only find "EF" in the middle of the span.
290
+ assert_eq! (" EF" . into_searcher (). search (span . clone ()), Some (5 .. 7 ));
291
+ assert_eq! (" EF" . into_searcher (). consume (span . clone ()), None );
292
+ // we cannot find "GH" in the span.
293
+ assert_eq! (" GH" . into_searcher (). search (span . clone ()), None );
294
+ assert_eq! (" GH" . into_searcher (). consume (span . clone ()), None );
290
295
```
291
296
297
+ The trait also provides a ` .trim_start() ` method in case a faster specialization exists.
298
+
292
299
### Pattern
293
300
294
301
A pattern is simply a "factory" of a searcher and consumer.
295
302
296
303
``` rust
297
304
trait Pattern <H : Haystack >: Sized {
298
305
type Searcher : Searcher <H :: Target >;
299
- type Consumer : Consumer <H :: Target >;
300
306
301
307
fn into_searcher (self ) -> Self :: Searcher ;
302
- fn into_consumer (self ) -> Self :: Consumer ;
308
+ fn into_consumer (self ) -> Self :: Searcher { self . into_searcher () }
309
+ }
310
+ ```
311
+
312
+ Patterns are the types where users used to supply into the algorithms.
313
+ Patterns are usually immutable (stateless), while searchers sometimes require pre-computation and
314
+ mutable state when implementing some more sophisticated string searching algorithms.
315
+
316
+ The relation between ` Pattern ` and ` Searcher ` is thus like ` IntoIterator ` and ` Iterator ` .
317
+
318
+ There is a required method ` .into_searcher() ` as well as an optional method ` .into_consumer() ` .
319
+ In some patterns (e.g. substring search), checking if a prefix match will require much less
320
+ pre-computation than checking if any substring match. Therefore, if an algorithm can declare that
321
+ it will only call ` .consume() ` , the searcher could use a more efficient structure.
322
+
323
+ ``` rust
324
+ impl <H : Haystack <Target = str >> Pattern <H > for & 'p str {
325
+ type Searcher = SliceSearcher <'p , [u8 ]>;
326
+ #[inline]
327
+ fn into_searcher (self ) -> Self :: Searcher {
328
+ // create a searcher based on Two-Way algorithm.
329
+ SliceSearcher :: new_searcher (self )
330
+ }
331
+ #[inline]
332
+ fn into_consumer (self ) -> Self :: Searcher {
333
+ // create a searcher based on naive search (which requires no pre-computation)
334
+ SliceSearcher :: new_consumer (self )
335
+ }
303
336
}
304
337
```
305
338
@@ -320,7 +353,7 @@ where
320
353
pub fn ends_with <H , P >(haystack : H , pattern : P ) -> bool
321
354
where
322
355
H : Haystack ,
323
- P : Pattern <H , Consumer : ReverseConsumer <H :: Target >>;
356
+ P : Pattern <H , Searcher : ReverseSearcher <H :: Target >>;
324
357
```
325
358
326
359
** Trim**
@@ -334,12 +367,12 @@ where
334
367
pub fn trim_end <H , P >(haystack : H , pattern : P ) -> H
335
368
where
336
369
H : Haystack ,
337
- P : Pattern <H , Consumer : ReverseConsumer <H :: Target >>;
370
+ P : Pattern <H , Searcher : ReverseSearcher <H :: Target >>;
338
371
339
372
pub fn trim <H , P >(haystack : H , pattern : P ) -> H
340
373
where
341
374
H : Haystack ,
342
- P : Pattern <H , Consumer : DoubleEndedConsumer <H :: Target >>;
375
+ P : Pattern <H , Searcher : DoubleEndedSearcher <H :: Target >>;
343
376
```
344
377
345
378
** Matches**
@@ -625,7 +658,7 @@ The main performance improvement comes from `trim()`. In v1.0, `trim()` depends
625
658
the `Searcher :: next_reject ()` method , which requires initializing a searcher and compute
626
659
the critical constants for the Two - Way search algorithm . Search algorithms mostly concern about
627
660
quickly skip through mismatches , but the purpose of `. next_reject ()` is to find mismatches , so a
628
- searcher would be a job mismatch for `trim ()`. This justifies the `Consumer ` trait in v3 . 0 .
661
+ searcher would be a job mismatch for `trim ()`. This justifies the `. into_consumer ()` method in v3 . 0 .
629
662
630
663
<details ><summary >Summary of benchmark </ summary >
631
664
@@ -677,8 +710,72 @@ searcher would be a job mismatch for `trim()`. This justifies the `Consumer` tra
677
710
678
711
[suffix table ]: https : // docs.rs/suffix/1.0.0/suffix/struct.SuffixTable.html#method.positions
679
712
680
- 2 . Patterns are still moved when converting to a Searcher or Consumer .
681
- Taking the entire ownership might prevent some use cases... ?
713
+ 2 . Patterns are still moved when converting to a Searcher .
714
+ Taking the entire ownership of the pattern might prevent some use cases... ?
715
+
716
+ * Stabilization of this RFC is blocked by [RFC 1672] \(disjointness based on associated types)
717
+ which is postponed.
718
+
719
+ The default Pattern implementation currently uses an impl that covers all haystacks
720
+ (`impl <H : Haystack <Target = A >> Pattern <H > for Pat `) for some types, and several impls for
721
+ individual types for others (`impl <'h> Pattern <& 'h A > for Pat `). Ideally * every* such impl
722
+ should use the blanket impl .
723
+ Unfortunately , due to lack of RFC 1672, there would be conflict between these impls:
724
+
725
+ ```rust
726
+ // 1.
727
+ impl <'p, H > Pattern <H > for & 'p [char ]
728
+ where
729
+ H : Haystack <Target = str >,
730
+ { ... }
731
+ impl <'p, H > Pattern <H > for & 'p [T ] // `T` can be `char`
732
+ where
733
+ H : Haystack <Target = [T ]>,
734
+ T : PartialEq + 'p,
735
+ { ... }
736
+
737
+ // 2.
738
+ impl <H , F > Pattern <H > for F
739
+ where
740
+ H : Haystack <Target = str >,
741
+ F : FnMut (char ) -> bool ,
742
+ { ... }
743
+ impl <T , H , F > Pattern <H > for F
744
+ where
745
+ H : Haystack <Target = [T ]>,
746
+ F : FnMut (& T ) -> bool , // `F` can impl both `FnMut(char)->bool` and `FnMut(&T)->bool`.
747
+ T : PartialEq ,
748
+ { ... }
749
+
750
+ // 3.
751
+ impl <'p, H > Pattern <H > for & 'p str
752
+ where
753
+ H : Haystack <Target = str >,
754
+ { ... }
755
+ impl <'p, H > Pattern <H > for & 'p str
756
+ where
757
+ H : Haystack <Target = OsStr >,
758
+ { ... }
759
+ ```
760
+
761
+ We currently provide concrete impls like `impl <'h, 'p> Pattern <& 'h OsStr > for & 'p str `
762
+ as workaround, but if we stabilize the `Pattern ` trait before RFC 1672 is implemented,
763
+ a third- party crate can sneak in an impl :
764
+
765
+ ```rust
766
+ struct MyOsString { ... };
767
+ impl Deref for MyOsString {
768
+ type Target = OsStr ;
769
+ ...
770
+ }
771
+ impl Haystack for MyOsString { ... }
772
+
773
+ impl <'p > Pattern <MyOsString > for & 'p str { ... }
774
+ ```
775
+
776
+ and causes the standard library not able to further generalize (this is a breaking change ).
777
+
778
+ RFC 1672 is currently blocked by `chalk ` integration before it could be reopened .
682
779
683
780
# Rationale and alternatives
684
781
[alternatives ]: #alternatives
@@ -983,6 +1080,26 @@ trait Consumer<A: Hay + ?Sized> {
983
1080
Both ` starts_with() ` and ` trim() ` can be efficiently implemented in terms of ` .consume() ` ,
984
1081
though for some patterns a specialized ` trim() ` can be even faster, so we keep this default method.
985
1082
1083
+ During the RFC, after we have actually tried the API on third party code, we found that having
1084
+ ` Searcher ` and ` Consumer ` as two distinct traits seldom have any advantages as most of the time they
1085
+ are the same type anyway. Therefore, we * merge* the consumer methods into the ` Searcher ` trait,
1086
+ while still keeping ` Pattern::into_consumer() ` so we could still choose the less expensive algorithm
1087
+ at runtime.
1088
+
1089
+ ``` rust
1090
+ // v3.0-alpha.8
1091
+ trait Pattern <H : Haystack > {
1092
+ type Searcher : Searcher <H :: Target >;
1093
+ fn into_searcher (self ) -> Self :: Searcher ;
1094
+ fn into_consumer (self ) -> Self :: Searcher { self . into_searcher () }
1095
+ }
1096
+ trait Searcher <A : Hay + ? Sized > {
1097
+ fn search (& mut self , hay : Span <& A >) -> Option <Range <A :: Index >>;
1098
+ fn consume (& mut self , hay : Span <& A >) -> Option <A :: Index >;
1099
+ fn trim_start (& mut self , hay : & A ) -> A :: Index { /* default impl */ }
1100
+ }
1101
+ ```
1102
+
986
1103
## Miscellaneous decisions
987
1104
988
1105
### ` usize ` as index instead of pointers
@@ -1059,12 +1176,13 @@ And thus the more general `Borrow` trait offers no advantage over `Deref`.
1059
1176
1060
1177
### Searcher makes Hay an input type instead of associated type
1061
1178
1062
- The ` Searcher ` and ` Consumer ` traits make the hay as input type.
1179
+ The ` Searcher ` trait makes the hay as input type.
1063
1180
This makes any algorithm relying on a ` ReverseSearcher ` need to spell out the hay as well.
1064
1181
1065
1182
``` rust
1066
1183
trait Searcher <A : Hay + ? Sized > {
1067
1184
fn search (& mut self , span : Span <& A >) -> Option <Range <A :: Index >>;
1185
+ ...
1068
1186
}
1069
1187
1070
1188
fn rfind <H , P >(haystack : H , pattern : P ) -> Option <H :: Target :: Index >
@@ -1080,6 +1198,7 @@ An alternative is to make Hay an associated type:
1080
1198
trait Searcher {
1081
1199
type Hay : Hay + ? Sized ;
1082
1200
fn search (& mut self , span : Span <& Self :: Hay >) -> Option <Range <Self :: Hay :: Index >>;
1201
+ ...
1083
1202
}
1084
1203
1085
1204
fn rfind <H , P >(haystack : H , pattern : P ) -> Option <H :: Target :: Index >
@@ -1125,6 +1244,47 @@ With specialization, this dilemma can be easily fixed: we will fallback to an al
1125
1244
which only requires ` T: PartialEq ` (e.g. [ ` galil-seiferas ` ] or even naive search),
1126
1245
and use the faster Two-Way algorithm when ` T: Ord ` .
1127
1246
1247
+ ### Not having default implementations for ` Searcher::{search, consume} `
1248
+
1249
+ In the ` Searcher ` trait, ` .search() ` and ` .consume() ` can be implemented in terms of each other:
1250
+
1251
+ ``` rust
1252
+ trait Searcher <A : Hay + ? Sized > {
1253
+ fn search (& mut self , span : Span <& A >) -> Option <Range <A :: Index >> {
1254
+ // we can implement `search` in terms of `consume`
1255
+ let (hay , range ) = span . into_parts ();
1256
+ loop {
1257
+ unsafe {
1258
+ if let Some (end ) = self . consume (Span :: from_span (hay , range . clone ())) {
1259
+ return Some (range . start.. end );
1260
+ }
1261
+ if range . start == range . end {
1262
+ return None ;
1263
+ }
1264
+ range . start = hay . next_index (range . start);
1265
+ }
1266
+ }
1267
+ }
1268
+
1269
+ fn consume (& mut self , span : Span <& A >) -> Option <A :: Index > {
1270
+ // we can implement `consume` in terms of `search`
1271
+ let start = span . original_range (). start;
1272
+ let range = self . search (span )? ;
1273
+ if range . start == start {
1274
+ Some (range . end)
1275
+ } else {
1276
+ None
1277
+ }
1278
+ }
1279
+
1280
+ ...
1281
+ }
1282
+ ```
1283
+
1284
+ These fallbacks should only be used when the pattern does not allow more efficient implementations,
1285
+ which is often not the case. To encourage pattern implementations to support both primitives,
1286
+ where they should have full control of the details, we keep them as required methods.
1287
+
1128
1288
### Names of everything
1129
1289
1130
1290
* ** Haystack** . Inherited from the v1.0 method ` Searcher::haystack() ` . v2.0 called it
@@ -1141,19 +1301,12 @@ and use the faster Two-Way algorithm when `T: Ord`.
1141
1301
` .next_match() ` since it needs to take a span as input and thus no longer iterator-like.
1142
1302
It is renamed to ` .search() ` as a shorter verb and also consistent with the trait name.
1143
1303
1144
- * ** Consumer ::consume()** . The name is almost randomly chosen as there's no good name for
1304
+ * ** Searcher ::consume()** . The name is almost randomly chosen as there's no good name for
1145
1305
this operation. This name is taken from the same function in the [ ` re2 ` library] [ re2-consume ] .
1146
1306
1147
- * ` Consumer ` is totally different from ` Searcher ` . Calling it ` PrefixSearcher ` or
1148
- ` AnchoredSearcher ` would imply a non-existing sub-classing relationship.
1149
-
1150
1307
* We would also like a name which is only a single word.
1151
1308
1152
- * We want the name * not* start with the letter ** S**
1153
- so we could easily distinguish between this and ` Searcher ` when quick-scanning the code,
1154
- in particular when ` ReverseXxxer ` is involved.
1155
-
1156
- * "Matcher" (using name from Python) is incompatible with the existing ` .matches() ` method.
1309
+ * "match" (using name from Python) is incompatible with the existing ` .matches() ` method.
1157
1310
Besides, the meaning of "match" is very ambiguous among other libraries.
1158
1311
1159
1312
<details ><summary >Names from other languages and libraries</summary >
@@ -1215,6 +1368,10 @@ and use the faster Two-Way algorithm when `T: Ord`.
1215
1368
1216
1369
* ** Span** . The name is taken from the rustc compiler.
1217
1370
1371
+ ## Alternatives
1372
+
1373
+ * The names of everything except ` Searcher ` , ` Pattern ` and ` Haystack ` are not finalized.
1374
+
1218
1375
# Prior art
1219
1376
1220
1377
## Previous attempts
@@ -1474,6 +1631,7 @@ Unlike this RFC, the `Extract` class is much simpler.
1474
1631
1475
1632
[RFC 528 ]: https : // github.com/rust-lang/rfcs/pull/528
1476
1633
[RFC 1309 ]: https : // github.com/rust-lang/rfcs/pull/1309
1634
+ [RFC 1672 ]: https : // github.com/rust-lang/rfcs/pull/1672
1477
1635
[RFC 2089 ]: https : // github.com/rust-lang/rfcs/pull/2089
1478
1636
[RFC 2289 ]: https : // github.com/rust-lang/rfcs/pull/2289
1479
1637
[RFC 2295 ]: https : // github.com/rust-lang/rfcs/pull/2295
0 commit comments