# Supported regular expression features

The listings are taken from the [Perl regex docs](https://perldoc.perl.org/perlre). Regular expressions are applied via the [`regex` metafunction](../cpp2/metafunctions.md#regex).


## Currently supported or planned features


### Modifiers

| Modifier | Notes | Status |
| --- | --- | --- |
| **`i`** | Do case-insensitive pattern matching. For example, "A" will match "a" under `/i`. | <span style="color:green">Supported</span> |
| **`m`** | Treat the string being matched against as multiple lines. That is, change `^` and `$` from matching the start of the string's first line and the end of its last line to matching the start and end of each line within the string. | <span style="color:green">Supported</span> |
| **`s`** | Treat the string as a single line. That is, change `.` to match any character whatsoever, even a newline, which it normally would not match. | <span style="color:green">Supported</span> |
| **`x` and `xx`** | Extend your pattern's legibility by permitting whitespace and comments. For details, see [Perl regex docs: `/x` and `/xx`](https://perldoc.perl.org/perlre#/x-and-/xx). | <span style="color:green">Supported</span> |
| **`n`** | Prevent the grouping metacharacters `(` and `)` from capturing. This modifier will stop `$1`, `$2`, etc. from being filled in. | <span style="color:green">Supported</span> |
| **`c`** | Keep the current position during repeated matching. | <span style="color:gray">Planned</span> |


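
To see these modifiers in action, here is an illustration only; it does not use the cppfront `regex` metafunction. It assumes Python's standard `re` module as a stand-in, whose `IGNORECASE`, `MULTILINE`, `DOTALL`, and `VERBOSE` flags behave like Perl's `i`, `m`, `s`, and `x` in these simple cases. Python has no direct counterpart to `n` or `xx`, so they are left out.

```python
import re

# Stand-in illustration with Python's re module (not the cppfront regex API).
text = "Alpha\nbeta"

# i: case-insensitive matching, so "a" matches "A".
case_insensitive = re.search(r"a", "A", re.IGNORECASE) is not None  # True

# m: ^ matches at the start of every line, not only at the start of the string.
starts_default = re.findall(r"^\w", text)                  # ['A']
starts_multiline = re.findall(r"^\w", text, re.MULTILINE)  # ['A', 'b']

# s: . also matches a newline.
dot_default = re.search(r"a.b", text) is not None          # False
dot_all = re.search(r"a.b", text, re.DOTALL) is not None   # True

# x: whitespace and # comments inside the pattern are ignored.
date = re.compile(
    r"""
    (\d{4})  # year
    -        # literal separator
    (\d{2})  # month
    """,
    re.VERBOSE,
)
date_parts = date.match("2024-05").groups()                # ('2024', '05')
```
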
### Escape sequences __(Complete)__

| Escape sequence | Notes | Status |
| --- | --- | --- |
| **`\t`** | Tab (HT, TAB) | <span style="color:green">Supported</span> |
| **`\n`** | Newline (LF, NL) | <span style="color:green">Supported</span> |
| **`\r`** | Return (CR) | <span style="color:green">Supported</span> |
| **`\f`** | Form feed (FF) | <span style="color:green">Supported</span> |
| **`\a`** | Alarm (bell) (BEL) | <span style="color:green">Supported</span> |
| **`\e`** | Escape (think troff) (ESC) | <span style="color:green">Supported</span> |
| **`\x{}`, `\x00`** | Character whose ordinal is the given hexadecimal number | <span style="color:green">Supported</span> |
| **`\o{}`, `\000`** | Character whose ordinal is the given octal number | <span style="color:green">Supported</span> |


### Quantifiers __(Complete)__

| Quantifier | Notes | Status |
| --- | --- | --- |
| **`*`** | Match 0 or more times | <span style="color:green">Supported</span> |
| **`+`** | Match 1 or more times | <span style="color:green">Supported</span> |
| **`?`** | Match 1 or 0 times | <span style="color:green">Supported</span> |
| **`{n}`** | Match exactly n times | <span style="color:green">Supported</span> |
| **`{n,}`** | Match at least n times | <span style="color:green">Supported</span> |
| **`{,n}`** | Match at most n times | <span style="color:green">Supported</span> |
| **`{n,m}`** | Match at least n but not more than m times | <span style="color:green">Supported</span> |
| | | |
| **`*?`** | Match 0 or more times, not greedily | <span style="color:green">Supported</span> |
| **`+?`** | Match 1 or more times, not greedily | <span style="color:green">Supported</span> |
| **`??`** | Match 0 or 1 time, not greedily | <span style="color:green">Supported</span> |
| **`{n}?`** | Match exactly n times, not greedily (redundant) | <span style="color:green">Supported</span> |
| **`{n,}?`** | Match at least n times, not greedily | <span style="color:green">Supported</span> |
| **`{,n}?`** | Match at most n times, not greedily | <span style="color:green">Supported</span> |
| **`{n,m}?`** | Match at least n but not more than m times, not greedily | <span style="color:green">Supported</span> |
| | | |
| **`*+`** | Match 0 or more times and give nothing back | <span style="color:green">Supported</span> |
| **`++`** | Match 1 or more times and give nothing back | <span style="color:green">Supported</span> |
| **`?+`** | Match 0 or 1 time and give nothing back | <span style="color:green">Supported</span> |
| **`{n}+`** | Match exactly n times and give nothing back (redundant) | <span style="color:green">Supported</span> |
| **`{n,}+`** | Match at least n times and give nothing back | <span style="color:green">Supported</span> |
| **`{,n}+`** | Match at most n times and give nothing back | <span style="color:green">Supported</span> |
| **`{n,m}+`** | Match at least n but not more than m times and give nothing back | <span style="color:green">Supported</span> |


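
The difference between greedy, non-greedy, and possessive ("give nothing back") quantifiers can be sketched with Python's `re` module as a stand-in for the Perl semantics listed above; this is not cppfront code. Because `re` accepts `*+` only from Python 3.11 on, the possessive case is emulated with the lookahead-plus-backreference idiom `(?=(a*))\1`, which behaves like `a*+` because Python does not backtrack into a lookahead that has already succeeded.

```python
import re

# Stand-in illustration with Python's re module (not the cppfront regex API).
html = "<b>bold</b>"

# Greedy: .+ takes as much as possible, then gives back until > matches.
greedy = re.match(r"<.+>", html).group()    # '<b>bold</b>'

# Non-greedy: .+? takes as little as possible.
lazy = re.match(r"<.+?>", html).group()     # '<b>'

# Bounded repetition: {n,m}, and {,n} (lower bound omitted, i.e. 0).
two_to_three = [bool(re.fullmatch(r"a{2,3}", s)) for s in ("a", "aa", "aaa", "aaaa")]
# [False, True, True, False]
at_most_two = bool(re.fullmatch(r"a{,2}", ""))  # True

# Greedy a* gives one "a" back so that the final a can match.
backtracking = re.match(r"a*a", "aaa") is not None  # True

# Possessive a*+ gives nothing back, so a*+a can never match "aaa".
# Emulated with (?=(a*))\1: the lookahead grabs every "a" atomically,
# and the backreference \1 consumes exactly that text.
possessive = re.match(r"(?=(a*))\1a", "aaa") is not None  # False
```
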
### Character Classes and other Special Escapes __(Complete)__

| Feature | Notes | Status |
| --- | --- | --- |
| **`[`...`]`** | Match a character according to the rules of the bracketed character class defined by the "...". Example: `[a-z]` matches "a" or "b" or "c" ... or "z" | <span style="color:green">Supported</span> |
| **`[[:`...`:]]`** | Match a character according to the rules of the POSIX character class "..." within the outer bracketed character class. Example: `[[:upper:]]` matches any uppercase character. | <span style="color:green">Supported</span> |
| **`\g1`** or **`\g{-1}`** | Backreference to a specific or previous group. A negative number refers to a group relative to the current position (`\g{-1}` is the immediately preceding group). The number may optionally be wrapped in curly brackets for safer parsing. | <span style="color:green">Supported</span> |
| **`\g{name}`** | Named backreference | <span style="color:green">Supported</span> |
| **`\k<name>`** | Named backreference | <span style="color:green">Supported</span> |
| **`\k'name'`** | Named backreference | <span style="color:green">Supported</span> |
| **`\k{name}`** | Named backreference | <span style="color:green">Supported</span> |
| **`\w`** | Match a "word" character (alphanumeric plus "_", plus other connector punctuation chars plus Unicode marks) | <span style="color:green">Supported</span> |
| **`\W`** | Match a non-"word" character | <span style="color:green">Supported</span> |
| **`\s`** | Match a whitespace character | <span style="color:green">Supported</span> |
| **`\S`** | Match a non-whitespace character | <span style="color:green">Supported</span> |
| **`\d`** | Match a decimal digit character | <span style="color:green">Supported</span> |
| **`\D`** | Match a non-digit character | <span style="color:green">Supported</span> |
| **`\v`** | Vertical whitespace | <span style="color:green">Supported</span> |
| **`\V`** | Not vertical whitespace | <span style="color:green">Supported</span> |
| **`\h`** | Horizontal whitespace | <span style="color:green">Supported</span> |
| **`\H`** | Not horizontal whitespace | <span style="color:green">Supported</span> |
| **`\1`** | Backreference to a specific capture group or buffer. `1` may be any positive integer. | <span style="color:green">Supported</span> |
| **`\N`** | Any character but `\n`. Not affected by the `/s` modifier | <span style="color:green">Supported</span> |
| **`\K`** | Keep everything left of the `\K` out of the reported match (`$&`) | <span style="color:green">Supported</span> |


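
A small illustration of character classes and backreferences, again assuming Python's `re` module as a stand-in rather than the cppfront `regex` metafunction. Python spells a named backreference `(?P=name)` instead of Perl's `\k<name>`, and its `re` module does not support POSIX classes such as `[[:upper:]]`, so those are not shown.

```python
import re

# Stand-in illustration with Python's re module (not the cppfront regex API).

# Bracketed character class: [a-c] is one of "a", "b", "c"; \d is a digit.
class_hits = re.findall(r"[a-c]\d", "a1 b2 d3 c44")    # ['a1', 'b2', 'c4']

# \S+ matches runs of non-whitespace; spaces and tabs separate them.
words = re.findall(r"\S+", " one  two\tthree ")       # ['one', 'two', 'three']

# Numbered backreference \1: find a doubled word.
doubled = re.search(r"\b(\w+)\s+\1\b", "this is is a test").group(1)  # 'is'

# Named backreference: Perl's \k<q> corresponds to (?P=q) in Python.
quoted = re.search(r"(?P<q>[\"']).*?(?P=q)", "say 'hi' now").group()  # "'hi'"
```
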
### Assertions

| Assertion | Notes | Status |
| --- | --- | --- |
| **`\b`** | Match a `\w\W` or `\W\w` boundary | <span style="color:green">Supported</span> |
| **`\B`** | Match except at a `\w\W` or `\W\w` boundary | <span style="color:green">Supported</span> |
| **`\A`** | Match only at beginning of string | <span style="color:green">Supported</span> |
| **`\Z`** | Match only at end of string, or before newline at the end | <span style="color:green">Supported</span> |
| **`\z`** | Match only at end of string | <span style="color:green">Supported</span> |
| **`\G`** | Match only at `pos()` (e.g., at the end-of-match position of a prior `m//g`) | <span style="color:gray">Planned</span> |


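
The anchors can be illustrated with Python's `re` module, used here only as a stand-in (not cppfront code). Note that Python's `\Z` behaves like Perl's `\z`, so the sketch writes Perl's `\Z` as the lookahead `(?=\n?\Z)`. `\G` is still planned and is not shown.

```python
import re

# Stand-in illustration with Python's re module (not the cppfront regex API).

# \b: word boundary; \B: anywhere that is not a word boundary.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat")               # ['cat', 'cat']
inner = [m.start() for m in re.finditer(r"\Bcat", "cat concat cats")]   # [7]

# \A anchors at the start of the string only, even in MULTILINE mode.
first_only = re.findall(r"\A\w+", "one\ntwo", re.MULTILINE)  # ['one']
every_line = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)   # ['one', 'two']

# Python's \Z matches only at the very end, like Perl's \z.
# Perl's \Z (end of string, or before a final newline) is written as (?=\n?\Z).
PERL_BIG_Z = r"(?=\n?\Z)"
perl_big_z = re.search(r"end" + PERL_BIG_Z, "the end\n") is not None  # True
perl_small_z = re.search(r"end\Z", "the end\n") is not None           # False
```
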
### Capture groups __(Complete)__

| Feature | Status |
| --- | --- |
| **`(`...`)`** | <span style="color:green">Supported</span> |


### Quoting metacharacters __(Complete)__

| Feature | Status |
| --- | --- |
| **Backslash-quoting of `^.[]$()*{}?+|\`** | <span style="color:green">Supported</span> |


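
Quoting a metacharacter means preceding it with a backslash so that it matches literally. Here is a minimal sketch with Python's `re` module as a stand-in (not the cppfront API); `re.escape` is Python's own helper for adding those backslashes automatically.

```python
import re

# Stand-in illustration with Python's re module (not the cppfront regex API).

# Unquoted, $ is an end anchor and . and ? are metacharacters, so this fails.
unquoted = re.fullmatch(r"$5.00?", "$5.00?") is not None     # False

# Backslash-quoting each metacharacter makes it match literally.
quoted = re.fullmatch(r"\$5\.00\?", "$5.00?") is not None    # True

# Python can also insert the backslashes for you.
auto = re.fullmatch(re.escape("$5.00?"), "$5.00?") is not None  # True
```
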
### Extended Patterns

| Extended pattern | Notes | Status |
| --- | --- | --- |
| **`(?<NAME>pattern)`** | Named capture group | <span style="color:green">Supported</span> |
| **`(?#text)`** | Comment | <span style="color:green">Supported</span> |
| **`(?adlupimnsx-imnsx)`** | Turn modifiers on (or off, after `-`) for the rest of the enclosing group or pattern. | <span style="color:green">Supported</span> |
| **`(?^alupimnsx)`** | Like the previous entry, but `^` first resets the modifiers to their defaults. | <span style="color:green">Supported</span> |
| **`(?:pattern)`** | Clustering: groups without generating a group index. | <span style="color:green">Supported</span> |
| **`(?adluimnsx-imnsx:pattern)`** | Clustering without a group index; the modifiers apply only inside the cluster. | <span style="color:green">Supported</span> |
| **`(?^aluimnsx:pattern)`** | Like the previous entry, but `^` first resets the modifiers to their defaults. | <span style="color:green">Supported</span> |
| **`(?`<code>|</code>`pattern)`** | Branch reset: each alternative numbers its capture groups from the same starting index. | <span style="color:green">Supported</span> |
| **`(?'NAME'pattern)`** | Named capture group | <span style="color:green">Supported</span> |
| **`(?(condition)yes-pattern`<code>|</code>`no-pattern)`** | Conditional pattern | <span style="color:gray">Planned</span> |
| **`(?(condition)yes-pattern)`** | Conditional pattern | <span style="color:gray">Planned</span> |
| **`(?>pattern)`** | Atomic group (no backtracking into the group) | <span style="color:gray">Planned</span> |
| **`(*atomic:pattern)`** | Atomic group (no backtracking into the group) | <span style="color:gray">Planned</span> |


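
Several of the supported extended patterns can be tried out with Python's `re` module, assumed here purely as a stand-in for the Perl syntax (this is not cppfront code). Python writes named groups as `(?P<name>...)` rather than `(?<NAME>...)` or `(?'NAME'...)`, and it has no branch reset `(?|...)`.

```python
import re

# Stand-in illustration with Python's re module (not the cppfront regex API).

# Named capture group: Perl's (?<year>...) is spelled (?P<year>...) in Python.
m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-05-17")
year, month = m.group("year"), m.group("month")  # '2024', '05'

# Clustering: (?:...) groups without creating a group index.
groups = re.match(r"(?:ab)+(c)", "ababc").groups()  # ('c',)

# Modifiers for one cluster only: (?i:...) affects just the cluster.
scoped_ok = re.fullmatch(r"(?i:hello) world", "HELLO world") is not None   # True
scoped_bad = re.fullmatch(r"(?i:hello) world", "HELLO WORLD") is not None  # False

# Comment: (?#...) is ignored by the matcher.
commented = re.fullmatch(r"\d+(?# one or more digits)", "42") is not None  # True
```
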
### Lookaround Assertions

| Lookaround assertion | Notes | Status |
| --- | --- | --- |
| **`(?=pattern)`** | Positive lookahead | <span style="color:green">Supported</span> |
| **`(*pla:pattern)`** | Positive lookahead | <span style="color:green">Supported</span> |
| **`(*positive_lookahead:pattern)`** | Positive lookahead | <span style="color:green">Supported</span> |
| **`(?!pattern)`** | Negative lookahead | <span style="color:green">Supported</span> |
| **`(*nla:pattern)`** | Negative lookahead | <span style="color:green">Supported</span> |
| **`(*negative_lookahead:pattern)`** | Negative lookahead | <span style="color:green">Supported</span> |
| **`(?<=pattern)`** | Positive lookbehind | <span style="color:gray">Planned</span> |
| **`(*plb:pattern)`** | Positive lookbehind | <span style="color:gray">Planned</span> |
| **`(*positive_lookbehind:pattern)`** | Positive lookbehind | <span style="color:gray">Planned</span> |
| **`(?<!pattern)`** | Negative lookbehind | <span style="color:gray">Planned</span> |
| **`(*nlb:pattern)`** | Negative lookbehind | <span style="color:gray">Planned</span> |
| **`(*negative_lookbehind:pattern)`** | Negative lookbehind | <span style="color:gray">Planned</span> |


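
Lookahead assertions check what follows the current position without consuming it. The sketch uses Python's `re` module as an assumed stand-in for the Perl semantics (not the cppfront API). Lookbehind is only planned, so it is left out.

```python
import re

# Stand-in illustration with Python's re module (not the cppfront regex API).

# Positive lookahead: "foo" only when followed by "bar"; "bar" is not consumed.
ahead = re.search(r"foo(?=bar)", "foobaz foobar")
ahead_start, ahead_text = ahead.start(), ahead.group()  # 7, 'foo'

# Negative lookahead: "foo" only when NOT followed by "bar".
not_ahead_start = re.search(r"foo(?!bar)", "foobar foobaz").start()  # 7

# Several lookaheads at one position act as an AND of conditions:
# at least one digit, at least one lowercase letter, and 6+ characters.
rule = re.compile(r"(?=.*\d)(?=.*[a-z]).{6,}")
checks = [rule.fullmatch(s) is not None for s in ("abc123", "abcdef", "a1")]
# [True, False, False]
```
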
### Special Backtracking Control Verbs

| Backtracking control verb | Notes | Status |
| --- | --- | --- |
| **`(*SKIP) (*SKIP:NAME)`** | Start next search here. | <span style="color:gray">Planned</span> |
| **`(*PRUNE) (*PRUNE:NAME)`** | No backtracking over this point. | <span style="color:gray">Planned</span> |
| **`(*MARK:NAME) (*:NAME)`** | Place a named mark. | <span style="color:gray">Planned</span> |
| **`(*THEN) (*THEN:NAME)`** | Like PRUNE. | <span style="color:gray">Planned</span> |
| **`(*COMMIT) (*COMMIT:arg)`** | Stop searching. | <span style="color:gray">Planned</span> |
| **`(*FAIL) (*F) (*FAIL:arg)`** | Fail the pattern/branch. | <span style="color:gray">Planned</span> |
| **`(*ACCEPT) (*ACCEPT:arg)`** | Accept the pattern/subpattern. | <span style="color:gray">Planned</span> |


## Not planned (mainly because of Unicode or Perl specifics)

### Modifiers

| Modifier | Notes | Status |
| --- | --- | --- |
| `p` | Preserve the string matched such that `${^PREMATCH}`, `${^MATCH}`, and `${^POSTMATCH}` are available for use after matching. | <span style="color:darkred">Not planned</span> |
| `a`, `d`, `l`, and `u` | These modifiers affect which character-set rules (Unicode, etc.) are used, as described in "Character set modifiers" in the Perl regex docs. | <span style="color:darkred">Not planned</span> |
| `g` | Globally match the pattern repeatedly in the string | <span style="color:darkred">Not planned</span> |
| `e` | Evaluate the right-hand side as an expression | <span style="color:darkred">Not planned</span> |
| `ee` | Evaluate the right side as a string, then eval the result | <span style="color:darkred">Not planned</span> |
| `o` | Pretend to optimize your code, but actually introduce bugs | <span style="color:darkred">Not planned</span> |
| `r` | Perform non-destructive substitution and return the new value | <span style="color:darkred">Not planned</span> |


### Escape sequences

| Escape sequence | Notes | Status |
| --- | --- | --- |
| `\cK` | Control character (example: VT) | <span style="color:darkred">Not planned</span> |
| `\N{name}` | Named Unicode character or character sequence | <span style="color:darkred">Not planned</span> |
| `\N{U+263D}` | Unicode character (example: FIRST QUARTER MOON) | <span style="color:darkred">Not planned</span> |
| `\l` | Lowercase next character (think vi) | <span style="color:darkred">Not planned</span> |
| `\u` | Uppercase next character (think vi) | <span style="color:darkred">Not planned</span> |
| `\L` | Lowercase until `\E` (think vi) | <span style="color:darkred">Not planned</span> |
| `\U` | Uppercase until `\E` (think vi) | <span style="color:darkred">Not planned</span> |
| `\Q` | Quote (disable) pattern metacharacters until `\E` | <span style="color:darkred">Not planned</span> |
| `\E` | End either case modification or quoted section (think vi) | <span style="color:darkred">Not planned</span> |


### Character Classes and other Special Escapes

| Character class or escape | Notes | Status |
| --- | --- | --- |
| `(?[...])` | Extended bracketed character class | <span style="color:darkred">Not planned</span> |
| `\pP` | Match `P`, a named property. Use `\p{Prop}` for longer names | <span style="color:darkred">Not planned</span> |
| `\PP` | Match non-`P` | <span style="color:darkred">Not planned</span> |
| `\X` | Match Unicode "eXtended grapheme cluster" | <span style="color:darkred">Not planned</span> |
| `\R` | Linebreak | <span style="color:darkred">Not planned</span> |


### Assertions

| Assertion | Notes | Status |
| --- | --- | --- |
| `\b{}` | Match at Unicode boundary of specified type | <span style="color:darkred">Not planned</span> |
| `\B{}` | Match where the corresponding `\b{}` doesn't match | <span style="color:darkred">Not planned</span> |


### Extended Patterns

| Extended pattern | Notes | Status |
| --- | --- | --- |
| `(?{ code })` | Perl code execution. | <span style="color:darkred">Not planned</span> |
| `(*{ code })` | Perl code execution. | <span style="color:darkred">Not planned</span> |
| `(??{ code })` | Perl code execution. | <span style="color:darkred">Not planned</span> |
| `(?PARNO)` `(?-PARNO)` `(?+PARNO)` `(?R)` `(?0)` | Recursive subpattern. | <span style="color:darkred">Not planned</span> |
| `(?&NAME)` | Recursive subpattern. | <span style="color:darkred">Not planned</span> |


### Script runs

| Script runs | Notes | Status |
| --- | --- | --- |
| `(*script_run:pattern)` | All characters in the pattern must belong to the same script. | <span style="color:darkred">Not planned</span> |
| `(*sr:pattern)` | All characters in the pattern must belong to the same script. | <span style="color:darkred">Not planned</span> |
| `(*atomic_script_run:pattern)` | Script run without backtracking (atomic). | <span style="color:darkred">Not planned</span> |
| `(*asr:pattern)` | Script run without backtracking (atomic). | <span style="color:darkred">Not planned</span> |