Commit ede7f9d

pod and comments: Note escape vs quote
Fixes #15221. The documentation and comments were misleading because they conflated quoting a metacharacter with escaping it. Since \Q stands for quote, we have to continue to use that terminology. This commit clarifies that the two terms are often equivalent.
1 parent 9aead5b commit ede7f9d

7 files changed: +31 -26 lines changed

pod/perldiag.pod

+4 -4
@@ -2602,8 +2602,8 @@ and perl's F</dev/null> emulation was unable to create an empty temporary file.
 (W regexp)(F) A character class range must start and end at a literal
 character, not another character class like C<\d> or C<[:alpha:]>. The "-"
 in your false range is interpreted as a literal "-". In a C<(?[...])>
-construct, this is an error, rather than a warning. Consider quoting
-the "-", "\-". The S<<-- HERE> shows whereabouts in the regular expression
+construct, this is an error, rather than a warning. Consider escaping
+the "-" as "\-". The S<<-- HERE> shows whereabouts in the regular expression
 the problem was discovered. See L<perlre>.

 =item Fatal VMS error (status=%d) at %s, line %d
@@ -5453,7 +5453,7 @@ S<<-- HERE> in m/%s/
 (F) Within regular expression character classes ([]) the syntax beginning
 with "[." and ending with ".]" is reserved for future extensions. If you
 need to represent those character sequences inside a regular expression
-character class, just quote the square brackets with the backslash: "\[."
+character class, just escape the square brackets with the backslash: "\[."
 and ".\]". The S<<-- HERE> shows whereabouts in the regular expression the
 problem was discovered. See L<perlre>.

@@ -5463,7 +5463,7 @@ S<<-- HERE> in m/%s/
 (F) Within regular expression character classes ([]) the syntax beginning
 with "[=" and ending with "=]" is reserved for future extensions. If you
 need to represent those character sequences inside a regular expression
-character class, just quote the square brackets with the backslash: "\[="
+character class, just escape the square brackets with the backslash: "\[="
 and "=\]". The S<<-- HERE> shows whereabouts in the regular expression the
 problem was discovered. See L<perlre>.

pod/perlfunc.pod

+5
@@ -6536,6 +6536,11 @@ the C<\Q> escape in double-quoted strings.

 If EXPR is omitted, uses L<C<$_>|perlvar/$_>.

+The motivation behind this is to make all characters in EXPR have their
+literal meanings. Otherwise any metacharacters in it could trigger
+their "magic" actions. The characters are said to be "quoted" or
+"escaped".
+
 quotemeta (and C<\Q> ... C<\E>) are useful when interpolating strings into
 regular expressions, because by default an interpolated variable will be
 considered a mini-regular expression. For example:

pod/perlre.pod

+11 -11
@@ -1348,17 +1348,17 @@ their punctuation character equivalents, however at the trade-off that you
 have to tell perl when you want to use them.
 X</p> X<p modifier>

-=head2 Quoting metacharacters
-
-Backslashed metacharacters in Perl are alphanumeric, such as C<\b>,
-C<\w>, C<\n>. Unlike some other regular expression languages, there
-are no backslashed symbols that aren't alphanumeric. So anything
-that looks like C<\\>, C<\(>, C<\)>, C<\[>, C<\]>, C<\{>, or C<\}> is
-always
-interpreted as a literal character, not a metacharacter. This was
-once used in a common idiom to disable or quote the special meanings
-of regular expression metacharacters in a string that you want to
-use for a pattern. Simply quote all non-"word" characters:
+=head2 Quoting (or escaping) metacharacters
+
+Escape sequences in Perl consist of a backslash followed by an
+alphanumeric character, such as C<\b>, C<\w>, C<\n>. Unlike some other
+regular expression languages, any sequence consisting of a backslash
+followed by a non-alphanumeric means that non-alphanumeric, literally.
+So things like C<\\>, C<\(>, C<\)>, C<\[>, C<\]>, C<\{>, or C<\}> are
+always interpreted as the literal character that follows the backslash.
+This was used to be commonly used to disable (or quote) the special
+meanings of regular expression metacharacters in a string that you want
+to use for a pattern. Simply quote all non-"word" characters:

 $pattern =~ s/(\W)/\\$1/g;

pod/perlrebackslash.pod

+7 -7
@@ -90,8 +90,8 @@ as C<Not in [].>
 \o{} Octal escape sequence.
 \p{}, \pP Match any character with the given Unicode property.
 \P{}, \PP Match any character without the given property.
-\Q Quote (disable) pattern metacharacters till \E. Not
-in [].
+\Q Quote (disable) pattern metacharacters till \E.
+(Also called "escape".) Not in [].
 \r Return character.
 \R Generic new line. Not in [].
 \s Match any whitespace character.
@@ -350,11 +350,11 @@ them, until either the end of the pattern or the next occurrence of
 C<\E>, whichever comes first. They provide functionality similar to what
 the functions C<lc> and C<uc> provide.

-C<\Q> is used to quote (disable) pattern metacharacters, up to the next
-C<\E> or the end of the pattern. C<\Q> adds a backslash to any character
-that could have special meaning to Perl. In the ASCII range, it quotes
-every character that isn't a letter, digit, or underscore. See
-L<perlfunc/quotemeta> for details on what gets quoted for non-ASCII
+C<\Q> is used to quote or escape (disable) pattern metacharacters, up to
+the next C<\E> or the end of the pattern. C<\Q> adds a backslash to any
+character that could have special meaning to Perl. In the ASCII range,
+it quotes every character that isn't a letter, digit, or underscore.
+See L<perlfunc/quotemeta> for details on what gets quoted for non-ASCII
 code points. Using this ensures that any character between C<\Q> and
 C<\E> will be matched literally, not interpreted as a metacharacter by
 the regex engine.

pod/perlreref.pod

+1 -1
@@ -318,7 +318,7 @@ Captured groups are numbered according to their I<opening> paren.
 fc Foldcase a string

 pos Return or set current match position
-quotemeta Quote metacharacters
+quotemeta Quote metacharacters (escape their normal meaning)
 reset Reset m?pattern? status
 study Analyze string for optimizing matching

pod/perlretut.pod

+1 -1
@@ -187,7 +187,7 @@ C<"["> respectively; other gotchas apply.
 The significance of each of these will be explained
 in the rest of the tutorial, but for now, it is important only to know
 that a metacharacter can be matched as-is by putting a backslash before
-it:
+it. This is called "escaping" or "quoting" it. Some examples:

 "2+2=4" =~ /2+2/; # doesn't match, + is a metacharacter
 "2+2=4" =~ /2\+2/; # matches, \+ is treated like an ordinary +

pp.c

+2 -2
@@ -5082,7 +5082,7 @@ PP(pp_quotemeta)
 else if (UTF8_IS_NEXT_CHAR_DOWNGRADEABLE(s, s + len)) {
 if (
 #ifdef USE_LOCALE_CTYPE
-/* In locale, we quote all non-ASCII Latin1 chars.
+/* In locale, we escape all non-ASCII Latin1 chars.
 * Otherwise use the quoting rules */

 IN_LC_RUNTIME(LC_CTYPE)
@@ -5116,7 +5116,7 @@ PP(pp_quotemeta)
 }
 }
 else {
-/* For non UNI_8_BIT (and hence in locale) just quote all \W
+/* For non UNI_8_BIT (and hence in locale) just escape all \W
 * including everything above ASCII */
 while (len--) {
 if (!isWORDCHAR_A(*s))
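
The last pp.c hunk comments the fallback path, where every byte that is not an ASCII word character gets a backslash in front of it. The following is a minimal standalone sketch of that rule only, not the code in pp.c: the names `is_word_char_ascii` and `escape_nonword` are hypothetical, the first standing in for perl's `isWORDCHAR_A` macro, and the output-buffer handling is an assumption made so the example is self-contained.

```c
#include <stddef.h>

/* Hypothetical stand-in for perl's isWORDCHAR_A(): true only for
 * ASCII letters, ASCII digits, and underscore. */
static int is_word_char_ascii(unsigned char c)
{
    return (c >= 'a' && c <= 'z')
        || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9')
        || c == '_';
}

/* Copy src[0..len) into dst, putting a backslash before every byte that
 * is not an ASCII word character (so bytes above ASCII are escaped too).
 * dst is NUL-terminated. Returns the number of bytes written, not
 * counting the NUL, or (size_t)-1 if dst is too small. */
static size_t escape_nonword(const char *src, size_t len,
                             char *dst, size_t dstsize)
{
    size_t out = 0;

    if (dstsize == 0)
        return (size_t)-1;

    while (len--) {
        unsigned char c = (unsigned char)*src++;
        size_t need = is_word_char_ascii(c) ? 1 : 2;

        /* Leave room for this character and the trailing NUL. */
        if (out + need + 1 > dstsize)
            return (size_t)-1;

        if (need == 2)
            dst[out++] = '\\';
        dst[out++] = (char)c;
    }
    dst[out] = '\0';
    return out;
}
```

Under these assumptions, passing the five bytes of `2+2=4` yields `2\+2\=4`, which is the "quoted" (equivalently "escaped") form the documentation changes above describe.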
