Commit 62cda40

PHP 8.3 | Tokenizer/PHP: support "yield from" with comments
As discussed in and discovered via issue 529:

* Prior to PHP 8.3, only whitespace was allowed between the `yield` and `from` keywords. A comment between the `yield` and `from` keywords in a `yield from` expression would result in a parse error.
* As of PHP 8.3, this is no longer a parse error: both whitespace and comments are allowed between `yield` and `from`, and the complete expression is tokenized by PHP itself as one `T_YIELD_FROM` token. See: https://3v4l.org/2SI2Q#veol

In the context of PHPCS this is problematic, as comments should always have their own token to allow sniffs to examine them. Additionally, such comments may contain PHPCS ignore annotations, which will not be respected when they are not tokenized as a separate token.

This commit adds support for this PHP 8.3 change to PHP_CodeSniffer. It does contain a BC-break, albeit a small one, due to the BC-break created by PHP.

Previously in PHPCS:

* A single line `yield from` expression would always tokenize as `T_YIELD_FROM`, independently of the type and amount of whitespace between the keywords.
* A multi-line `yield` [new line]+ `from` expression would tokenize as multiple `T_YIELD_FROM` tokens, one for each line.
* A `yield from` expression with a comment between the keywords was not supported. In PHP < 8.3, this meant that this would tokenize as `T_YIELD`, [`T_WHITESPACE`|`T_COMMENT`]+, `T_STRING` (`from`). As of PHP 8.3, this was tokenized as one or more `T_YIELD_FROM` tokens (depending on single/multi-line) with the comment being tokenized as `T_YIELD_FROM` as well.

This commit changes this as follows:

* Single line `yield from` expression with only whitespace between the keywords: **no change**, this will still tokenize as a single `T_YIELD_FROM` token.
* Multi-line `yield` [new line]+ `from` expressions and `yield from` expressions with a comment (both single line as well as multi-line) will now consistently be tokenized as `T_YIELD_FROM` (`yield`), [`T_WHITESPACE`|`T_COMMENT`]+, `T_YIELD_FROM` (`from`).

In practice, this means that:

* Whitespace and comments between the keywords can now be examined and handled by relevant sniffs, which are likely to give more accurate results (fewer false negatives, like for tab indentation of a `from` keyword on a separate line).
* The tokenization used by PHPCS is now consistent again for all supported PHP versions.
* The PHP 8.3 change is now supported.

It does mean that sniffs which explicitly handle multi-token `yield from` expressions will need to be updated.

In my opinion, adding this change in a minor is justified as:

1. The PHP 8.3 change cannot be supported otherwise.
2. The impact is expected to be minimal anyhow, as there are not many sniffs which specifically look for and handle `T_YIELD_FROM` tokens, and those sniffs within PHPCS itself will be updated/adjusted in the same release.

The (negative) impact of this BC-break on _end-users_ is also expected to be minimal. A scan of the top 2000 projects listed on Packagist shows that no multi-line/multi-token `yield from` expressions are used in the source code of those projects. So even when sniff code is not (yet) updated for the change in tokenization, the chances of an end-user getting incorrect results because of this are very slim, as the affected code is just not written as multi-line/with comment that often.

Includes tests.

Fixes 529

Refs:

* squizlabs/PHP_CodeSniffer 1524 (original polyfill code)
* php/php-src 10125
* php/php-src 14926
* https://externals.io/message/124462

---

Information for standards maintainers

The "yield from" _keyword_ could previously already consist of multiple `T_YIELD_FROM` tokens if the "keyword" was spread over multiple lines. Now, the tokens between the actual keywords will be tokenized as `T_WHITESPACE` and comment tokens.

To find the last token for a `T_YIELD_FROM` "keyword", change old code like this:

```php
$yieldFromEnd = $stackPtr;
if (preg_match('`yield\s+from`', $tokens[$stackPtr]['content']) !== 1) {
    for ($yieldFromEnd = ($stackPtr + 1); $tokens[$yieldFromEnd]['code'] === T_YIELD_FROM; $yieldFromEnd++);
    --$yieldFromEnd;
}
```

to

```php
$yieldFromEnd = $stackPtr;
if (strtolower(trim($tokens[$stackPtr]['content'])) === 'yield') {
    for ($i = ($stackPtr + 1); $i < $phpcsFile->numTokens; $i++) {
        if ($tokens[$i]['code'] === T_YIELD_FROM && strtolower(trim($tokens[$i]['content'])) === 'from') {
            $yieldFromEnd = $i;
            break;
        }

        if (isset(Tokens::$emptyTokens[$tokens[$i]['code']]) === false && $tokens[$i]['code'] !== T_YIELD_FROM) {
            // Shouldn't be possible. Just to be on the safe side.
            break;
        }
    }
}
```

The above presumes that `$stackPtr` is set to a `T_YIELD_FROM` token.

Also note that the second code snippet is largely cross-version compatible. It will work with older PHPCS versions with code compatible with PHP < 8.3 and will work on PHPCS 3.11.0+ for code compatible with all supported PHP versions.
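As a rough illustration of the new token layout, the lookup above can be tried outside PHPCS on a hand-built token array. This is only a sketch: `findYieldFromEnd()` and the local `$emptyTokens` list are hypothetical stand-ins for sniff code and for `Tokens::$emptyTokens`, and the token arrays are simplified to the `code` and `content` keys.

```php
<?php
/**
 * Hypothetical helper: return the index of the last token of a
 * "yield from" keyword which starts at $stackPtr.
 */
function findYieldFromEnd(array $tokens, int $stackPtr): int
{
    // Simplified stand-in for Tokens::$emptyTokens (assumption).
    $emptyTokens = [
        T_WHITESPACE  => true,
        T_COMMENT     => true,
        T_DOC_COMMENT => true,
    ];

    if (strtolower(trim($tokens[$stackPtr]['content'])) !== 'yield') {
        // Single-token "yield from": the keyword ends where it starts.
        return $stackPtr;
    }

    $numTokens = count($tokens);
    for ($i = ($stackPtr + 1); $i < $numTokens; $i++) {
        $code = $tokens[$i]['code'];
        if ($code === T_YIELD_FROM && strtolower(trim($tokens[$i]['content'])) === 'from') {
            return $i;
        }

        if (isset($emptyTokens[$code]) === false && $code !== T_YIELD_FROM) {
            // Unexpected token between the keywords: stop looking.
            break;
        }
    }

    return $stackPtr;
}

// Token layout described above for: yield /* comment */ from
$split = [
    ['code' => T_YIELD_FROM, 'content' => 'yield'],
    ['code' => T_WHITESPACE, 'content' => ' '],
    ['code' => T_COMMENT, 'content' => '/* comment */'],
    ['code' => T_WHITESPACE, 'content' => ' '],
    ['code' => T_YIELD_FROM, 'content' => 'from'],
];

// Single-line keyword with only whitespace: still one token.
$single = [
    ['code' => T_YIELD_FROM, 'content' => 'yield from'],
];

$splitEnd  = findYieldFromEnd($split, 0);
$singleEnd = findYieldFromEnd($single, 0);
```

With these inputs, `$splitEnd` points at the `from` token and `$singleEnd` stays on the start token, which is the behaviour the snippet in the commit message relies on.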
1 parent db1ebe1 commit 62cda40

3 files changed

+348
-10
lines changed

src/Tokenizers/PHP.php

Lines changed: 133 additions & 7 deletions
@@ -1514,7 +1514,7 @@ protected function tokenize($string)
             }//end if
 
             /*
-                Before PHP 7.0, the "yield from" was tokenized as
+                Before PHP 7.0, "yield from" was tokenized as
                 T_YIELD, T_WHITESPACE and T_STRING. So look for
                 and change this token in earlier versions.
             */
@@ -1525,12 +1525,16 @@ protected function tokenize($string)
                 && isset($tokens[($stackPtr + 1)]) === true
                 && isset($tokens[($stackPtr + 2)]) === true
                 && $tokens[($stackPtr + 1)][0] === T_WHITESPACE
+                && strpos($tokens[($stackPtr + 1)][1], $this->eolChar) === false
                 && $tokens[($stackPtr + 2)][0] === T_STRING
                 && strtolower($tokens[($stackPtr + 2)][1]) === 'from'
             ) {
-                // Could be multi-line, so adjust the token stack.
-                $token[0] = T_YIELD_FROM;
-                $token[1] .= $tokens[($stackPtr + 1)][1].$tokens[($stackPtr + 2)][1];
+                // Single-line "yield from" with only whitespace between.
+                $finalTokens[$newStackPtr] = [
+                    'code' => T_YIELD_FROM,
+                    'type' => 'T_YIELD_FROM',
+                    'content' => $token[1].$tokens[($stackPtr + 1)][1].$tokens[($stackPtr + 2)][1],
+                ];
 
                 if (PHP_CODESNIFFER_VERBOSITY > 1) {
                     for ($i = ($stackPtr + 1); $i <= ($stackPtr + 2); $i++) {
@@ -1540,9 +1544,131 @@ protected function tokenize($string)
                     }
                 }
 
-                $tokens[($stackPtr + 1)] = null;
-                $tokens[($stackPtr + 2)] = null;
-            }
+                $newStackPtr++;
+                $stackPtr += 2;
+
+                continue;
+            } else if (PHP_VERSION_ID < 80300
+                && $tokenIsArray === true
+                && $token[0] === T_STRING
+                && strtolower($token[1]) === 'from'
+                && $finalTokens[$lastNotEmptyToken]['code'] === T_YIELD
+            ) {
+                /*
+                    Before PHP 8.3, if there was a comment between the "yield" and "from" keywords,
+                    it was tokenized as T_YIELD, T_WHITESPACE, T_COMMENT... and T_STRING.
+                    We want to keep the tokenization of the tokens between, but need to change the
+                    `T_YIELD` and `T_STRING` (from) keywords to `T_YIELD_FROM.
+                */
+
+                $finalTokens[$lastNotEmptyToken]['code'] = T_YIELD_FROM;
+                $finalTokens[$lastNotEmptyToken]['type'] = 'T_YIELD_FROM';
+
+                $finalTokens[$newStackPtr] = [
+                    'code' => T_YIELD_FROM,
+                    'type' => 'T_YIELD_FROM',
+                    'content' => $token[1],
+                ];
+                $newStackPtr++;
+
+                if (PHP_CODESNIFFER_VERBOSITY > 1) {
+                    echo "\t\t* token $lastNotEmptyToken (new stack) changed into T_YIELD_FROM; was: T_YIELD".PHP_EOL;
+                    echo "\t\t* token $stackPtr changed into T_YIELD_FROM; was: T_STRING".PHP_EOL;
+                }
+
+                continue;
+            } else if (PHP_VERSION_ID >= 70000
+                && $tokenIsArray === true
+                && $token[0] === T_YIELD_FROM
+                && strpos($token[1], $this->eolChar) !== false
+                && preg_match('`^yield\s+from$`i', $token[1]) === 1
+            ) {
+                /*
+                    In PHP 7.0+, a multi-line "yield from" (without comment) tokenizes as a single
+                    T_YIELD_FROM token, but we want to split it and tokenize the whitespace
+                    separately for consistency.
+                */
+
+                $finalTokens[$newStackPtr] = [
+                    'code' => T_YIELD_FROM,
+                    'type' => 'T_YIELD_FROM',
+                    'content' => substr($token[1], 0, 5),
+                ];
+                $newStackPtr++;
+
+                $tokenLines = explode($this->eolChar, substr($token[1], 5, -4));
+                $numLines = count($tokenLines);
+                $newToken = [
+                    'type' => 'T_WHITESPACE',
+                    'code' => T_WHITESPACE,
+                    'content' => '',
+                ];
+
+                foreach ($tokenLines as $i => $line) {
+                    $newToken['content'] = $line;
+                    if ($i === ($numLines - 1)) {
+                        if ($line === '') {
+                            break;
+                        }
+                    } else {
+                        $newToken['content'] .= $this->eolChar;
+                    }
+
+                    $finalTokens[$newStackPtr] = $newToken;
+                    $newStackPtr++;
+                }
+
+                $finalTokens[$newStackPtr] = [
+                    'code' => T_YIELD_FROM,
+                    'type' => 'T_YIELD_FROM',
+                    'content' => substr($token[1], -4),
+                ];
+                $newStackPtr++;
+
+                if (PHP_CODESNIFFER_VERBOSITY > 1) {
+                    echo "\t\t* token $stackPtr split into 'yield', one or more whitespace tokens and 'from'".PHP_EOL;
+                }
+
+                continue;
+            } else if (PHP_VERSION_ID >= 80300
+                && $tokenIsArray === true
+                && $token[0] === T_YIELD_FROM
+                && preg_match('`^yield[ \t]+from$`i', $token[1]) !== 1
+                && stripos($token[1], 'yield') === 0
+            ) {
+                /*
+                    Since PHP 8.3, "yield from" allows for comments and will
+                    swallow the comment in the `T_YIELD_FROM` token.
+                    We need to split this up to allow for sniffs handling comments.
+                */
+
+                $finalTokens[$newStackPtr] = [
+                    'code' => T_YIELD_FROM,
+                    'type' => 'T_YIELD_FROM',
+                    'content' => substr($token[1], 0, 5),
+                ];
+                $newStackPtr++;
+
+                $yieldFromSubtokens = @token_get_all("<?php\n".substr($token[1], 5, -4));
+                // Remove the PHP open tag token.
+                array_shift($yieldFromSubtokens);
+                // Add the "from" keyword.
+                $yieldFromSubtokens[] = [
+                    0 => T_YIELD_FROM,
+                    1 => substr($token[1], -4),
+                ];
+
+                // Inject the new tokens into the token stack.
+                array_splice($tokens, ($stackPtr + 1), 0, $yieldFromSubtokens);
+                $numTokens = count($tokens);
+
+                if (PHP_CODESNIFFER_VERBOSITY > 1) {
+                    echo "\t\t* token $stackPtr split into parts (yield from with comment)".PHP_EOL;
+                }
+
+                unset($yieldFromSubtokens);
+                continue;
+            }//end if
 
             /*
                 Before PHP 5.6, the ... operator was tokenized as three
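
The multi-line branch in the hunk above (the `PHP_VERSION_ID >= 70000` case) can be sketched in isolation as follows. This is a simplified illustration, not the tokenizer code itself: `splitMultiLineYieldFrom()` is a hypothetical function, the EOL character is passed in as a parameter instead of read from `$this->eolChar`, and the returned arrays only carry `type` and `content`.

```php
<?php
/**
 * Hypothetical sketch: split the content of a multi-line, comment-free
 * "yield from" token into 'yield', one whitespace part per line, and 'from'.
 */
function splitMultiLineYieldFrom(string $content, string $eolChar = "\n"): array
{
    // "yield" is 5 characters long, "from" is 4; the rest is whitespace.
    $parts = [
        ['type' => 'T_YIELD_FROM', 'content' => substr($content, 0, 5)],
    ];

    $lines = explode($eolChar, substr($content, 5, -4));
    $last  = (count($lines) - 1);

    foreach ($lines as $i => $line) {
        if ($i === $last) {
            if ($line === '') {
                // No indentation in front of "from": nothing left to add.
                break;
            }
        } else {
            // Every line except the last ends on the EOL character.
            $line .= $eolChar;
        }

        $parts[] = ['type' => 'T_WHITESPACE', 'content' => $line];
    }

    $parts[] = ['type' => 'T_YIELD_FROM', 'content' => substr($content, -4)];

    return $parts;
}

$original = "yield\n\n    FROM";
$parts    = splitMultiLineYieldFrom($original);
$rejoined = implode('', array_column($parts, 'content'));
```

Concatenating the `content` of all parts gives back the original token content, so the split only changes how the text is divided over tokens, not the text itself.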

tests/Core/Tokenizers/PHP/YieldTest.inc

Lines changed: 25 additions & 0 deletions
@@ -22,6 +22,31 @@ function generator()
 
         FROM
         gen2();
+
+    /* testYieldFromSplitByComment */
+    yield /* comment */ from gen2();
+
+    /* testYieldFromWithTrailingComment */
+    yield // comment
+        from gen2();
+
+    /* testYieldFromWithTrailingAnnotation */
+    yield // phpcs:ignore Stnd.Cat.Sniff -- for reasons.
+        from gen2();
+
+    /* testYieldFromSplitByNewLineAndComments */
+    yield
+        /* comment line 1
+           line 2 */
+        // another comment
+        from
+        gen2();
+
+    /* testYieldFromSplitByNewLineAndAnnotation */
+    YIELD
+        // @phpcs:disable Stnd.Cat.Sniff -- for reasons.
+        From
+        gen2();
 }
 
 /* testYieldUsedAsClassName */
