PHP 8.3 | Tokenizer/PHP: support "yield from" with comments
As discussed in, and discovered via, issue 529:
* Prior to PHP 8.3, only whitespace was allowed between the `yield` and `from` keywords. A comment between the `yield` and `from` keywords in a `yield from` expression would result in a parse error.
* As of PHP 8.3, this is no longer a parse error: both whitespace and comments are allowed between the `yield` and `from` keywords, and PHP itself tokenizes the complete expression as a single `T_YIELD_FROM` token. See: https://3v4l.org/2SI2Q#veol
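The difference between PHP versions can be inspected directly with PHP's native tokenizer. The following is a minimal sketch, not part of the commit; the sample function `gen()` and the variable names are made up for illustration. It assumes that `token_get_all()` without the `TOKEN_PARSE` flag only lexes the input, so it should not raise a parse error on PHP < 8.3. No expected output is listed, as the token names depend on the PHP version running the script.

```php
<?php
// Sample source: a `yield from` expression with a comment between the keywords.
// The function name gen() is hypothetical and only serves the illustration.
$code = <<<'CODE'
<?php
function gen() {
    yield /* comment */ from [1, 2];
}
CODE;

// Collect "TOKEN_NAME: content" for every multi-character token.
$summary = [];
foreach (token_get_all($code) as $token) {
    if (is_array($token) === false) {
        // Single-character tokens such as `{` or `;` are plain strings.
        continue;
    }

    $summary[] = token_name($token[0]) . ': ' . json_encode($token[1]);
}

echo implode(PHP_EOL, $summary), PHP_EOL;
```

Running this on PHP 8.2 and on PHP 8.3 and comparing the lines around the `yield` keyword should show the two tokenizations described above.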
In the context of PHPCS this is problematic as comments should always have their own token to allow sniffs to examine them.
Additionally, such comments may contain PHPCS ignore annotations, which, when not tokenized as a separate token, will not be respected.
This commit adds support for this change in PHP 8.3 to PHP_CodeSniffer. It does contain an, albeit small, BC-break, due to the BC-break created by PHP.
Previously in PHPCS:
* A single line `yield from` expression would always tokenize as `T_YIELD_FROM`, independently of the type and amount of whitespace between the keywords.
* A multi-line `yield` [new line]+ `from` expression would tokenize as multiple `T_YIELD_FROM` tokens, one for each line.
* A `yield from` expression with a comment between the keywords was not supported.
In PHP < 8.3, this meant that this would tokenize as `T_YIELD`, [`T_WHITESPACE`|`T_COMMENT`]+, `T_STRING` (`from`).
As of PHP 8.3, this was tokenized as one or more `T_YIELD_FROM` tokens (depending on single/multi-line) with the comment being tokenized as `T_YIELD_FROM` as well.
This commit changes this as follows:
* Single line `yield from` expression with only whitespace between the keywords: **no change**, this will still tokenize as a single `T_YIELD_FROM` token.
* Multi-line `yield` [new line]+ `from` expressions and `yield from` expressions with a comment (both single-line and multi-line) will now consistently be tokenized as `T_YIELD_FROM` (`yield`), [`T_WHITESPACE`|`T_COMMENT`]+, `T_YIELD_FROM` (`from`).
In practice, this means that:
* Whitespace and comments between the keywords can now be examined and handled by relevant sniffs, which are likely to give more accurate results (fewer false negatives, like for tab indentation of a `from` keyword on a separate line).
* The tokenization used by PHPCS is now consistent again for all supported PHP versions.
* The PHP 8.3 change is now supported.
It does mean that sniffs which explicitly handle multi-token `yield from` expressions will need to be updated.
In my opinion, adding this change in a minor is justified as:
1. The PHP 8.3 change cannot be supported otherwise.
2. The impact is expected to be minimal anyhow as there are not many sniffs which specifically look for and handle `T_YIELD_FROM` tokens and those sniffs within PHPCS itself will be updated/adjusted in the same release.
The (negative) impact of this BC-break on _end-users_ is also expected to be minimal: a scan of the top 2000 projects listed on Packagist shows that none of those projects use multi-line/multi-token `yield from` expressions in their source code. So even when sniff code has not (yet) been updated for the change in tokenization, the chances of an end-user getting incorrect results are very slim, as the affected code is simply not written multi-line or with a comment that often.
Includes tests.
Fixes 529
Refs:
* squizlabs/PHP_CodeSniffer 1524 (original polyfill code)
* php/php-src 10125
* php/php-src 14926
* https://externals.io/message/124462
---
Information for standards maintainers
The "yield from" _keyword_ could previously already consist of multiple T_YIELD_FROM tokens if the "keyword" was spread over multiple lines.
Now, the tokens between the actual keywords will be tokenized as `T_WHITESPACE` and comment tokens.
To find the last token for a `T_YIELD_FROM` "keyword", change old code like this:
```php
$yieldFromEnd = $stackPtr;
if (preg_match('`yield\s+from`', $tokens[$stackPtr]['content']) !== 1) {
    for ($yieldFromEnd = ($stackPtr + 1); $tokens[$yieldFromEnd]['code'] === T_YIELD_FROM; $yieldFromEnd++);
    --$yieldFromEnd;
}
```
to
```php
$yieldFromEnd = $stackPtr;
if (strtolower(trim($tokens[$stackPtr]['content'])) === 'yield') {
    for ($i = ($stackPtr + 1); $i < $phpcsFile->numTokens; $i++) {
        if ($tokens[$i]['code'] === T_YIELD_FROM && strtolower(trim($tokens[$i]['content'])) === 'from') {
            $yieldFromEnd = $i;
            break;
        }

        if (isset(Tokens::$emptyTokens[$tokens[$i]['code']]) === false && $tokens[$i]['code'] !== T_YIELD_FROM) {
            // Shouldn't be possible. Just to be on the safe side.
            break;
        }
    }
}
```
The above presumes that `$stackPtr` is set to a `T_YIELD_FROM` token.
Also note that the second code snippet is largely cross-version compatible. It works with older PHPCS versions when scanning code compatible with PHP < 8.3, and with PHPCS 3.11.0+ when scanning code compatible with any supported PHP version.
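To illustrate how the updated lookup behaves on the different token layouts, here is a minimal standalone sketch that runs without PHPCS. The helper name `findYieldFromEnd()`, the hand-built token arrays, and the reduced `$emptyTokens` stand-in for `Tokens::$emptyTokens` are all assumptions made for this example; real PHPCS token arrays carry more keys, and the real empty-token set has more entries.

```php
<?php
/**
 * Hypothetical helper wrapping the updated lookup from the snippet above.
 *
 * @param array $tokens      Simplified token stack (only 'code' and 'content').
 * @param int   $stackPtr    Position of a T_YIELD_FROM token.
 * @param array $emptyTokens Stand-in for Tokens::$emptyTokens.
 *
 * @return int Position of the last token of the "yield from" keyword.
 */
function findYieldFromEnd(array $tokens, int $stackPtr, array $emptyTokens): int
{
    if (strtolower(trim($tokens[$stackPtr]['content'])) !== 'yield') {
        // Single-token `yield from`: start and end are the same token.
        return $stackPtr;
    }

    $numTokens = count($tokens);
    for ($i = ($stackPtr + 1); $i < $numTokens; $i++) {
        if ($tokens[$i]['code'] === T_YIELD_FROM
            && strtolower(trim($tokens[$i]['content'])) === 'from'
        ) {
            return $i;
        }

        if (isset($emptyTokens[$tokens[$i]['code']]) === false
            && $tokens[$i]['code'] !== T_YIELD_FROM
        ) {
            // Unexpected token: stop searching, as in the original snippet.
            break;
        }
    }

    return $stackPtr;
}

// Reduced stand-in for Tokens::$emptyTokens (assumption for this sketch).
$emptyTokens = [T_WHITESPACE => T_WHITESPACE, T_COMMENT => T_COMMENT];

// New tokenization: `yield /* comment */ from`.
$newStyle = [
    ['code' => T_YIELD_FROM, 'content' => 'yield'],
    ['code' => T_WHITESPACE, 'content' => ' '],
    ['code' => T_COMMENT, 'content' => '/* comment */'],
    ['code' => T_WHITESPACE, 'content' => ' '],
    ['code' => T_YIELD_FROM, 'content' => 'from'],
];

// Older multi-line tokenization: one T_YIELD_FROM token per line.
$oldStyle = [
    ['code' => T_YIELD_FROM, 'content' => "yield\n"],
    ['code' => T_YIELD_FROM, 'content' => '    from'],
];

// Single-line expression with only whitespace: one token.
$singleToken = [
    ['code' => T_YIELD_FROM, 'content' => 'yield from'],
];

$newStyleEnd = findYieldFromEnd($newStyle, 0, $emptyTokens);
$oldStyleEnd = findYieldFromEnd($oldStyle, 0, $emptyTokens);
$singleEnd   = findYieldFromEnd($singleToken, 0, $emptyTokens);

var_dump($newStyleEnd, $oldStyleEnd, $singleEnd); // int(4), int(1), int(0)
```

In an actual sniff, `$tokens` would come from `$phpcsFile->getTokens()` and the empty-token set from `Tokens::$emptyTokens` in the `PHP_CodeSniffer\Util` namespace.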