-
Notifications
You must be signed in to change notification settings - Fork 12.1k
Add new nibbles(bytes)
, clz(bytes)
and clz(uint256)
functions
#5725
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from 2 commits
42c79f1
5754ab8
4383e01
5a44b11
d6db2d7
9b8af8f
2cf0acf
8116913
c5426dd
1dfee41
ab00bb1
e6092ae
ede80e9
d65d7a5
fb91c85
c749346
12c87dd
740a056
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
--- | ||
'openzeppelin-solidity': minor | ||
--- | ||
|
||
`Bytes`: Add a `nibbles` function to split each byte into two nibbles. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
--- | ||
'openzeppelin-solidity': minor | ||
--- | ||
|
||
`Bytes`: Add an `equal` function to compare byte buffers. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
--- | ||
'openzeppelin-solidity': minor | ||
--- | ||
|
||
`Bytes`: Add a `clz` function to count the leading zero bytes in a `uint256` value. |
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -99,6 +99,35 @@ library Bytes { | |||||||||||||||||||||||||||
return result; | ||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
/// @dev Split each byte in `value` into two nibbles (4 bits each). | ||||||||||||||||||||||||||||
function nibbles(bytes memory value) internal pure returns (bytes memory) { | ||||||||||||||||||||||||||||
uint256 length = value.length; | ||||||||||||||||||||||||||||
bytes memory nibbles_ = new bytes(length * 2); | ||||||||||||||||||||||||||||
for (uint256 i = 0; i < length; i++) { | ||||||||||||||||||||||||||||
(nibbles_[i * 2], nibbles_[i * 2 + 1]) = (value[i] & 0xf0, value[i] & 0x0f); | ||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||
return nibbles_; | ||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||||
* @dev Returns true if the two byte buffers are equal. | ||||||||||||||||||||||||||||
*/ | ||||||||||||||||||||||||||||
function equal(bytes memory a, bytes memory b) internal pure returns (bool) { | ||||||||||||||||||||||||||||
return a.length == b.length && keccak256(a) == keccak256(b); | ||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
/// @dev Counts the number of leading zero bytes in a uint256. | ||||||||||||||||||||||||||||
function clz(uint256 x) internal pure returns (uint256) { | ||||||||||||||||||||||||||||
if (x == 0) return 32; // All 32 bytes are zero | ||||||||||||||||||||||||||||
uint256 r = 0; | ||||||||||||||||||||||||||||
if (x > 0xffffffffffffffffffffffffffffffff) r = 128; // Upper 128 bits | ||||||||||||||||||||||||||||
if ((x >> r) > 0xffffffffffffffff) r |= 64; // Next 64 bits | ||||||||||||||||||||||||||||
if ((x >> r) > 0xffffffff) r |= 32; // Next 32 bits | ||||||||||||||||||||||||||||
if ((x >> r) > 0xffff) r |= 16; // Next 16 bits | ||||||||||||||||||||||||||||
if ((x >> r) > 0xff) r |= 8; // Next 8 bits | ||||||||||||||||||||||||||||
return 31 ^ (r >> 3); // Convert to leading zero bytes count | ||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could reuse the
Suggested change
Also, echoing on the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated. Agree that the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMO we should distinguish between I would argue that clz is one of these "local" math property and is closer to the Math library then it is to things like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If CLZ operates on We could have a variant of CLZ in Bytes that does something like function clz(bytes memory buffer) internal pure returns (uint256) {
uint256 i = 0;
while (i < buffer.length && buffer[i] != bytes1(0)) ++i;
return 8 * i + (i == buffer.length ? 0 : Math.clz(buffer[i]));
} (should be optimized for reading in blocks of 32 instead of byte by byte |
||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||||
* @dev Reads a bytes32 from a bytes array without bounds checking. | ||||||||||||||||||||||||||||
* | ||||||||||||||||||||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
pragma solidity ^0.8.20; | ||
|
||
import {Test} from "forge-std/Test.sol"; | ||
import {Math} from "@openzeppelin/contracts/utils/math/Math.sol"; | ||
import {Bytes} from "@openzeppelin/contracts/utils/Bytes.sol"; | ||
|
||
contract BytesTest is Test { | ||
function testIndexOf(bytes memory buffer, bytes1 s) public pure { | ||
testIndexOf(buffer, s, 0); | ||
} | ||
|
||
function testIndexOf(bytes memory buffer, bytes1 s, uint256 pos) public pure { | ||
uint256 result = Bytes.indexOf(buffer, s, pos); | ||
|
||
// Should not be found before result | ||
for (uint256 i = pos; result != type(uint256).max && i < result; i++) assertNotEq(buffer[i], s); | ||
if (result != type(uint256).max) assertEq(buffer[result], s); | ||
} | ||
|
||
function testLastIndexOf(bytes memory buffer, bytes1 s) public pure { | ||
testLastIndexOf(buffer, s, 0); | ||
} | ||
|
||
function testLastIndexOf(bytes memory buffer, bytes1 s, uint256 pos) public pure { | ||
pos = bound(pos, 0, buffer.length); | ||
uint256 result = Bytes.lastIndexOf(buffer, s, pos); | ||
|
||
// Should not be found before result | ||
for (uint256 i = pos; result != type(uint256).max && i < result; i++) assertNotEq(buffer[i], s); | ||
Amxx marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if (result != type(uint256).max) assertEq(buffer[result], s); | ||
} | ||
|
||
function testSlice(bytes memory buffer, uint256 start) public pure { | ||
testSlice(buffer, start, buffer.length); | ||
} | ||
|
||
function testSlice(bytes memory buffer, uint256 start, uint256 end) public pure { | ||
bytes memory result = Bytes.slice(buffer, start, end); | ||
uint256 sanitizedEnd = Math.min(end, buffer.length); | ||
uint256 sanitizedStart = Math.min(start, sanitizedEnd); | ||
assertEq(result.length, sanitizedEnd - sanitizedStart); | ||
for (uint256 i = 0; i < result.length; i++) assertEq(result[i], buffer[sanitizedStart + i]); | ||
} | ||
|
||
function testNibbles(bytes memory value) public pure { | ||
bytes memory result = Bytes.nibbles(value); | ||
assertEq(result.length, value.length * 2); | ||
for (uint256 i = 0; i < value.length; i++) { | ||
bytes1 originalByte = value[i]; | ||
bytes1 highNibble = result[i * 2]; | ||
bytes1 lowNibble = result[i * 2 + 1]; | ||
|
||
assertEq(highNibble, originalByte & 0xf0); | ||
assertEq(lowNibble, originalByte & 0x0f); | ||
} | ||
} | ||
|
||
function testSymbolicEqual(bytes memory a, bytes memory b) public pure { | ||
assertEq(Bytes.equal(a, b), Bytes.equal(a, b)); | ||
} | ||
|
||
function testSymbolicCountLeadingZeroes(uint256 x) public pure { | ||
uint256 result = Bytes.clz(x); | ||
assertLe(result, 32); // [0, 32] | ||
|
||
if (x != 0) { | ||
uint256 firstNonZeroBytePos = 32 - result - 1; | ||
uint256 byteValue = (x >> (firstNonZeroBytePos * 8)) & 0xff; | ||
assertNotEq(byteValue, 0); | ||
|
||
// x != 0 implies result < 32 | ||
// most significant byte should be non-zero | ||
uint256 msbValue = (x >> (248 - result * 8)) & 0xff; | ||
assertNotEq(msbValue, 0); | ||
} | ||
} | ||
} |
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty inefficient, consider using
unchecked
at minimum.Also unclear why this is useful:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using
unchecked
would be safe since overflow is impossible givenbytes memory
can realistically only have a length smaller than 256 bits.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd fix tests first and then optimize, it's likely that we may leverage other functions of the Bytes library
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initial motivation for this function is to use it in #5680 for RLP, but I agree that reallocating memory is perhaps not the most efficient thing to do. I suspect RLP may not require the
nibbles()
function if these are read in place, but I need to experiment a bit with that.We can add the unchecked, but, is there an alternative you were thinking of? I am more worried about the memory expansion cost
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if inspecting the nibbles JIT rather than converting to separate nibbles array is worthwhile to explore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In regard to memory expansion, we need calldata variants for all methods related to MPT proofs (both RLP decoding and MPT branch verification) since the input data is almost always provided by the user via calldata. There's rarely a reason to copy the full RLP payload into memory, because typical use cases (like verifying a historical blockhash) only require extracting one or two fields (e.g. stateRoot, txRoot). Operating directly on calldata avoids unnecessary memory allocation and is significantly more gas-efficient. Comparing my personal implementation to
RLPReader
I'm saving about 40k gas when parsing every block header field (which should never really be needed but serves as a good gas comparison and unit test).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the nibbles method it may make sense to first check the size of the input, if less than or equal to 32 bytes the above calculation could be much simpler (no loop needed)