…e for inlined
# Which issue does this PR close?
- Closes [#7874](#7874)
# Rationale for this change
## Change Summary
Rework `inline_key_fast` to avoid reversing the inline data bytes by
removing the global `.to_be()` on the entire 128‑bit word and instead
manually constructing the big‑endian key in two parts: the 96‑bit data
portion and the 32‑bit length tiebreaker.
---
### Problem
In the original implementation:
```rust
let inline_u128 = u128::from_le_bytes(raw_bytes).to_be();
```
- **What went wrong**: On little‑endian targets, calling `.to_be()` on the
full 16‑byte value swaps _all_ bytes, including the 12 bytes of inline data.
- **Consequences**: Multi‑byte strings are compared in reverse byte order,
e.g. `"backend one"` would sort as if it were `"eno dnekcab"`, so the
resulting key order no longer matches lexicographic order.
- **Corner cases exposed**:
  - **“backend one” vs. “backend two”**: suffixes “one”/“two” compare
    incorrectly once reversed.
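
To make the failure mode concrete, here is a minimal sketch of how decoding the same bytes with the wrong endianness changes integer ordering. It does not reproduce the original code path; the helper `pad16` and the sample strings `"ab"`/`"ba"` are illustrative assumptions, not part of this PR.

```rust
/// Hypothetical helper (not part of the PR): zero-pad up to 16 bytes.
fn pad16(s: &[u8]) -> [u8; 16] {
    let mut buf = [0u8; 16];
    buf[..s.len()].copy_from_slice(s);
    buf
}

fn main() {
    let a = pad16(b"ab");
    let b = pad16(b"ba");

    // Lexicographic byte order: "ab" sorts before "ba".
    assert!(a < b);

    // Big-endian decoding makes byte 0 the most significant byte,
    // so integer comparison agrees with lexicographic order.
    assert!(u128::from_be_bytes(a) < u128::from_be_bytes(b));

    // Little-endian decoding makes the *last* byte the most significant,
    // so for this pair the integer comparison disagrees.
    assert!(u128::from_le_bytes(a) > u128::from_le_bytes(b));
}
```

Any key construction that ends up with the string bytes in reversed significance will show this kind of disagreement for some inputs.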
---
### Solution
```rust
#[inline(always)]
pub fn inline_key_fast(raw: u128) -> u128 {
    // 1. Decompose `raw` into little-endian bytes:
    //    - raw_bytes[0..4]  = length in LE
    //    - raw_bytes[4..16] = inline string data
    let raw_bytes = raw.to_le_bytes();

    // 2. Numerically truncate to get the low 32-bit length (endianness-free).
    let length = raw as u32;

    // 3. Build a 16-byte buffer in big-endian order:
    //    - buf[0..12]  = inline string bytes (in original order)
    //    - buf[12..16] = length.to_be_bytes() (BE)
    let mut buf = [0u8; 16];
    buf[0..12].copy_from_slice(&raw_bytes[4..16]); // inline data

    // Why convert length to big-endian for comparison?
    //
    // Most platforms Rust targets store integers in little-endian format,
    // meaning the least significant byte is at the lowest memory address.
    // For example, a u32 value like 0x22345677 is stored in memory as:
    //
    //   [0x77, 0x56, 0x34, 0x22]  // little-endian layout
    //     ^                 ^
    //    LSB               MSB
    //
    // This layout is efficient for arithmetic but *not* suitable for
    // lexicographic (dictionary-style) comparison of byte arrays.
    //
    // To compare values by byte order (e.g., for sorted keys or binary trees),
    // we must convert them to **big-endian**, where:
    //
    // - The most significant byte (MSB) comes first (index 0)
    // - The least significant byte (LSB) comes last (index N-1)
    //
    // In big-endian, the same u32 = 0x22345677 would be represented as:
    //
    //   [0x22, 0x34, 0x56, 0x77]
    //
    // This ordering aligns with natural string/byte sorting, so calling
    // `.to_be_bytes()` allows us to construct keys where standard numeric
    // comparison (e.g., `<`, `>`) behaves like lexicographic byte comparison.
    buf[12..16].copy_from_slice(&length.to_be_bytes()); // length in BE

    // 4. Deserialize the buffer as a big-endian u128:
    //    buf[0] is MSB, buf[15] is LSB.
    //
    // Note on endianness and layout:
    //
    // Although `buf[0]` is stored at the lowest memory address,
    // calling `u128::from_be_bytes(buf)` interprets it as the
    // **most significant byte (MSB)**, and `buf[15]` as the
    // **least significant byte (LSB)**.
    //
    // This is the core principle of **big-endian decoding**:
    // - Byte at index 0 maps to bits 127..120 (highest)
    // - Byte at index 1 maps to bits 119..112
    // - ...
    // - Byte at index 15 maps to bits 7..0 (lowest)
    //
    // So even though memory layout goes from low to high (left to right),
    // big-endian treats the **first byte** as highest in value.
    //
    // This guarantees that comparing two `u128` keys is equivalent to
    // lexicographically comparing the original inline bytes, followed by length.
    u128::from_be_bytes(buf)
}
```
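
As a worked example of the resulting key layout, the sketch below builds the key for `"bar"` and checks its exact bit pattern. `inline_key_fast` is repeated in condensed form so the block compiles on its own; `make_raw` is a hypothetical helper (not part of this PR) that packs bytes into the layout described in step 1.

```rust
/// Condensed copy of the function above, repeated for self-containment.
fn inline_key_fast(raw: u128) -> u128 {
    let raw_bytes = raw.to_le_bytes();
    let length = raw as u32;
    let mut buf = [0u8; 16];
    buf[0..12].copy_from_slice(&raw_bytes[4..16]);
    buf[12..16].copy_from_slice(&length.to_be_bytes());
    u128::from_be_bytes(buf)
}

/// Hypothetical helper: pack up to 12 bytes as
/// [length: u32 LE][data, zero-padded to 12 bytes].
fn make_raw(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "only inline-sized values are supported");
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(bytes)
}

fn main() {
    let key = inline_key_fast(make_raw(b"bar"));
    // 'b' = 0x62, 'a' = 0x61, 'r' = 0x72 occupy the top bytes in string order,
    // followed by nine zero padding bytes, with the length (3) in the low 32 bits.
    assert_eq!(key, 0x6261_7200_0000_0000_0000_0000_0000_0003_u128);
}
```

The data bytes sit in the high 96 bits in their original order, and the length only affects the comparison when those 96 bits are equal.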
---
### Testing
All existing tests — including the “backend one” vs. “backend two” and
`"bar"` vs. `"bar\0"` cases — now pass, confirming both lexicographical
correctness and proper length‑based tiebreaking.
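
The cases named above can be sketched as a standalone check. This is not the test code from the repository: `make_raw` and `keys_agree_with_slice_order` are hypothetical helpers, and the sample list is an assumption chosen to include both named cases.

```rust
use std::cmp::Ordering;

/// Condensed copy of the function from this PR, repeated for self-containment.
fn inline_key_fast(raw: u128) -> u128 {
    let raw_bytes = raw.to_le_bytes();
    let length = raw as u32;
    let mut buf = [0u8; 16];
    buf[0..12].copy_from_slice(&raw_bytes[4..16]);
    buf[12..16].copy_from_slice(&length.to_be_bytes());
    u128::from_be_bytes(buf)
}

/// Hypothetical helper: [length: u32 LE][data, zero-padded to 12 bytes].
fn make_raw(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "only inline-sized values are supported");
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(bytes)
}

/// Hypothetical check: key ordering matches byte-slice ordering for every pair.
fn keys_agree_with_slice_order(samples: &[&[u8]]) -> bool {
    samples.iter().all(|&a| {
        samples.iter().all(|&b| {
            let key_order: Ordering =
                inline_key_fast(make_raw(a)).cmp(&inline_key_fast(make_raw(b)));
            key_order == a.cmp(b)
        })
    })
}

fn main() {
    let samples: [&[u8]; 6] =
        [b"", b"bar", b"bar\0", b"baz", b"backend one", b"backend two"];
    assert!(keys_agree_with_slice_order(&samples));
}
```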
# What changes are included in this PR?
# Are these changes tested?
Yes
# Are there any user-facing changes?
No
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>