Skip to content

feat(api): improve isValidSpanId, isValidTraceId performance #5714

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 13, 2025

Conversation

seemk
Copy link
Contributor

@seemk seemk commented May 28, 2025

Short description of the changes

Use a lookup table to check for hex string validity instead of regexes.

Changed a test where a string was cast into SpanContext. Previously the test called isSpanContextValid with a string instead of SpanContext, so undefined spanId and traceId were passed to isValidTraceId and isValidSpanId. The regex implementation handled this implicitly, but this had never been exposed to the types. If this is an issue I can add a check for an undefined input (even though according to the types the input can only be string).

Tested in two different ways:

Test 1 - Total time to process 50 runs of 1M pregenerated IDs
Test 2 - Output of the benchmark library

Mixed IDs (equal amount of valid and invalid IDs)

Test 1
isValidTraceId (original) 4810.5 ms
isValidTraceId (new)      1808.5 ms (2.66x)

isValidSpanId (original) 2994.1 ms
isValidSpanId (new)       920.1 ms (3.25x)

Test 2
isValidTraceId (original) x 10,292,556 ops/sec ±0.41% (98 runs sampled)
isValidTraceId (new)      x 24,541,032 ops/sec ±0.51% (97 runs sampled) (2.38x)

isValidSpanId (original) x 15,623,278 ops/sec ±0.48% (96 runs sampled)
isValidSpanId (new)      x 43,478,105 ops/sec ±0.87% (94 runs sampled) (2.78x)

All valid IDs

Test 1
isValidTraceId (original) 5216.2 ms
isValidTraceId (new)      1467.4 ms (3.55x)

isValidSpanId (original) 3026.1 ms
isValidSpanId (new)       760.1 ms (3.98x)

Test 2
isValidTraceId (original) x 9,073,249 ops/sec ±0.36% (96 runs sampled)
isValidTraceId (new)      x 29,349,911 ops/sec ±0.51% (97 runs sampled) (3.23x)

isValidSpanId (original) x 14,880,091 ops/sec ±0.59% (95 runs sampled)
isValidSpanId (new)      x 51,324,066 ops/sec ±0.78% (96 runs sampled) (3.45x)

All random invalid IDs (randomly generated ID with an invalid character appearing at a random location)

Test 1
isValidTraceId (original) 4254.9 ms
isValidTraceId (new)      2083.2 ms (2.04x)

isValidSpanId (original) 2518.9 ms
isValidSpanId (new)      1058.5 ms (2.38x)

Test 2
isValidTraceId (original) x 11,643,782 ops/sec ±0.44% (99 runs sampled)
isValidTraceId (new)      x 21,810,864 ops/sec ±0.71% (98 runs sampled) (1.87x)

isValidSpanId (original) x 18,313,326 ops/sec ±0.45% (96 runs sampled)
isValidSpanId (new)      x 39,067,315 ops/sec ±0.87% (95 runs sampled) (2.13x)

INVALID_{TRACE,SPAN}_ID - all zeroes

Test 1
isValidTraceId (original): 1540.1 ms
isValidTraceId (new):      1410.1 ms (1.09x)

isValidSpanId (original): 958.3 ms
isValidSpanId (new):      723.8 ms (1.32x)

Test 2
isValidTraceId (original) x 28,751,309 ops/sec ±0.78% (94 runs sampled)
isValidTraceId (new)      x 29,106,073 ops/sec ±0.71% (93 runs sampled) (1.01x)

isValidSpanId (original) x 41,279,268 ops/sec ±0.84% (96 runs sampled)
isValidSpanId (new)      x 50,268,626 ops/sec ±1.04% (92 runs sampled) (1.22x)

@seemk seemk requested a review from a team as a code owner May 28, 2025 13:40
@seemk seemk marked this pull request as draft May 28, 2025 13:49
Copy link

codecov bot commented May 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.01%. Comparing base (62fbfd3) to head (d9e9365).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5714   +/-   ##
=======================================
  Coverage   95.00%   95.01%           
=======================================
  Files         303      303           
  Lines        7952     7957    +5     
  Branches     1609     1610    +1     
=======================================
+ Hits         7555     7560    +5     
  Misses        397      397           
Files with missing lines Coverage Δ
api/src/trace/spancontext-utils.ts 100.00% <100.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@seemk seemk marked this pull request as ready for review May 28, 2025 14:27
Copy link
Member

@pichlermarc pichlermarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

I think some information about that look-up table could be helpful for someone that comes across this code wondering what it is (I happened to have seen this elsewhere but I don't know if it's common to see this sort of thing in JS).

If this is an issue I can add a check for an undefined input (even though according to the types the input can only be string).

My initial impulse would be to leave it out and see if it's a problem, but let's be safe and add a check since it's the API package. 😅

let r = 0;
for (let i = 0; i < id.length; i += 4) {
r +=
isHex[id.charCodeAt(i) & 0x7f] +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also a comment about masking so that this only lowest 7 bits are retained.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an explanation for the bitwise and

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible using charCodeAt to get partial code points from an extended-unicode code point containing string. Should we be using codePointAt? It would then also mean the number might be larger than 128 though (up to 2^16 I think)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment got me thinking that this implementation had an error, where an UTF-16 value like \u4141 (䅁) would be truncated to 0x41, or A, which is a valid character and an invalid ID containing such value would pass.

I changed the implementation to LUT[id.charCodeAt(i)] | 0 which coerces undefined to 0 and fixes the issue (while being faster than the original). See the chart - LUT unroll x 4 vs LUT unroll x 4 old.

As for codePointAt, I can't think of a case where a Unicode character taking up multiple UTF-16 values would have all of the UTF-16 characters in the range in the lookup table ('a-z', 'A-Z, '0-9'). Besides that, it seems String.length should not be used with codePointAt.

I've profiled different implementations now, here's the data:
image
image

The implementation in the PR is U8 LUT, unroll x 4.

You can try it out yourself: https://gist.github.com/seemk/f00c432d8779a03dbe1a28236f9dd198

@pichlermarc pichlermarc added api:traces Issues and PRs related to the Traces API pkg:api labels May 28, 2025
@seemk
Copy link
Contributor Author

seemk commented May 28, 2025

Nice!

I think some information about that look-up table could be helpful for someone that comes across this code wondering what it is (I happened to have seen this elsewhere but I don't know if it's common to see this sort of thing in JS).

If this is an issue I can add a check for an undefined input (even though according to the types the input can only be string).

My initial impulse would be to leave it out and see if it's a problem, but let's be safe and add a check since it's the API package. 😅

I reverted the test and added a check for undefined (plus the comments)

Copy link
Member

@dyladan dyladan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be just as fast and easier to read if it was a set?

const hexChars = new Set([
  "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
  "a", "b", "c", "d", "e", "f",
  "A", "B", "C", "D", "E", "F",
]);

function isValidHex(id: string, length: number) {
  // As of 1.9.0 the id was allowed to be undefined,
  // even though it was not possible in the types.
  if (id === undefined || id.length !== length) return false;
  for (const c of id) {
    if (!hexChars.has(c)) {
      return false;
    }
  }
  return true;
}

const VALID_TRACEID_REGEX = /^([0-9a-f]{32})$/i;
const VALID_SPANID_REGEX = /^[0-9a-f]{16}$/i;
// Valid characters (0-9, a-f, A-F) are marked as 1.
const isHex = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, Node core implements this with an Int8Array, which might be more efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be about the same, with the Int8Array based version being a few percent slower (no idea why, could be noise):

isValidTraceId (LUT, unroll x4) 1474.1 ms
isValidTraceId (Int8Array LUT, unroll x4) 1562.7 ms
isValidSpanId (LUT, unroll x4) 748.7 ms
isValidSpanId (Int8Array LUT, unroll x4) 776.7 ms

isValidTraceId (LUT, unroll x4) x 29,297,057 ops/sec ±0.48% (96 runs sampled)
isValidTraceId (Int8Array LUT, unroll x4) x 31,043,607 ops/sec ±0.61% (95 runs sampled)
isValidSpanId (LUT, unroll x4) x 51,491,271 ops/sec ±0.79% (96 runs sampled)
isValidSpanId (Int8Array LUT, unroll x4) x 54,356,052 ops/sec ±1.02% (94 runs sampled)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I benchmarked this with jsbench.me and changing this to be a Uint8Array sped it up by about 12% for me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be slightly faster on x64, see the results here: #5714 (comment) (there's a link to the benchmark script if you want to run it yourself).

I changed the implementation to use Uint8Array.

function isValidHex(id: string, length: number): boolean {
// As of 1.9.0 the id was allowed to be undefined,
// even though it was not possible in the types.
if (id === undefined || id.length !== length) return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of checking only for undefined, would it be more robust to check typeof id !== 'string'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, changed it to typeof id !== 'string'

@seemk
Copy link
Contributor Author

seemk commented May 29, 2025

Would it be just as fast and easier to read if it was a set?

const hexChars = new Set([
  "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
  "a", "b", "c", "d", "e", "f",
  "A", "B", "C", "D", "E", "F",
]);

function isValidHex(id: string, length: number) {
  // As of 1.9.0 the id was allowed to be undefined,
  // even though it was not possible in the types.
  if (id === undefined || id.length !== length) return false;
  for (const c of id) {
    if (!hexChars.has(c)) {
      return false;
    }
  }
  return true;
}

It's actually slower than the regex version:

isValidTraceId (original) 5289.4 ms
isValidTraceId (LUT, unroll x4) 1471.3 ms
isValidTraceId (set) 15970.2 ms

isValidSpanId (original) 2917.4 ms
isValidSpanId (LUT, unroll x4) 751.4 ms
isValidSpanId (set) 8467.6 ms
---
isValidTraceId (original) x 9,136,152 ops/sec ±0.46% (98 runs sampled)
isValidTraceId (LUT, unroll x4) x 29,705,793 ops/sec ±0.69% (97 runs sampled)
isValidTraceId (set) x 3,032,105 ops/sec ±0.10% (100 runs sampled)

isValidSpanId (original) x 15,580,277 ops/sec ±0.50% (95 runs sampled)
isValidSpanId (LUT, unroll x4) x 53,550,772 ops/sec ±0.76% (93 runs sampled)
isValidSpanId (set) x 5,930,540 ops/sec ±0.19% (98 runs sampled)

Copy link
Member

@dyladan dyladan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm marking this approved but I have a couple more options that I think would be at least interesting to see the benchmark results since you already have a benchmark setup. You don't need to take either of them, I just think this type of micro-optimization is fun and you nerd-sniped me a bit.

The first version uses simple arithmetic instead of a LUT which may be more readable.

function isHexArithmetic(str) {
  for (let i = 0; i < str.length; i++) {
    const charCode = str.charCodeAt(i);
    if (charCode < 48 || (charCode > 57 && charCode < 65) || (charCode > 70 && charCode < 97) || charCode > 102) {
      return false;
    }
  }
  return true;
}

The second version takes advantage of the contiguous memory of TypedArrays and completely throws readability out the window

const LUT = new Uint8Array([
  0b00000000, 0b00000000, 0b00000000, 0b00000000, 0b00000000, 0b00000000,
  0b11111111, 0b00000011, 0b01111110, 0b00000000, 0b00000000, 0b00000000,
  0b01111110, 0b00000000, 0b00000000, 0b00000000,
]);

function isHexPackedLUT(str) {
  for (let i = 0; i < str.length; i++) {
    const charCode = str.codePointAt(i);
    if (!(LUT[charCode >> 3] & (1 << (charCode & 7)))) {
      return false;
    }
  }
  return true;
}

edit: for those wondering what is going on in the packed LUT version

  • LUT is a simple bit-array. It's just a tightly packed version of the one used in the PR. Notice that each byte is reverse order though. That's important because while the left-most byte is the first byte, the right-most bit is the first bit of each byte.
  • >> 3 divides by 8 with no remainder to determine which byte we're looking at
  • & 7 gets the remainder of division by 8 to determine which bit we're looking at in that byte
  • 1 is left shifted that many times to allow us to do the bitwise &

@seemk
Copy link
Contributor Author

seemk commented May 30, 2025

I'm marking this approved but I have a couple more options that I think would be at least interesting to see the benchmark results since you already have a benchmark setup. You don't need to take either of them, I just think this type of micro-optimization is fun and you nerd-sniped me a bit.

The first version uses simple arithmetic instead of a LUT which may be more readable.

function isHexArithmetic(str) {
  for (let i = 0; i < str.length; i++) {
    const charCode = str.charCodeAt(i);
    if (charCode < 48 || (charCode > 57 && charCode < 65) || (charCode > 70 && charCode < 97) || charCode > 102) {
      return false;
    }
  }
  return true;
}

The second version takes advantage of the contiguous memory of TypedArrays and completely throws readability out the window

const LUT = new Uint8Array([
  0b00000000, 0b00000000, 0b00000000, 0b00000000, 0b00000000, 0b00000000,
  0b11111111, 0b00000011, 0b01111110, 0b00000000, 0b00000000, 0b00000000,
  0b01111110, 0b00000000, 0b00000000, 0b00000000,
]);

function isHexPackedLUT(str) {
  for (let i = 0; i < str.length; i++) {
    const charCode = str.codePointAt(i);
    if (!(LUT[charCode >> 3] & (1 << (charCode & 7)))) {
      return false;
    }
  }
  return true;
}

edit: for those wondering what is going on in the packed LUT version

  • LUT is a simple bit-array. It's just a tightly packed version of the one used in the PR. Notice that each byte is reverse order though. That's important because while the left-most byte is the first byte, the right-most bit is the first bit of each byte.
  • >> 3 divides by 8 with no remainder to determine which byte we're looking at
  • & 7 gets the remainder of division by 8 to determine which bit we're looking at in that byte
  • 1 is left shifted that many times to allow us to do the bitwise &

I tested an if else based one initially, but left it out, here are the results:

With valid hex strings (which sould be the usual case for OTel):

isValidTraceId (original) x 9,023,702 ops/sec ±0.41% (96 runs sampled)
isValidTraceId (LUT, unroll x4) x 29,370,600 ops/sec ±0.41% (97 runs sampled)
isValidTraceId (isHexArithmetic) x 8,792,365 ops/sec ±0.55% (99 runs sampled)
isValidTraceId (isHexPackedLUT) x 24,088,125 ops/sec ±0.54% (95 runs sampled)
isValidSpanId (original) x 15,017,101 ops/sec ±0.57% (94 runs sampled)
isValidSpanId (LUT, unroll x4) x 51,371,239 ops/sec ±0.63% (96 runs sampled)
isValidSpanId (isHexArithmetic) x 16,227,776 ops/sec ±0.58% (98 runs sampled)
isValidSpanId (isHexPackedLUT) x 42,877,635 ops/sec ±0.78% (98 runs sampled)

With malformed hex strings (invalid hex character at random position):

isValidTraceId (original) x 11,129,327 ops/sec ±0.43% (95 runs sampled)
isValidTraceId (LUT, unroll x4) x 29,276,959 ops/sec ±0.45% (96 runs sampled)
isValidTraceId (isHexArithmetic) x 16,068,503 ops/sec ±0.51% (97 runs sampled)
isValidTraceId (isHexPackedLUT) x 33,366,037 ops/sec ±0.95% (96 runs sampled)
isValidSpanId (original) x 17,646,360 ops/sec ±0.31% (97 runs sampled)
isValidSpanId (LUT, unroll x4) x 37,972,807 ops/sec ±0.64% (93 runs sampled)
isValidSpanId (isHexArithmetic) x 24,483,385 ops/sec ±0.53% (96 runs sampled)
isValidSpanId (isHexPackedLUT) x 41,557,560 ops/sec ±0.72% (95 runs sampled)

The packed LUT seems to do surprisingly well and is about 10% faster in the negative cases (likely due to the early exit) and only about 15-20% slower in the most common path.

@pichlermarc pichlermarc added this pull request to the merge queue Jun 13, 2025
Merged via the queue into open-telemetry:main with commit f2cfd13 Jun 13, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api:traces Issues and PRs related to the Traces API pkg:api
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants