Improve VPCLMULQDQ to use 512-bit wide registers #8

onethumb · 2025-06-07T04:11:29Z

The Problem

The current implementation folds with 4 x 256-bit registers, but modern CPUs with VPCLMULQDQ can also run it on 512-bit (AVX-512) registers, which should give higher throughput.

The Solution

Implement folding with 4 x 512-bit registers (256 bytes per iteration instead of 128) for higher throughput. CRC-64/NVME throughput went from ~56.4 GiB/s to ~96.1 GiB/s on an Intel Sapphire Rapids AWS EC2 c7i.8xlarge instance.

Changes

Calculate new folding coefficient pairs for the 256-byte distance for all CRC variants
Update VPCLMULQDQ calculations to use 512-bit wide registers and intrinsics
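A rough software model of the new main loop may help reviewers: four 512-bit registers hold 16 independent 128-bit lanes, and each iteration folds every lane forward by 256 bytes (2048 bits) before XORing in the next 256 bytes of input. This is a sketch under stated assumptions, not the crate's code: it assumes a non-reflected CRC-64 with init 0 and no final XOR, the CRC-64/NVME generator written in normal form (0xAD93D23594C93659), input lengths that are a non-zero multiple of 256 bytes, and a plain-Rust `clmul` standing in for VPCLMULQDQ. All helper names are hypothetical.

```rust
// Hypothetical software model of fold-by-4 with 512-bit registers.
// Assumptions: non-reflected CRC-64, init 0, no final XOR, whole 256-byte blocks.
const POLY: u64 = 0xAD93_D235_94C9_3659; // P(x) = x^64 + POLY (assumed normal form)
const LANES: usize = 16; // 4 x 512-bit registers = 16 x 128-bit lanes
const BLOCK: usize = LANES * 16; // 256 bytes folded per iteration

/// Carry-less (GF(2)) multiply; a software stand-in for one VPCLMULQDQ lane.
fn clmul(a: u64, b: u64) -> u128 {
    let mut r = 0u128;
    for i in 0..64 {
        if (b >> i) & 1 == 1 {
            r ^= (a as u128) << i;
        }
    }
    r
}

/// Reduce a polynomial of degree < 128 modulo P(x).
fn reduce(mut v: u128) -> u64 {
    for i in (64..128).rev() {
        if (v >> i) & 1 == 1 {
            // x^i = x^(i-64) * x^64, and x^64 is congruent to POLY mod P
            v ^= (1u128 << i) | ((POLY as u128) << (i - 64));
        }
    }
    v as u64
}

/// r(x) * x^n mod P(x), one shift at a time.
fn mul_xpow(mut r: u64, n: u32) -> u64 {
    for _ in 0..n {
        let carry = r >> 63;
        r <<= 1;
        if carry == 1 {
            r ^= POLY;
        }
    }
    r
}

/// Bit-at-a-time reference: M(x) * x^64 mod P(x).
fn crc_bitwise(data: &[u8]) -> u64 {
    let mut crc = 0u64;
    for &byte in data {
        crc ^= (byte as u64) << 56;
        for _ in 0..8 {
            crc = if crc >> 63 == 1 { (crc << 1) ^ POLY } else { crc << 1 };
        }
    }
    crc
}

/// Load 16 bytes big-endian, so the first byte holds the highest-degree coefficients.
fn load_lane(chunk: &[u8]) -> u128 {
    let mut buf = [0u8; 16];
    buf.copy_from_slice(chunk);
    u128::from_be_bytes(buf)
}

fn crc_fold_512(data: &[u8]) -> u64 {
    assert!(!data.is_empty() && data.len() % BLOCK == 0, "sketch needs whole 256-byte blocks");
    // One coefficient pair for the 256-byte (2048-bit) fold distance.
    let d = (BLOCK * 8) as u32;
    let (k_hi, k_lo) = (mul_xpow(1, d + 64), mul_xpow(1, d));

    // Load the first 256 bytes into the 16 lanes.
    let mut state: Vec<u128> = data[..BLOCK].chunks(16).map(load_lane).collect();

    // Main loop: fold every lane forward 2048 bits, then XOR in the next block's lane.
    for block in data[BLOCK..].chunks(BLOCK) {
        for (s, chunk) in state.iter_mut().zip(block.chunks(16)) {
            let (hi, lo) = ((*s >> 64) as u64, *s as u64);
            *s = clmul(hi, k_hi) ^ clmul(lo, k_lo) ^ load_lane(chunk);
        }
    }

    // Lane j ends 128 * (15 - j) bits before the end of the message; shift by
    // that offset plus 64 (the CRC's x^64 factor) and reduce.
    let mut crc = 0u64;
    for (j, &s) in state.iter().enumerate() {
        crc ^= mul_xpow(reduce(s), (128 * (LANES - 1 - j) + 64) as u32);
    }
    crc
}

fn main() {
    let data: Vec<u8> = (0..1024u32).map(|i| (i * 31 + 7) as u8).collect();
    println!("folded {:016x}, bitwise {:016x}", crc_fold_512(&data), crc_bitwise(&data));
}
```

Checking the folded result against a bit-at-a-time CRC is a cheap way to validate new coefficients; a real implementation additionally has to handle bit reflection, initial and final XOR values, and tail bytes that don't fill a 256-byte block.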

Planned version bump

Which: MINOR
Why: non-breaking new functionality (gated behind nightly and the vpclmulqdq feature flag)

When using 512-bit registers, we need coefficient pairs for folding across 256-byte distances, as opposed to the 128-byte folding distances used with smaller registers.
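The coefficient pair falls out of splitting a 128-bit lane A into 64-bit halves, A = hi·x^64 + lo. Moving A forward by D bits gives A·x^D ≡ hi·(x^(D+64) mod P) + lo·(x^D mod P), so each fold distance needs its own pair: D = 1024 bits for 128-byte folds and D = 2048 bits for 256-byte folds. The sketch below assumes a non-reflected CRC-64 and the CRC-64/NVME generator in normal form; the crate's reflected constants will use a different bit order and possibly shifted exponents, and `coefficient_pair` and `fold_lane` are hypothetical names.

```rust
// Hypothetical sketch: coefficient pairs for folding one 128-bit lane by D bits.
// Assumes a non-reflected CRC-64 with P(x) = x^64 + POLY.
const POLY: u64 = 0xAD93_D235_94C9_3659;

/// Carry-less (GF(2)) multiply; a software stand-in for PCLMULQDQ.
fn clmul(a: u64, b: u64) -> u128 {
    let mut r = 0u128;
    for i in 0..64 {
        if (b >> i) & 1 == 1 {
            r ^= (a as u128) << i;
        }
    }
    r
}

/// Reduce a polynomial of degree < 128 modulo P(x).
fn reduce(mut v: u128) -> u64 {
    for i in (64..128).rev() {
        if (v >> i) & 1 == 1 {
            v ^= (1u128 << i) | ((POLY as u128) << (i - 64));
        }
    }
    v as u64
}

/// r(x) * x^n mod P(x), one shift at a time.
fn mul_xpow(mut r: u64, n: u32) -> u64 {
    for _ in 0..n {
        let carry = r >> 63;
        r <<= 1;
        if carry == 1 {
            r ^= POLY;
        }
    }
    r
}

/// (x^(D+64) mod P, x^D mod P) for a fold distance of `bytes` bytes.
fn coefficient_pair(bytes: u32) -> (u64, u64) {
    let d = bytes * 8;
    (mul_xpow(1, d + 64), mul_xpow(1, d))
}

/// Fold lane A = hi*x^64 + lo forward by D bits. The result is congruent to
/// A * x^D (mod P) and fits in 127 bits, so it can be XORed into the target lane.
fn fold_lane(a: u128, (k_hi, k_lo): (u64, u64)) -> u128 {
    clmul((a >> 64) as u64, k_hi) ^ clmul(a as u64, k_lo)
}

fn main() {
    let (hi128, lo128) = coefficient_pair(128); // 4 x 256-bit registers
    let (hi256, lo256) = coefficient_pair(256); // 4 x 512-bit registers
    println!("128-byte fold: {:016x} {:016x}", hi128, lo128);
    println!("256-byte fold: {:016x} {:016x}", hi256, lo256);

    // Two 128-byte folds land on the same residue as one 256-byte fold.
    let a: u128 = 0x0123_4567_89ab_cdef_fedc_ba98_7654_3210;
    let twice = fold_lane(fold_lane(a, coefficient_pair(128)), coefficient_pair(128));
    let once = fold_lane(a, coefficient_pair(256));
    println!("equivalent: {}", reduce(twice) == reduce(once));
}
```

The last check in `main` is one way to see why the 512-bit path needs a new pair rather than reusing the 128-byte one: reusing it would require two multiplication rounds per lane per iteration.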

Only for x86_64 CPUs supporting VPCLMULQDQ. Gated behind builds using +nightly with the “vpclmulqdq” feature flag. Provides a roughly 1.7X (nearly 2X) boost in throughput: CRC-64/NVME is now ~96 GiB/s on Intel Sapphire Rapids (AWS EC2 c7i.metal-48xl), up from ~56 GiB/s.
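A minimal sketch of how this kind of gating can combine a compile-time check (target architecture plus the cargo feature) with run-time CPU detection, so that builds without the feature, or CPUs without 512-bit VPCLMULQDQ, fall back to an existing path. The `Backend` enum and `select_backend` function are hypothetical, and checking `avx512f` plus `vpclmulqdq` is an assumption about which CPU features the 512-bit path needs.

```rust
/// Which folding backend to use (hypothetical names).
#[derive(Debug, PartialEq)]
enum Backend {
    Avx512Vpclmulqdq, // 4 x 512-bit registers, 256-byte folds
    Fallback,         // any non-512-bit path
}

fn select_backend() -> Backend {
    // Compiled only on x86_64 with the crate's "vpclmulqdq" feature enabled
    // (per the PR, that build also requires a nightly toolchain).
    #[cfg(all(target_arch = "x86_64", feature = "vpclmulqdq"))]
    {
        // Run-time check: the binary may run on a CPU without these features.
        if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("vpclmulqdq") {
            return Backend::Avx512Vpclmulqdq;
        }
    }
    Backend::Fallback
}

fn main() {
    println!("selected backend: {:?}", select_backend());
}
```

Keeping the run-time check even when the feature is compiled in means a single binary stays safe to ship to older x86_64 hosts.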

onethumb added 4 commits June 6, 2025 20:07

Calculate 256-byte coefficients

2e0fbff

Update benchmarks in README

41b7600

Reflects the performance impact of the wider AVX-512 registers.

Merge branch 'main' into implement-vpclmulqdq-512-bits

6eaf13e
