Use bit ops instead of integer modulo and divide in shaders #19994

atlv24 · 2025-07-07T00:50:05Z

Objective

Our shader compilation goes through many steps, and things can and sometimes do go wrong at any of them: we write our bespoke naga-oil flavor of WGSL, which is processed into actual WGSL, which naga turns into naga IR, which naga then turns into HLSL/SPIR-V/MSL, which the driver turns into ISA, which the GPU hardware finally runs. Some layers are lossy and bad for performance. In particular, the naga HLSL backend seems to emit some unfortunate polyfills, such as:

int naga_mod(int lhs, int rhs) {
    int divisor = ((lhs == int(-2147483647 - 1) & rhs == -1) | (rhs == 0)) ? 1 : rhs;
    return lhs - (lhs / divisor) * divisor;
}

in place of %, even for constant arguments. Some driver toolchains, such as FXC, then complain about this and seemingly do not realize they can optimize it.

This potentially actually ends up mattering for CI times, but we'll see.

Solution

Make the tools' lives easier by sacrificing some readability. The bit ops are what we intend to happen in these cases anyway, not a full-blown modulo, so in a way this is just being explicit about it.
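For unsigned operands and a power-of-two constant, the rewrite is an exact identity: x % 2^k equals x & (2^k - 1), and x / 2^k equals x >> k. Below is a minimal CPU-side sketch in Rust that checks this, on the assumption that Rust's u32 arithmetic behaves like WGSL's u32 for these operators. The helper names are hypothetical and not part of this PR.

```rust
/// Remainder by a power-of-two constant, written as a mask.
/// `pow2` must be a nonzero power of two (hypothetical helper, for illustration only).
pub fn mod_pow2(x: u32, pow2: u32) -> u32 {
    debug_assert!(pow2.is_power_of_two());
    x & (pow2 - 1)
}

/// Division by a power-of-two constant, written as a right shift.
/// `pow2` must be a nonzero power of two (hypothetical helper, for illustration only).
pub fn div_pow2(x: u32, pow2: u32) -> u32 {
    debug_assert!(pow2.is_power_of_two());
    x >> pow2.trailing_zeros()
}

/// Returns true if the bit-op forms agree with `%` and `/` for every x in 0..limit.
pub fn identities_hold(pow2: u32, limit: u32) -> bool {
    (0..limit).all(|x| mod_pow2(x, pow2) == x % pow2 && div_pow2(x, pow2) == x / pow2)
}
```

The identity does not carry over to negative signed values. In Rust, -1i32 % 64 is -1 while -1i32 & 63 is 63, so the mask form is only a drop-in replacement where the operand is known to be non-negative.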

Alternate solution considered

Add a naga IR processing step that replaces modulo by a constant power of two with the equivalent bit op, and does the same for divs and muls. This is more complicated and less explicit, though.
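To give a feel for what such a pass would involve, here is a rough sketch of the rewrite over a toy expression tree. The Expr enum and lower_pow2 function are hypothetical stand-ins written for this illustration. They are not naga's IR or API, and the sketch assumes all values are unsigned.

```rust
/// Toy expression tree; this is NOT naga's IR, only a stand-in for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(&'static str),
    Const(u32),
    Mod(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Shr(Box<Expr>, Box<Expr>),
}

/// Rewrites unsigned `e % 2^k` to `e & (2^k - 1)` and `e / 2^k` to `e >> k`,
/// lowering the operands first. All other nodes are left unchanged.
pub fn lower_pow2(e: Expr) -> Expr {
    use Expr::*;
    match e {
        Mod(l, r) => {
            let (l, r) = (lower_pow2(*l), lower_pow2(*r));
            match r {
                // Power-of-two constant divisor: replace with a mask.
                Const(c) if c.is_power_of_two() => And(Box::new(l), Box::new(Const(c - 1))),
                r => Mod(Box::new(l), Box::new(r)),
            }
        }
        Div(l, r) => {
            let (l, r) = (lower_pow2(*l), lower_pow2(*r));
            match r {
                // Power-of-two constant divisor: replace with a right shift.
                Const(c) if c.is_power_of_two() => {
                    Shr(Box::new(l), Box::new(Const(c.trailing_zeros())))
                }
                r => Div(Box::new(l), Box::new(r)),
            }
        }
        And(l, r) => And(Box::new(lower_pow2(*l)), Box::new(lower_pow2(*r))),
        Shr(l, r) => Shr(Box::new(lower_pow2(*l)), Box::new(lower_pow2(*r))),
        // Leaves (variables and constants) have nothing to rewrite.
        leaf => leaf,
    }
}
```

Even in this toy form the pass needs type information (the rewrite is only valid for unsigned or known non-negative operands), which is part of why writing the bit ops directly in the shaders is the simpler option.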

Testing

deferred_rendering (on all 3 modes), transmission, morph_targets, ssao, volumetric_fog

alice-i-cecile

Not a rendering expert, but I would appreciate comments above the bitops that show what the equivalent non-bitops are.

JMS55 · 2025-07-07T03:07:58Z

crates/bevy_pbr/src/meshlet/meshlet_mesh_material.wgsl

    let material_id = vertex_input / 3u;
+    let vertex_index = vertex_input - material_id * 3u;


This might be worse. I thought compilers can see consecutive % and / and combine the instructions into one thing(?)

Not when it's split into a function call that looks like

int naga_mod(int lhs, int rhs) {
    int divisor = ((lhs == int(-2147483647 - 1) & rhs == -1) | (rhs == 0)) ? 1 : rhs;
    return lhs - (lhs / divisor) * divisor;
}
// ...
let vertex_index = naga_mod(vertex_input, 3u);
let material_id = vertex_input / 3u;
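The change under review computes the quotient once and derives the remainder from it with a multiply and a subtract, so only one division is issued regardless of what the backend does with %. Here is a small Rust sketch of the same arithmetic. The function name split_by_3 is hypothetical, and the variable names follow the diff.

```rust
/// Splits `vertex_input` the way the changed meshlet shader does:
/// one division, then the remainder via multiply-subtract.
/// Returns `(material_id, vertex_index)`, following the names in the diff.
pub fn split_by_3(vertex_input: u32) -> (u32, u32) {
    let material_id = vertex_input / 3;
    // Same value as `vertex_input % 3`; cannot underflow because
    // `material_id * 3 <= vertex_input` for unsigned truncating division.
    let vertex_index = vertex_input - material_id * 3;
    (material_id, vertex_index)
}
```

For unsigned inputs this matches the (x / 3, x % 3) pair exactly, including at u32::MAX.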

superdump · 2025-07-07T06:33:38Z

crates/bevy_core_pipeline/src/experimental/mip_generation/downsample_depth.wgsl

-    let sub_xy = remap_for_wave_reduction(local_invocation_index % 64u);
-    let x = sub_xy.x + 8u * ((local_invocation_index >> 6u) % 2u);
+    let sub_xy = remap_for_wave_reduction(local_invocation_index & 63u);
+    let x = sub_xy.x + 8u * ((local_invocation_index >> 6u) & 1u);
    let y = sub_xy.y + 8u * (local_invocation_index >> 7u);


8u is << 3u. Or does naga deal with that properly?
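As a sanity check on the hunk above, the following Rust sketch compares the old modulo form with the mask form, and also writes the multiply by 8 as a shift, as this comment suggests. The function names are hypothetical, the call to remap_for_wave_reduction is left out, and the tested range 0..256 is an assumption about the workgroup size.

```rust
/// Old form from the diff: modulo by constants, multiply by 8.
/// Returns (argument to the remap call, x offset, y offset).
pub fn fields_mod(i: u32) -> (u32, u32, u32) {
    (i % 64, 8 * ((i >> 6) % 2), 8 * (i >> 7))
}

/// Mask form from the diff, with `8u * v` additionally written as `v << 3u`.
pub fn fields_bits(i: u32) -> (u32, u32, u32) {
    (i & 63, ((i >> 6) & 1) << 3, (i >> 7) << 3)
}

/// Returns true if both forms agree for every index in 0..limit.
pub fn forms_agree(limit: u32) -> bool {
    (0..limit).all(|i| fields_mod(i) == fields_bits(i))
}
```

Both forms produce the same values, so the choice between 8u * v and v << 3u comes down to cost and readability rather than correctness.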

superdump · 2025-07-07T06:34:01Z

crates/bevy_core_pipeline/src/experimental/mip_generation/downsample_depth.wgsl

-    let sub_xy = remap_for_wave_reduction(local_invocation_index % 64u);
-    let x = sub_xy.x + 8u * ((local_invocation_index >> 6u) % 2u);
+    let sub_xy = remap_for_wave_reduction(local_invocation_index & 63u);
+    let x = sub_xy.x + 8u * ((local_invocation_index >> 6u) & 1u);


8u is << 3u

superdump · 2025-07-07T06:41:07Z

crates/bevy_pbr/src/volumetric_fog/volumetric_fog.wgsl

@@ -103,6 +103,13 @@ fn henyey_greenstein(neg_LdotV: f32) -> f32 {
    return FRAC_4_PI * (1.0 - g * g) / (denom * sqrt(denom));
 }

+fn simple_wrap_3(index: i32) -> i32 {


Can index be >=6? If so then this is wrong, if not then I think this function should be named/commented to indicate it is special purpose.

It's only used in this file, and it's only called with numbers in the range 0-5. I called it simple_wrap_3 rather than anything implying modulo, because it is not an implementation of modulo, just something that works for this specific case.

Could you add a comment saying that it only works for values 0 to 5, which is fine for its use in this file?
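The body of simple_wrap_3 is not shown in this thread. As an illustration of why a helper like this can be valid for 0 to 5 and wrong from 6 upward, here is a hypothetical stand-in that uses a single conditional subtraction. The name wrap_3_sketch and its body are assumptions, not the PR's code.

```rust
/// Hypothetical stand-in for a "simple wrap by 3": one conditional subtraction.
/// Matches `index % 3` only for 0..=5; this is NOT the PR's actual function body.
pub fn wrap_3_sketch(index: i32) -> i32 {
    if index >= 3 {
        index - 3
    } else {
        index
    }
}
```

With this shape, an input of 6 yields 3 rather than 0, which is the failure mode the review comment asks to have documented.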

superdump

Mostly looks fine to me.

I noted early in my review that multiplication by powers of two is not changed to left shifts. Does naga handle this or is there no value to it?

Aside from that, just one comment about the simple_wrap_3 function to address.

atlv24 · 2025-07-07T06:56:05Z

Integer multiplication has a dedicated instruction on GPUs, so there's no need to replace it; bit shifts win us nothing in that case. Integer division and modulo are the really expensive ones: they are on the order of 100 cycles, whereas integer multiplies are usually a cycle or two.

Use bit ops instead of integer modulo and divide in shaders

e29d0d5

alice-i-cecile reviewed Jul 7, 2025

alice-i-cecile added the C-Bug (An unexpected or incorrect behavior), A-Rendering (Drawing game state to the screen), and C-Performance (A change motivated by improving speed, memory usage or compile times) labels Jul 7, 2025

github-project-automation bot added this to Rendering Jul 7, 2025

alice-i-cecile added the S-Needs-Benchmarking (This set of changes needs performance benchmarking to double-check that they help), X-Contentious (There are nontrivial implications that should be thought through), and S-Needs-Review (Needs reviewer attention (from anyone!) to move forward) labels Jul 7, 2025

JMS55 reviewed Jul 7, 2025

Merge branch 'main' into ad/bit-op-shaders

72c9a37

github-actions bot mentioned this pull request Jul 7, 2025

19994 bevyengine/bevy-example-runner#160

Closed

superdump reviewed Jul 7, 2025
