-
-
Notifications
You must be signed in to change notification settings - Fork 35.9k
Introduction of parameterizable workgroups for compute shaders and dispatch sizes for the renderer #31402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
📦 Bundle sizeFull ESM build, minified and gzipped.
🌳 Bundle size after tree-shakingMinimal build including a renderer, camera, empty scene, and dependencies.
|
…ecification of the dispatchSize
…ecification of the dispatchSize
…ecification of the dispatchSize
73116e1
to
910ce62
Compare
src/nodes/gpgpu/ComputeNode.js
Outdated
let count = countOrWorkgroupSize; | ||
|
||
if ( Array.isArray( countOrWorkgroupSize ) ) { | ||
|
||
workgroupSize = countOrWorkgroupSize; | ||
count = null; | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming the project agrees with the API change I'm wondering if we should deprecate the old code path where "count" is passed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've set it up so that count is used if no workgroup is specified, so that existing user code and examples continue to run as before. But yes, I would also be in favor of removing the count from the code in the long term, with the usual transition time. This also significantly simplifies the compute call in the backend.
const isValid = dispatchSize[ 0 ] > 0 && dispatchSize[ 1 ] > 0 && dispatchSize[ 2 ] > 0; | ||
|
||
dispatchSize = isValid ? dispatchSize : computeNodeData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's valid to set @workgroup_size to 1 or 2 values when defining the parameters inline with the remaining dimensions implicitly being set to "1" (see the spec here) so it would be nice to allow for providing workgroup size of 1 or 2 to align with the native behavior and for the sake of ergonomics:
const computeShader = wgslFn( `fn fragmentShader() -> void {}` );
// all are equivalant
const kernel = computeShader().compute( [ 16 ] );
const kernel = computeShader().compute( [ 16, 1 ] );
const kernel = computeShader().compute( [ 16, 1, 1 ] );
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that from the W3C documentation, I'll take a look at it, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be implemented in two ways. Since it makes no difference in performance, I prefer to check the length of the workgroupSize array before calling the wgsl code builder to fill it with ones if smaller than a three-valued array
this.computeShader = this._getWGSLComputeCode( shadersData.compute, ( this.object.workgroupSize || [ 64 ] ).join( ', ' ) ); | ||
const workgroupSize = this.object.workgroupSize || [ 64, 1, 1 ]; | ||
|
||
if ( workgroupSize.length !== 3 ) throw new Error( 'workgroupSize must have 3 elements' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise dispatchWorkgroups can also take 1 or 2 parameters with the remainder defaulting to 1:
// all equivelant
renderer.computeAsync( kernel, [ 10 ] );
renderer.computeAsync( kernel, [ 10, 1 ] );
renderer.computeAsync( kernel, [ 10, 1, 1 ] );
I'm curious to hear other opinions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's simple. I can do it tonight ( CET ). If you have any other requests by then, I can include them as well
@@ -2351,6 +2352,38 @@ class Renderer { | |||
|
|||
} | |||
|
|||
if ( dispatchSize !== null ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave this just to compute()
node, so we avoid duplicate code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you mean move the entire validation to the compute() and just pass the dispatchSize through in the Renderer like before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's better to just use the workgroupSize from the compute() node itself. I would also suggest reverting to the default value [ 64 ], and removing support for non-array numbers. Then we can add these details in the js-doc about auto-fill by 1.
Related issue: #31393
So far, the workgroup for the compute shader consists of a scalar and the dispatchSize always uses dispatch.x up to
device.limits.maxComputeWorkgroupsPerDimension
before dispatch.y is used. The default workgroup is now [ 64, 1, 1 ] instead of [ 64 ]. This ensures that all previous examples are treated as before.Example on webgpu_compute_particles_rain, usual usage:
Use with workgroup and dispatchSize
Since the instanceCount in the example is 25,000, workgroup = [ 16, 16, 1 ] and dispatchSize = [ 10, 10, 1 ] = 16 * 10 * 16 * 10 * 1 = 25,600 threads, enough to cover everything. If you choose smaller values for dispatchSize, e.g., [ 9, 9, 1 ], you would see that not all raindrops are computed.
You now have full control over the thread distribution for the GPU, which was not possible.
I'm setting the PR to draft for now, even though it works, because I might think of a few more things I'd like to add. Maybe warnings if you enter something incorrectly.