Auto-spread large WebGPU compute dispatches #8696
base: dev-2.0
@@ -3813,10 +3813,45 @@ ${hookUniformFields}}
     const WORKGROUP_SIZE_Y = 8;
     const WORKGROUP_SIZE_Z = 1;
 
-    // Calculate number of workgroups needed
-    const workgroupCountX = Math.ceil(x / WORKGROUP_SIZE_X);
-    const workgroupCountY = Math.ceil(y / WORKGROUP_SIZE_Y);
-    const workgroupCountZ = Math.ceil(z / WORKGROUP_SIZE_Z);
+    // auto spreading: if any dimension is too large or for performance optimization,
+    // spread total iteration count across dimensions
+    const totalIterations = x * y * z;
+    const MAX_THREADS_PER_DIM = 65535 * 8;
+
+    let px = x;
+    let py = y;
+    let pz = z;
+
+    // we spread if we exceed GPU limits OR if it involves a large 1D dispatch
+    const exceedsLimits = x > MAX_THREADS_PER_DIM || y > MAX_THREADS_PER_DIM || z > MAX_THREADS_PER_DIM;
+    const isLarge1D = totalIterations > 1024 && y === 1 && z === 1;
+
+    if (exceedsLimits || isLarge1D) {
+      if (totalIterations > 1000000) {
Contributor
Out of curiosity, is there any benefit to spreading across dimensions like this for lower iteration counts too? E.g., if you're doing a big for loop inside each iteration, with a smaller number of iterations, is there any difference?
Author
Good question! Currently I only auto-spread when count > 1024, to avoid overhead for small dispatches. For lower counts with heavy per-iteration work, manual spreading might still help, but I kept it simple for now. We could test this if you think it's worth optimizing?
Contributor
I think it's worth testing, at least to know what kind of difference it makes, and similarly whether it's better to spread across 3 dimensions earlier too. A table of performance tests would help us be a bit more confident about our optimizations.
Author
I can run some quick tests comparing different spreading approaches across small, medium, and large counts. I'll check 1D, 2D (square/rectangular), and 3D, and share a simple performance table with the results. It should be interesting to see where things start to slow down. Let me know if there's anything specific you'd like me to test, or if you want to try something on your machine as well 👍
+        // 3D cube type for extreme large counts
+        px = Math.ceil(Math.pow(totalIterations, 1 / 3));
+        py = Math.ceil(Math.pow(totalIterations, 1 / 3));
+        pz = Math.ceil(totalIterations / (px * py));
+      } else {
+        // 2D square type for moderate large counts
+        px = Math.ceil(Math.sqrt(totalIterations));
+        py = Math.ceil(totalIterations / px);
+        pz = 1;
+      }
+
+      if (p5.debug || exceedsLimits) {
+        console.warn(
+          `p5.js: Compute dispatch (${x}, ${y}, ${z}) auto-spread to (${px}, ${py}, ${pz}) ` +
+          `to ${exceedsLimits ? 'stay within GPU limits' : 'optimize performance'}.`
+        );
+      }
+    }
+
+    shader.setUniform('uPhysicalCount', [px, py, pz]);
+
+    const workgroupCountX = Math.ceil(px / WORKGROUP_SIZE_X);
+    const workgroupCountY = Math.ceil(py / WORKGROUP_SIZE_Y);
+    const workgroupCountZ = Math.ceil(pz / WORKGROUP_SIZE_Z);
 
     const commandEncoder = this.device.createCommandEncoder();
     const passEncoder = commandEncoder.beginComputePass();
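For anyone who wants to try the shape selection outside the renderer, here is a minimal sketch of the logic in this hunk as a pure function. It is not the PR's code: `spreadDispatch` is a hypothetical helper name, and the thresholds (1024, 1000000, 65535 * 8) are copied from the diff rather than tuned.

```javascript
// Hypothetical standalone helper (not part of p5.js); thresholds copied from the diff.
const MAX_THREADS_PER_DIM = 65535 * 8;

function spreadDispatch(x, y, z) {
  const totalIterations = x * y * z;
  let px = x;
  let py = y;
  let pz = z;

  // Spread when a dimension exceeds the bound, or for a large 1D dispatch.
  const exceedsLimits =
    x > MAX_THREADS_PER_DIM || y > MAX_THREADS_PER_DIM || z > MAX_THREADS_PER_DIM;
  const isLarge1D = totalIterations > 1024 && y === 1 && z === 1;

  if (exceedsLimits || isLarge1D) {
    if (totalIterations > 1000000) {
      // Roughly cubic grid for very large counts.
      px = Math.ceil(Math.pow(totalIterations, 1 / 3));
      py = px;
      pz = Math.ceil(totalIterations / (px * py));
    } else {
      // Roughly square grid for moderately large counts.
      px = Math.ceil(Math.sqrt(totalIterations));
      py = Math.ceil(totalIterations / px);
      pz = 1;
    }
  }

  return { px, py, pz, totalIterations, spread: exceedsLimits || isLarge1D };
}

console.log(spreadDispatch(2000, 1, 1));
```

With these thresholds, `spreadDispatch(2000, 1, 1)` yields a 45 x 45 x 1 grid, which is 2025 invocations for 2000 iterations. The grid always covers the requested count but usually overshoots it, so the surplus invocations have to be skipped on the shader side.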
Mind elaborating on what these changes are there to handle? Anything we should have more test cases for in the tests?
These fix void return types in compute shaders. Without them, doing `return;` in a compute hook would crash with "Missing dataType". Most compute shaders use void (side-effects only), so the auto-spread wouldn't work without this fix.
For tests - should I add cases for void hooks with early returns? The main compute functionality already has test coverage.
Ah, got it. Right, let's add a test for early returns, since this case wasn't covered by any tests before. Thanks!
I've added the test cases for void compute hooks with early returns. Both tests are passing.
Thanks!
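The thread above ties early returns in void hooks to auto-spread, but the shader side is not shown on this page. One plausible reading, sketched here as an assumption and not as the PR's implementation: a spread grid launches more invocations than there are logical iterations, so each invocation flattens its physical ID into a linear index and returns early when that index is out of range. The JavaScript below simulates that on the CPU. `logicalIndex` and `simulateDispatch` are hypothetical names, the row-major flattening order is assumed, and a workgroup size of 8 in X is assumed (the diff only shows Y = 8 and Z = 1).

```javascript
// Assumption: row-major flattening of the physical invocation ID.
// Returns -1 where a void compute hook would hit an early `return;`.
function logicalIndex(gid, physicalCount, totalIterations) {
  const [gx, gy, gz] = gid;
  const [px, py, pz] = physicalCount;
  // Workgroup rounding can launch invocations outside the physical grid.
  if (gx >= px || gy >= py || gz >= pz) return -1;
  const index = gx + gy * px + gz * px * py;
  // The physical grid itself may overshoot the logical iteration count.
  return index < totalIterations ? index : -1;
}

// CPU stand-in for one dispatch: counts how often each logical iteration runs.
function simulateDispatch(physicalCount, workgroupSize, totalIterations) {
  // Invocations actually launched: workgroup count times workgroup size, per dimension.
  const launched = physicalCount.map(
    (p, i) => Math.ceil(p / workgroupSize[i]) * workgroupSize[i]
  );
  const visits = new Array(totalIterations).fill(0);
  for (let gz = 0; gz < launched[2]; gz++) {
    for (let gy = 0; gy < launched[1]; gy++) {
      for (let gx = 0; gx < launched[0]; gx++) {
        const index = logicalIndex([gx, gy, gz], physicalCount, totalIterations);
        if (index === -1) continue; // early return in the real shader
        visits[index] += 1;
      }
    }
  }
  return visits;
}

const visits = simulateDispatch([45, 45, 1], [8, 8, 1], 2000);
console.log(visits.every(v => v === 1)); // each logical iteration runs exactly once
```

Both guards matter in this model: the first skips invocations added by rounding up to whole workgroups, and the second skips the tail of the physical grid beyond the logical count.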