* The CScanner.h parameter computation calculates the number of virtual workgroups that will have to be launched for the Scan operation
* (always based on the elementCount) as well as different offsets for the results of each step of the Scan operation, flag positions
* that are used for synchronization etc.
* Remember that CScanner does a Blelloch Scan, which works in levels. In each level of the Blelloch Scan the array of elements is
* broken down into sets of size=WorkgroupSize and each set is scanned using Hillis & Steele (aka Kogge-Stone adder). The result of
* the scan is provided as an array element for the next level of the Blelloch Scan. This means that if we have 10000 elements and
* WorkgroupSize=250, we will break the array into 40 sets and take their reduction results. The next level of the Blelloch Scan will
* have an array of size 40. Only a single workgroup will be needed to work on that. After that array is scanned, we use the results
* in the downsweep phase of Blelloch Scan.
* Keep in mind that each virtual workgroup executes a single step of the whole algorithm, which is why we have the cumulativeWorkgroupCount.
* The first virtual workgroups will work on the upsweep phase, the next on the downsweep phase.
* The intermediate results are stored in a scratch buffer. That buffer's size is the sum of the element-array size for all the
* Blelloch levels. Using the previous example, the scratch size should be 10000 + 40.
*
* Parameter meaning:
* |> lastElement - the index of the last element of each Blelloch level in the scratch buffer
* |> topLevel - the top level the Blelloch Scan will have (this depends on the elementCount and the WorkgroupSize)
* |> temporaryStorageOffset - an offset array for each level of the Blelloch Scan. It is used when storing the REDUCTION result of each workgroup scan
* |> cumulativeWorkgroupCount - the sum-scan of all the workgroups that will need to be launched for each level of the Blelloch Scan (both upsweep and downsweep)
* |> finishedFlagOffset - an index in the scratch buffer where each virtual workgroup indicates that ALL its invocations have finished their work. This helps
* synchronize workgroups via while-loop spinning.
*/
void computeParameters(in uint elementCount, out Parameters_t _scanParams, out DefaultSchedulerParameters_t _schedulerParams)
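The level breakdown described in the comment above can be checked with a small CPU-side sketch. This is not the body of computeParameters, which is not shown in this diff; it is a C++ illustration that only assumes what the comment states (sets of WorkgroupSize elements, one virtual workgroup per set, upsweep steps followed by downsweep steps, scratch size equal to the sum of the per-level array sizes). The names ScanLayoutSketch, computeLayoutSketch, levelElementCount and scratchElementCount are hypothetical, and the flag and storage offsets are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side container for illustration only; it is NOT the PR's
// Parameters_t or DefaultSchedulerParameters_t.
struct ScanLayoutSketch
{
    uint32_t topLevel = 0u;                         // highest Blelloch level
    std::vector<uint32_t> levelElementCount;        // elements scanned at each level
    std::vector<uint32_t> cumulativeWorkgroupCount; // inclusive sum over upsweep, then downsweep steps
    uint32_t scratchElementCount = 0u;              // sum of element counts over all levels
};

inline ScanLayoutSketch computeLayoutSketch(uint32_t elementCount, uint32_t workgroupSize)
{
    // Assumption: a workgroup size below 2 would never shrink the array.
    assert(workgroupSize >= 2u);

    ScanLayoutSketch out;

    // Every level reduces each set of workgroupSize elements to one element of the next level.
    uint32_t count = elementCount;
    out.levelElementCount.push_back(count);
    while (count > workgroupSize)
    {
        count = (count + workgroupSize - 1u) / workgroupSize; // round up
        out.levelElementCount.push_back(count);
    }
    out.topLevel = static_cast<uint32_t>(out.levelElementCount.size()) - 1u;

    for (uint32_t c : out.levelElementCount)
        out.scratchElementCount += c;

    // One virtual workgroup per set of workgroupSize elements.
    auto workgroupsAt = [&](uint32_t level) {
        return (out.levelElementCount[level] + workgroupSize - 1u) / workgroupSize;
    };

    uint32_t running = 0u;
    for (uint32_t level = 0u; level <= out.topLevel; ++level) // upsweep steps
    {
        running += workgroupsAt(level);
        out.cumulativeWorkgroupCount.push_back(running);
    }
    for (uint32_t level = out.topLevel; level-- > 0u;) // downsweep steps, top level excluded
    {
        running += workgroupsAt(level);
        out.cumulativeWorkgroupCount.push_back(running);
    }
    return out;
}
```

For the comment's example (10000 elements, WorkgroupSize=250) this sketch yields two levels of 10000 and 40 elements, a scratch size of 10040 elements, and cumulative workgroup counts of 40, 41 and 81 under the assumption that the top level is visited only once.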
GroupMemoryBarrierWithGroupSync(); // REVIEW: refactor this somewhere with GLSL terminology?

const uint globalWorkgroupIndex; // does every thread need to know?
sharedScratch.get(0u, globalWorkgroupIndex);
const uint lastLevel = topLevel<<1u;
if (gl_LocalInvocationIndex<=lastLevel && globalWorkgroupIndex>=params.cumulativeWorkgroupCount[gl_LocalInvocationIndex])
{
    InterlockedAdd(sharedScratch.get(1u, ?), 1u); // REVIEW: The way scratchaccessoradaptor is implemented (e.g. under subgroup/arithmetic_portability) doesn't allow for atomic ops on the scratch buffer. Should we ask for another implementation that overrides the [] operator ?
}
GroupMemoryBarrierWithGroupSync(); // TODO (PentaKon): Possibly refactor?
const Parameters_t scanParams = getParameters(); // TODO (PentaKon): Undeclared as of now, this should return the Parameters_t from the push constants of (in)direct shader
while (scanScratch.data[dependentsFinishedFlagOffset]!=dependentsCount) // TODO (PentaKon): Refactor this when the ScanScratch descriptor set is declared
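The InterlockedAdd hunk above appears to let each invocation test one entry of cumulativeWorkgroupCount and bump a shared counter when the workgroup index has reached it, so that the counter ends up holding the step this virtual workgroup executes. A serial C++ sketch of that reading follows. The helper names findStepSketch and stepToLevelSketch are hypothetical, and the mapping from step to Blelloch level (upsweep first, then back down) is an assumption based on lastLevel = topLevel<<1u, not something this diff confirms.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical serial equivalent of the shared-memory counter: counts how many
// entries of cumulativeWorkgroupCount the workgroup index has already reached.
inline uint32_t findStepSketch(uint32_t globalWorkgroupIndex,
                               const std::vector<uint32_t>& cumulativeWorkgroupCount)
{
    uint32_t step = 0u;
    for (uint32_t boundary : cumulativeWorkgroupCount)
        if (globalWorkgroupIndex >= boundary)
            ++step; // stands in for the per-invocation InterlockedAdd
    return step;
}

// Assumed mapping: steps 0..topLevel are upsweep, later steps walk back down the levels.
inline uint32_t stepToLevelSketch(uint32_t step, uint32_t topLevel)
{
    const uint32_t lastLevel = topLevel << 1u;
    assert(step <= lastLevel);
    return step > topLevel ? lastLevel - step : step;
}
```

With cumulative counts of 40, 41 and 81 (the 10000-element, WorkgroupSize=250 example), workgroups 0 to 39 land on step 0, workgroup 40 on step 1, and workgroups 41 to 80 on step 2, which the assumed mapping sends back to level 0 for the downsweep.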