Commit 89a66af

[CUDA] Fix MaxRegsPerBlock check in setKernelParams
Fix the computation of the block size passed to the MaxRegsPerBlock check. The size is a product of the dimensions, not a sum.
1 parent cfba9f1 commit 89a66af

1 file changed, +4 -3 lines changed

source/adapters/cuda/enqueue.cpp

Lines changed: 4 additions & 3 deletions
@@ -245,13 +245,14 @@ setKernelParams(const ur_context_handle_t Context,
       return UR_RESULT_SUCCESS;
     };

-    size_t KernelLocalWorkGroupSize = 0;
+    size_t KernelLocalWorkGroupSize = 1;
     for (size_t Dim = 0; Dim < WorkDim; Dim++) {
       auto Err = IsValid(Dim);
       if (Err != UR_RESULT_SUCCESS)
         return Err;
-      // If no error then sum the total local work size per dim.
-      KernelLocalWorkGroupSize += LocalWorkSize[Dim];
+      // If no error then compute the total local work size as a product of
+      // all dims.
+      KernelLocalWorkGroupSize *= LocalWorkSize[Dim];
     }

     if (hasExceededMaxRegistersPerBlock(Device, Kernel,
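
To illustrate why the accumulator has to start at 1 and multiply, here is a minimal standalone sketch comparing the old and new computations for one local work size. The helper names `sumOfDims` and `productOfDims` and the `Example` array are hypothetical and exist only for this illustration; they are not part of the adapter. The sketch assumes a C++14 compiler, since it uses loops in `constexpr` functions.

```cpp
#include <cstddef>

// Hypothetical helper mirroring the old behaviour: start at 0 and add
// each dimension. This undercounts the threads in a multi-dimensional block.
constexpr std::size_t sumOfDims(const std::size_t *LocalWorkSize,
                                std::size_t WorkDim) {
  std::size_t Total = 0;
  for (std::size_t Dim = 0; Dim < WorkDim; Dim++)
    Total += LocalWorkSize[Dim];
  return Total;
}

// Hypothetical helper mirroring the fixed behaviour: start at 1 (the
// multiplicative identity) and multiply each dimension.
constexpr std::size_t productOfDims(const std::size_t *LocalWorkSize,
                                    std::size_t WorkDim) {
  std::size_t Total = 1;
  for (std::size_t Dim = 0; Dim < WorkDim; Dim++)
    Total *= LocalWorkSize[Dim];
  return Total;
}

// Illustrative local work size of 8 x 8 x 4 work-items.
constexpr std::size_t Example[3] = {8, 8, 4};

// The old sum reports 20 work-items, while the block actually holds 256.
static_assert(sumOfDims(Example, 3) == 20, "8 + 8 + 4");
static_assert(productOfDims(Example, 3) == 256, "8 * 8 * 4");
```

For a one-dimensional launch the two computations agree, so the underestimate only shows up with two or more dimensions. Any check that depends on the number of threads per block, as the MaxRegsPerBlock check appears to, would then be evaluated against a block size that is too small.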
