Commit 7fab237

Pennycook authored and kbenzie committed
Do not set global offset unless required (#18242)
Previously, zeKernelSetGlobalOffsetExp was called for every kernel launch. The vast majority of kernels are expected to never have an offset, because the offset feature was deprecated in SYCL 2020, and we should optimize for this case.

The SYCL RT currently passes {0, 0, 0} in the case where there is no offset. To optimize this case:

- A zero offset is treated equivalently to a NULL offset, and zeKernelSetGlobalOffsetExp is not called.
- A non-zero offset triggers a call to zeKernelSetGlobalOffsetExp before launching the kernel.
- A non-zero offset triggers a call to zeKernelSetGlobalOffsetExp after launching the kernel, to reset the offset to zero.

This will introduce additional overhead to the uncommon case where offsets are specified, but we plan to remove this anyway.

In the long-term, the check for a {0, 0, 0} offset should probably be moved into the SYCL headers and NULL should be passed directly to UR. However, this will require wide-reaching changes to other UR adapters and the UR specification.

---------

Signed-off-by: John Pennycook <john.pennycook@intel.com>
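The three rules in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the adapter code: `FakeKernel`, `fakeSetGlobalOffset`, `hasNonZeroOffset`, and `launch` are hypothetical names, the fake setter merely stands in for zeKernelSetGlobalOffsetExp, and the explicit null-pointer check is an added assumption that the commit itself does not make.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a Level Zero kernel handle; it only records state.
struct FakeKernel {
  uint32_t offset[3] = {0, 0, 0};
  int setOffsetCalls = 0;
};

// Hypothetical stand-in for zeKernelSetGlobalOffsetExp.
void fakeSetGlobalOffset(FakeKernel &kernel, uint32_t x, uint32_t y,
                         uint32_t z) {
  kernel.offset[0] = x;
  kernel.offset[1] = y;
  kernel.offset[2] = z;
  ++kernel.setOffsetCalls;
}

// Returns true if any of the first workDim components is non-zero.
// Assumption: a null pointer is treated the same as a zero offset.
bool hasNonZeroOffset(const size_t *offset, uint32_t workDim) {
  assert(workDim <= 3);
  if (offset == nullptr) {
    return false;
  }
  for (uint32_t i = 0; i < workDim; ++i) {
    if (offset[i] != 0) {
      return true;
    }
  }
  return false;
}

// Sets the offset before the launch and resets it afterwards, but only when
// the offset is non-zero. A zero or null offset makes no setter calls.
void launch(FakeKernel &kernel, const size_t *offset, uint32_t workDim) {
  const bool needsOffset = hasNonZeroOffset(offset, workDim);
  if (needsOffset) {
    uint32_t o[3] = {0, 0, 0};
    for (uint32_t i = 0; i < workDim; ++i) {
      o[i] = static_cast<uint32_t>(offset[i]);
    }
    fakeSetGlobalOffset(kernel, o[0], o[1], o[2]);
  }

  // ... the kernel launch itself would be appended here ...

  if (needsOffset) {
    fakeSetGlobalOffset(kernel, 0, 0, 0);
  }
}
```

Under these assumptions, a launch with a {0, 0, 0} or null offset costs no setter calls, while a launch with a non-zero offset costs two: one to set the offset and one to restore zero for the next launch.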
1 parent 4c9944a commit 7fab237

3 files changed: +39 -0 lines changed

source/adapters/level_zero/helpers/kernel_helpers.hpp

Lines changed: 15 additions & 0 deletions
@@ -56,3 +56,18 @@ ur_result_t getSuggestedLocalWorkSize(ur_device_handle_t hDevice,
                                       ze_kernel_handle_t hZeKernel,
                                       size_t GlobalWorkSize3D[3],
                                       uint32_t SuggestedLocalWorkSize3D[3]);
+
+/**
+ * Handle uncommon conditions after kernel submission.
+ * Resets the offset to {0, 0, 0} if one was supplied.
+ * @param[in] hZeKernel The kernel handle.
+ * @param[in] pGlobalWorkOffset Pointer to offset array.
+ */
+inline void postSubmit(ze_kernel_handle_t hZeKernel,
+                       const size_t *pGlobalWorkOffset) {
+  // If this kernel was launched with an offset, clear it for the next launch.
+  // This slows down kernels with offsets but keeps the common case fast.
+  if (pGlobalWorkOffset != NULL) {
+    zeKernelSetGlobalOffsetExp(hZeKernel, 0, 0, 0);
+  }
+}
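Note that postSubmit inspects only the pointer, not the values it points to, so it depends on the caller having already replaced a {0, 0, 0} offset with NULL. The sketch below illustrates that contract with hypothetical stand-ins (`MockKernel`, `mockResetOffset`, and `postSubmitSketch` are not part of the adapter).

```cpp
#include <cstddef>

// Hypothetical stand-in for a kernel handle; it only counts offset resets.
struct MockKernel {
  int resetCalls = 0;
};

// Hypothetical stand-in for zeKernelSetGlobalOffsetExp(hZeKernel, 0, 0, 0).
void mockResetOffset(MockKernel &kernel) { ++kernel.resetCalls; }

// Mirrors the pointer-only check in postSubmit: the array contents are
// never read, so a non-null pointer to zeros still triggers a reset.
void postSubmitSketch(MockKernel &kernel, const size_t *pGlobalWorkOffset) {
  if (pGlobalWorkOffset != nullptr) {
    mockResetOffset(kernel);
  }
}
```

In this sketch, passing a non-null pointer to an all-zero array still performs one reset, which is why the call sites in this commit normalize the pointer before prepareForSubmission and postSubmit are reached.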

source/adapters/level_zero/v2/command_list_manager.cpp

Lines changed: 12 additions & 0 deletions
@@ -219,6 +219,16 @@ ur_result_t ur_command_list_manager::appendKernelLaunch(
     waitListView.clear();
   };
 
+  // If the offset is {0, 0, 0}, pass NULL instead.
+  // This allows us to skip setting the offset.
+  bool hasOffset = false;
+  for (uint32_t i = 0; i < workDim; ++i) {
+    hasOffset |= pGlobalWorkOffset[i];
+  }
+  if (!hasOffset) {
+    pGlobalWorkOffset = NULL;
+  }
+
   UR_CALL(hKernel->prepareForSubmission(context, device, pGlobalWorkOffset,
                                         workDim, WG[0], WG[1], WG[2],
                                         memoryMigrate));
@@ -229,6 +239,8 @@ ur_result_t ur_command_list_manager::appendKernelLaunch(
              (zeCommandList.get(), hZeKernel, &zeThreadGroupDimensions,
               zeSignalEvent, waitListView.num, waitListView.handles));
 
+  postSubmit(hZeKernel, pGlobalWorkOffset);
+
   return UR_RESULT_SUCCESS;
 }
 

source/adapters/level_zero/v2/queue_immediate_in_order.cpp

Lines changed: 12 additions & 0 deletions
@@ -809,6 +809,16 @@ ur_result_t ur_queue_immediate_in_order_t::enqueueCooperativeKernelLaunchExp(
     waitListView.clear();
   };
 
+  // If the offset is {0, 0, 0}, pass NULL instead.
+  // This allows us to skip setting the offset.
+  bool hasOffset = false;
+  for (uint32_t i = 0; i < workDim; ++i) {
+    hasOffset |= pGlobalWorkOffset[i];
+  }
+  if (!hasOffset) {
+    pGlobalWorkOffset = NULL;
+  }
+
   UR_CALL(hKernel->prepareForSubmission(hContext, hDevice, pGlobalWorkOffset,
                                         workDim, WG[0], WG[1], WG[2],
                                         memoryMigrate));
@@ -822,6 +832,8 @@ ur_result_t ur_queue_immediate_in_order_t::enqueueCooperativeKernelLaunchExp(
 
   recordSubmittedKernel(hKernel);
 
+  postSubmit(hZeKernel, pGlobalWorkOffset);
+
   return UR_RESULT_SUCCESS;
 }
 
