Skip to content

Commit 7e7dffc

Browse files
EwanCXewar313
andauthored
[SYCL][UR][Graph] Require OpenCL simultaneous use (#17658)
To support the SYCL-Graph extension on an OpenCL backend, we currently only require the presence of the `cl_khr_command_buffer` extension. This PR introduces an extra requirement on the [CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR) capability being present. This is based on the [graph execution wording](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc#765-new-handler-member-functions) on the definition of `handler::ext_oneapi_graph()` that: > Only one instance of graph will execute at any time. If graph is submitted multiple times, dependencies are automatically added by the runtime to prevent concurrent executions of an identical graph. Such usage results in multiple calls by the SYCL runtime to `urEnqueueCommandBufferExp` with the same UR command-buffer and event dependencies to prevent concurrent execution. Without support for simultaneous-use the OpenCL adapter code cannot guarantee that the first command-buffer submission has finished execution before it makes following `clEnqueueCommandBufferKHR` calls with the `cl_event` decencies. If the first submission is still executing, then an error will be reported. Workarounds like adding blocking host waits to the OpenCL UR adapter are possible, but requiring simultaneous use reflects the vendor requirements as they are for the currently implementation. I've tried to document this all in the UR spec and SYCL-Graph design docs, which also includes a couple of cleanups I found along the way. Note that the new CTS test fails for Level-Zero adapter, which I've created #17734 to resolve. --------- Co-authored-by: Mikołaj Komar <mikolaj.komar@intel.com>
1 parent c410b9a commit 7e7dffc

File tree

6 files changed

+138
-78
lines changed

6 files changed

+138
-78
lines changed

sycl/doc/design/CommandGraph.md

Lines changed: 71 additions & 66 deletions
Original file line numberDiff line numberDiff line change
@@ -28,30 +28,26 @@ document for details of support of different SYCL backends.
2828
### UR Command-Buffer Experimental Feature
2929

3030
The command-buffer concept has been introduced to UR as an
31-
[experimental feature](https://oneapi-src.github.io/unified-runtime/core/api.html#command-buffer-experimental)
32-
with the following entry-points:
33-
34-
| Function | Description |
35-
| -------------------------------------------- | ----------- |
36-
| `urCommandBufferCreateExp` | Create a command-buffer. |
37-
| `urCommandBufferRetainExp` | Incrementing reference count of command-buffer. |
38-
| `urCommandBufferReleaseExp` | Decrementing reference count of command-buffer. |
39-
| `urCommandBufferFinalizeExp` | No more commands can be appended, makes command-buffer ready to enqueue on a command-queue. |
40-
| `urCommandBufferAppendKernelLaunchExp` | Append a kernel execution command to command-buffer. |
41-
| `urCommandBufferAppendUSMMemcpyExp` | Append a USM memcpy command to the command-buffer. |
42-
| `urCommandBufferAppendUSMFillExp` | Append a USM fill command to the command-buffer. |
43-
| `urCommandBufferAppendMemBufferCopyExp` | Append a mem buffer copy command to the command-buffer. |
44-
| `urCommandBufferAppendMemBufferWriteExp` | Append a memory write command to a command-buffer object. |
45-
| `urCommandBufferAppendMemBufferReadExp` | Append a memory read command to a command-buffer object. |
46-
| `urCommandBufferAppendMemBufferCopyRectExp` | Append a rectangular memory copy command to a command-buffer object. |
47-
| `urCommandBufferAppendMemBufferWriteRectExp` | Append a rectangular memory write command to a command-buffer object. |
48-
| `urCommandBufferAppendMemBufferReadRectExp` | Append a rectangular memory read command to a command-buffer object. |
49-
| `urCommandBufferAppendMemBufferFillExp` | Append a memory fill command to a command-buffer object. |
50-
| `urEnqueueCommandBufferExp` | Submit command-buffer to a command-queue for execution. |
51-
| `urCommandBufferUpdateKernelLaunchExp` | Updates the parameters of a previous kernel launch command. |
52-
31+
[experimental feature](https://oneapi-src.github.io/unified-runtime/core/api.html#command-buffer-experimental).
5332
See the [UR EXP-COMMAND-BUFFER](https://oneapi-src.github.io/unified-runtime/core/EXP-COMMAND-BUFFER.html)
54-
specification for more details.
33+
specification for details.
34+
35+
Device support for SYCL-Graph is communicated to the user via two aspects.
36+
The `aspect::ext_oneapi_limited_graph` aspect for basic graph support and
37+
the `aspect::ext_oneapi_graph` aspect for full graph support.
38+
39+
The `UR_DEVICE_INFO_COMMAND_BUFFER_SUPPORT_EXP` query result is used by the
40+
SYCL-RT to inform whether to report `aspect::ext_oneapi_limited_graph`.
41+
42+
Reporting of the `aspect::ext_oneapi_graph` aspect is based on the
43+
`UR_DEVICE_INFO_COMMAND_BUFFER_UPDATE_CAPABILITIES_EXP` query result. For
44+
a device to report this aspect, the UR query must report support for all of:
45+
46+
* `UR_DEVICE_COMMAND_BUFFER_UPDATE_CAPABILITY_FLAG_KERNEL_ARGUMENTS`
47+
* `UR_DEVICE_COMMAND_BUFFER_UPDATE_CAPABILITY_FLAG_LOCAL_WORK_SIZE`
48+
* `UR_DEVICE_COMMAND_BUFFER_UPDATE_CAPABILITY_FLAG_GLOBAL_WORK_SIZE`
49+
* `UR_DEVICE_COMMAND_BUFFER_UPDATE_CAPABILITY_FLAG_GLOBAL_WORK_OFFSET`
50+
* `UR_DEVICE_COMMAND_BUFFER_UPDATE_CAPABILITY_FLAG_KERNEL_HANDLE`
5551

5652
## Design
5753

@@ -608,43 +604,14 @@ SYCL-Graph is only enabled for an OpenCL backend when the
608604
extension is available, however this information isn't available until runtime
609605
due to OpenCL implementations being loaded through an ICD.
610606

611-
The `ur_exp_command_buffer` string is conditionally returned from the OpenCL
612-
command-buffer UR backend at runtime based on `cl_khr_command_buffer` support
613-
to indicate that the graph extension should be enabled. This is information
614-
is propagated to the SYCL user via the
615-
`device.get_info<info::device::graph_support>()` query for graph extension
616-
support.
617-
618-
#### Limitations
619-
620-
Due to the API mapping gaps documented in the following section, OpenCL as a
621-
SYCL backend cannot fully support the graph API. Instead, there are
622-
limitations in the types of nodes which a user can add to a graph, using
623-
an unsupported node type will cause a SYCL exception to be thrown in graph
624-
finalization with error code `sycl::errc::feature_not_supported` and a message
625-
mentioning the unsupported command. For example,
626-
627-
```
628-
terminate called after throwing an instance of 'sycl::_V1::exception'
629-
what(): USM copy command not supported by graph backend
630-
```
631-
632-
The types of commands which are unsupported, and lead to this exception are:
633-
* `handler::copy(src, dest)` - Where `src` is an accessor and `dest` is a pointer.
634-
This corresponds to a memory buffer read command.
635-
* `handler::copy(src, dest)` - Where `src` is an pointer and `dest` is an accessor.
636-
This corresponds to a memory buffer write command.
637-
* `handler::copy(src, dest)` or `handler::memcpy(dest, src)` - Where both `src` and
638-
`dest` are USM pointers. This corresponds to a USM copy command.
639-
* `handler::fill(ptr, pattern, count)` - This corresponds to a USM memory
640-
fill command.
641-
* `handler::memset(ptr, value, numBytes)` - This corresponds to a USM memory
642-
fill command.
643-
* `handler::prefetch()`.
644-
* `handler::mem_advise()`.
645-
646-
Note that `handler::copy(src, dest)` where both `src` and `dest` are an accessor
647-
is supported, as a memory buffer copy command exists in the OpenCL extension.
607+
The `UR_DEVICE_INFO_COMMAND_BUFFER_SUPPORT_EXP` UR query returns true in the
608+
OpenCL UR adapter based on
609+
the presence of `cl_khr_command_buffer`, and the OpenCL device reporting
610+
support for
611+
[CL_COMMAND_BUFFER_CAPABILITY_SIMULTANEOUS_USE_KHR](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#CL_COMMAND_BUFFER_CAPABILITY_SIMULTANEOUS_USE_KHR).
612+
The latter is required to enable multiple submissions of the same executable
613+
`command_graph` object without having to do a blocking wait on prior submissions
614+
in-between.
648615

649616
#### UR API Mapping
650617

@@ -678,18 +645,56 @@ adapter where there is matching support for each function in the list.
678645
| | clGetCommandBufferInfoKHR | No |
679646
| | clCommandSVMMemcpyKHR | No |
680647
| | clCommandSVMMemFillKHR | No |
681-
| urCommandBufferUpdateKernelLaunchExp | clUpdateMutableCommandsKHR | Yes[1] |
648+
| urCommandBufferUpdateKernelLaunchExp | clUpdateMutableCommandsKHR | Partial [See Update Section](#update-support) |
682649

683650
We are looking to address these gaps in the future so that SYCL-Graph can be
684651
fully supported on a `cl_khr_command_buffer` backend.
685652

686-
[1] Support for `urCommandBufferUpdateKernelLaunchExp` used to update the
653+
#### Unsupported Command Types
654+
655+
Due to the API mapping gaps documented in the previous section, OpenCL as a
656+
SYCL backend cannot fully support the graph API. Instead, there are
657+
limitations in the types of nodes which a user can add to a graph, using
658+
an unsupported node type will cause a SYCL exception to be thrown in graph
659+
finalization with error code `sycl::errc::feature_not_supported` and a message
660+
mentioning the unsupported command. For example,
661+
662+
```
663+
terminate called after throwing an instance of 'sycl::_V1::exception'
664+
what(): USM copy command not supported by graph backend
665+
```
666+
667+
The types of commands which are unsupported, and lead to this exception are:
668+
* `handler::copy(src, dest)` - Where `src` is an accessor and `dest` is a pointer.
669+
This corresponds to a memory buffer read command.
670+
* `handler::copy(src, dest)` - Where `src` is an pointer and `dest` is an accessor.
671+
This corresponds to a memory buffer write command.
672+
* `handler::copy(src, dest)` or `handler::memcpy(dest, src)` - Where both `src` and
673+
`dest` are USM pointers. This corresponds to a USM copy command.
674+
* `handler::fill(ptr, pattern, count)` - This corresponds to a USM memory
675+
fill command.
676+
* `handler::memset(ptr, value, numBytes)` - This corresponds to a USM memory
677+
fill command.
678+
* `handler::prefetch()`.
679+
* `handler::mem_advise()`.
680+
681+
Note that `handler::copy(src, dest)` where both `src` and `dest` are an accessor
682+
is supported, as a memory buffer copy command exists in the OpenCL extension.
683+
684+
#### Update Support
685+
686+
Support for `urCommandBufferUpdateKernelLaunchExp` used to update the
687687
configuration of kernel commands requires an OpenCL implementation with the
688688
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
689-
extension. The optional capabilities that are reported by this extension must
690-
include all of of `CL_MUTABLE_DISPATCH_GLOBAL_OFFSET_KHR`,
691-
`CL_MUTABLE_DISPATCH_GLOBAL_SIZE_KHR`, `CL_MUTABLE_DISPATCH_LOCAL_SIZE_KHR`,
692-
`CL_MUTABLE_DISPATCH_ARGUMENTS_KHR`, and `CL_MUTABLE_DISPATCH_EXEC_INFO_KHR`.
689+
extension.
690+
691+
However, the OpenCL adapter can not report `aspect::ext_oneapi_graph` for full
692+
SYCL-Graph support. As the `cl_khr_command_buffer_mutable_dispatch` extension
693+
has no concept of updating the `cl_kernel` objects in kernel commands, and so
694+
can't report the
695+
`UR_DEVICE_COMMAND_BUFFER_UPDATE_CAPABILITY_FLAG_KERNEL_HANDLE` capability.
696+
This extension limitation is tracked in by the OpenCL Working Group in an
697+
[OpenCL-Docs Issue](https://github.com/KhronosGroup/OpenCL-Docs/issues/1279).
693698

694699
#### UR Command-Buffer Implementation
695700

sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1347,6 +1347,11 @@ Parameters:
13471347
The other is <<enable-profiling, `property::graph::enable_profiling`>>
13481348
to enable profiling events returned from submissions of the executable graph.
13491349

1350+
Exceptions:
1351+
1352+
* Throws synchronously with error code `feature_not_supported` if the graph
1353+
contains a command that is not supported by the device.
1354+
13501355
Returns: A new executable graph object which can be submitted to a queue.
13511356

13521357
|

sycl/source/detail/graph_impl.cpp

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1015,9 +1015,9 @@ exec_graph_impl::enqueue(const std::shared_ptr<sycl::detail::queue_impl> &Queue,
10151015
// and potential hangs. We have therefore to expliclty wait in the host
10161016
// for previous submission to complete before resubmitting the
10171017
// command-buffer for level-zero backend.
1018-
// TODO : add a check to release this constraint and allow multiple
1019-
// concurrent submissions if the exec_graph has been updated since the
1020-
// last submission.
1018+
// TODO https://github.com/intel/llvm/issues/17734
1019+
// Remove this backend specific behavior and allow multiple concurrent
1020+
// submissions of the UR command-buffer.
10211021
for (std::vector<sycl::detail::EventImplPtr>::iterator It =
10221022
MExecutionEvents.begin();
10231023
It != MExecutionEvents.end();) {

unified-runtime/scripts/core/EXP-COMMAND-BUFFER.rst

Lines changed: 21 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -58,11 +58,11 @@ to provide additional properties for how the command-buffer should be
5858
constructed. The members defined in ${x}_exp_command_buffer_desc_t are:
5959

6060
* ``isUpdatable``, which should be set to ``true`` to support :ref:`updating
61-
command-buffer commands`.
61+
command-buffer commands`.
6262
* ``isInOrder``, which should be set to ``true`` to enable commands enqueued to
63-
a command-buffer to be executed in an in-order fashion where possible.
63+
a command-buffer to be executed in an in-order fashion where possible.
6464
* ``enableProfiling``, which should be set to ``true`` to enable profiling of
65-
the command-buffer.
65+
the command-buffer.
6666

6767
Command-buffers are reference counted and can be retained and released by
6868
calling ${x}CommandBufferRetainExp and ${x}CommandBufferReleaseExp respectively.
@@ -226,15 +226,30 @@ Enqueueing Command-Buffers
226226
Command-buffers are submitted for execution on a ${x}_queue_handle_t with an
227227
optional list of dependent events. An event is returned which tracks the
228228
execution of the command-buffer, and will be complete when all appended commands
229-
have finished executing. It is adapter specific whether command-buffers can be
230-
enqueued or executed simultaneously, and submissions may be serialized.
229+
have finished executing.
231230

232231
.. parsed-literal::
233232
${x}_event_handle_t executionEvent;
234-
235233
${x}EnqueueCommandBufferExp(hQueue, hCommandBuffer, 0, nullptr,
236234
&executionEvent);
237235
236+
A command-buffer can be submitted for execution while a previous submission
237+
of the same command-buffer is still awaiting completion. That is, the user is not
238+
required to do a blocking wait on the completion of the first command-buffer
239+
submission before making a second submission of the command-buffer.
240+
241+
Submissions of the same command-buffer should be synchronized to prevent
242+
concurrent execution. For example, by using events, barriers, or in-order queue
243+
dependencies. The behavior of multiple submissions of the same command-buffer
244+
that can execute concurrently is undefined.
245+
246+
.. parsed-literal::
247+
// Valid usage if hQueue is in-order but undefined behavior is out-of-order
248+
${x}EnqueueCommandBufferExp(hQueue, hCommandBuffer, 0, nullptr,
249+
nullptr);
250+
${x}EnqueueCommandBufferExp(hQueue, hCommandBuffer, 0, nullptr,
251+
nullptr);
252+
238253
239254
Updating Command-Buffer Commands
240255
--------------------------------------------------------------------------------

unified-runtime/source/adapters/opencl/device.cpp

Lines changed: 13 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1531,9 +1531,19 @@ UR_APIEXPORT ur_result_t UR_APICALL urDeviceGetInfo(ur_device_handle_t hDevice,
15311531
CL_RETURN_ON_FAILURE(clGetDeviceInfo(Dev, CL_DEVICE_EXTENSIONS, ExtSize,
15321532
ExtStr.data(), nullptr));
15331533

1534-
std::string SupportedExtensions(ExtStr.c_str());
1535-
return ReturnValue(ExtStr.find("cl_khr_command_buffer") !=
1536-
std::string::npos);
1534+
// cl_khr_command_buffer is required for UR command-buffer support
1535+
cl_device_command_buffer_capabilities_khr Caps = 0;
1536+
if (ExtStr.find("cl_khr_command_buffer") != std::string::npos) {
1537+
// A UR command-buffer user needs to be able to enqueue another
1538+
// submission of the same UR command-buffer without having to manually
1539+
// check if the first submission has completed.
1540+
CL_RETURN_ON_FAILURE(
1541+
clGetDeviceInfo(Dev, CL_DEVICE_COMMAND_BUFFER_CAPABILITIES_KHR,
1542+
sizeof(Caps), &Caps, nullptr));
1543+
}
1544+
1545+
return ReturnValue(
1546+
0 != (Caps & CL_COMMAND_BUFFER_CAPABILITY_SIMULTANEOUS_USE_KHR));
15371547
}
15381548
case UR_DEVICE_INFO_COMMAND_BUFFER_UPDATE_CAPABILITIES_EXP: {
15391549
cl_device_id Dev = cl_adapter::cast<cl_device_id>(hDevice);

unified-runtime/test/conformance/exp_command_buffer/fill.cpp

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -122,6 +122,31 @@ TEST_P(urCommandBufferFillCommandsTest, Buffer) {
122122
verifyData(output, size);
123123
}
124124

125+
TEST_P(urCommandBufferFillCommandsTest, ExecuteTwice) {
126+
// TODO https://github.com/intel/llvm/issues/17734
127+
// Fail on Level-Zero due to blocking wait code in graph_impl.cpp specific
128+
// to the level-zero backend that needs moved into the Level-Zero v1 adapter.
129+
UUR_KNOWN_FAILURE_ON(uur::LevelZero{});
130+
ASSERT_SUCCESS(urCommandBufferAppendMemBufferFillExp(
131+
cmd_buf_handle, buffer, pattern.data(), pattern_size, 0, size, 0, nullptr,
132+
0, nullptr, &sync_point, nullptr, nullptr));
133+
134+
std::vector<uint8_t> output(size, 1);
135+
ASSERT_SUCCESS(urCommandBufferAppendMemBufferReadExp(
136+
cmd_buf_handle, buffer, 0, size, output.data(), 1, &sync_point, 0,
137+
nullptr, nullptr, nullptr, nullptr));
138+
139+
ASSERT_SUCCESS(urCommandBufferFinalizeExp(cmd_buf_handle));
140+
141+
ASSERT_SUCCESS(
142+
urEnqueueCommandBufferExp(queue, cmd_buf_handle, 0, nullptr, nullptr));
143+
ASSERT_SUCCESS(
144+
urEnqueueCommandBufferExp(queue, cmd_buf_handle, 0, nullptr, nullptr));
145+
ASSERT_SUCCESS(urQueueFinish(queue));
146+
147+
verifyData(output, size);
148+
}
149+
125150
TEST_P(urCommandBufferFillCommandsTest, USM) {
126151
ASSERT_SUCCESS(urCommandBufferAppendUSMFillExp(
127152
cmd_buf_handle, device_ptr, pattern.data(), pattern_size, size, 0,

0 commit comments

Comments
 (0)