From b5fe4ca9e88ad1b8b01b54f7ced9f991399fb5bc Mon Sep 17 00:00:00 2001 From: Pablo Reble Date: Mon, 22 May 2023 09:53:02 -0500 Subject: [PATCH 1/5] Create CommandGraph.md --- sycl/doc/design/CommandGraph.md | 61 +++++++++++++++++++++++++++++++++ 1 file changed, 61 insertions(+) create mode 100644 sycl/doc/design/CommandGraph.md diff --git a/sycl/doc/design/CommandGraph.md b/sycl/doc/design/CommandGraph.md new file mode 100644 index 0000000000000..9088cd8a76458 --- /dev/null +++ b/sycl/doc/design/CommandGraph.md @@ -0,0 +1,61 @@ +# Command Graph Extension + +This document describes the implementation design of the +[SYCL Graph Extension](https://github.com/intel/llvm/pull/5626). + +A related presentation can be found +[here](https://www.youtube.com/watch?v=aOTAmyr04rM). + +## Requirements + +An efficient implementation of a lazy command graph execution and its replay +requires extensions to the PI layer. Such an extension is command buffers. +We distinguish between backends that support command buffer extensions and +those that do not. Currently command buffer extensions are only supported by +Level Zero. All other backends would fall back to an emulation mode. + +The emulation mode targets support of functionality only, without potentially +resulting performance improvements, i.e. execution of a closed Level Zero +command list multiple times. + +#### Command Buffer extension + +| Function | Description | +| ------------------------- | ------------------------ | +| piextCommandBufferCreate: | create a command-buffer. | +| piextCommandBufferRetain: | incrementing reference count of command-buffer. | +| piextCommandBufferRelease: decrementing reference count of command-buffer. | +| piextCommandBufferFinalize: no more commands can be appended, makes command + buffer ready to enqueue on command-queue. | +| piextCommandBufferNDRangeKernel: append a kernel execution command to command +buffer. 
| +| piextEnqueueCommandBuffer: | submit command-buffer to queue for execution | + +## Design + +![Basic architecture diagram.](images/SYCL-Graph-Architecture.svg) + +There are two sets of user facing interfaces that are proposed to create a +command graph object: +Explicit and Record & Replay API. Within the runtime they share a common +infrastructure. + +## Scheduler integration + +When there are no requirements for accessors in a command graph the scheduler +is bypassed and it is directly enqueued to a command buffer. If +there are requirements, commands need to be enqueued by the scheduler. + +## Memory handling: Buffer and Accessor + +There is no extra support for Graph-specific USM allocations in the current +proposal. Memory operations will be supported subsequently by the current +implementation starting with `memcpy`. + +Buffers and accessors are supported in a command graph. Following restrictions +are required to adapt buffers and their lifetime to a lazy work execution model: + +- Lifetime of a buffer with host data will be extended by copying the underlying +data. +- Host accessor on buffer that are used by a command graph are prohibited. +- Copy-back behavior on destruction of a data is prohibited. 
From 99894fe66296eb740c062d25772711b54ca51a78 Mon Sep 17 00:00:00 2001 From: Pablo Reble Date: Mon, 22 May 2023 09:53:54 -0500 Subject: [PATCH 2/5] Add files via upload --- sycl/doc/design/images/SYCL-Graph-Architecture.svg | 1 + 1 file changed, 1 insertion(+) create mode 100644 sycl/doc/design/images/SYCL-Graph-Architecture.svg diff --git a/sycl/doc/design/images/SYCL-Graph-Architecture.svg b/sycl/doc/design/images/SYCL-Graph-Architecture.svg new file mode 100644 index 0000000000000..da0c789c8d560 --- /dev/null +++ b/sycl/doc/design/images/SYCL-Graph-Architecture.svg @@ -0,0 +1 @@ +SYCL Graph ArchitectureCPU, GPU, FPGA, hetero/hybrid/converged architectures …Level ZeroSYCL Graph Extensions APISYCL RuntimeApplicationLegendA BA uses / depends on BSYCL RuntimeImplemented BackendsApplication layerUnified Runtime + Command Buffer ExtensionsCUDA (example)NVIDIA GPUFuture Backend support \ No newline at end of file From a04cdb3325a2283ba6f88ca9ecaa09cfdcf902d6 Mon Sep 17 00:00:00 2001 From: Pablo Reble Date: Tue, 23 May 2023 15:13:14 -0500 Subject: [PATCH 3/5] Apply suggestions from code review --- sycl/doc/design/CommandGraph.md | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/sycl/doc/design/CommandGraph.md b/sycl/doc/design/CommandGraph.md index 9088cd8a76458..d3f41c56e5b88 100644 --- a/sycl/doc/design/CommandGraph.md +++ b/sycl/doc/design/CommandGraph.md @@ -9,7 +9,10 @@ A related presentation can be found ## Requirements An efficient implementation of a lazy command graph execution and its replay -requires extensions to the PI layer. Such an extension is command buffers. +requires extensions to the PI layer. Such an extension is command buffers, +where a command-buffer object represents a series of operations to be enqueued +to the backend device and their dependencies. A single command graph can be +partitioned into more than one PI command-buffer by the runtime. 
We distinguish between backends that support command buffer extensions and those that do not. Currently command buffer extensions are only supported by Level Zero. All other backends would fall back to an emulation mode. @@ -22,14 +25,14 @@ command list multiple times. | Function | Description | | ------------------------- | ------------------------ | -| piextCommandBufferCreate: | create a command-buffer. | -| piextCommandBufferRetain: | incrementing reference count of command-buffer. | -| piextCommandBufferRelease: decrementing reference count of command-buffer. | -| piextCommandBufferFinalize: no more commands can be appended, makes command +| `piextCommandBufferCreate` | create a command-buffer. | +| `piextCommandBufferRetain` | incrementing reference count of command-buffer. | +| `piextCommandBufferRelease` | decrementing reference count of command-buffer. | +| `piextCommandBufferFinalize` | no more commands can be appended, makes command buffer ready to enqueue on command-queue. | -| piextCommandBufferNDRangeKernel: append a kernel execution command to command +| `piextCommandBufferNDRangeKernel` | append a kernel execution command to command buffer. | -| piextEnqueueCommandBuffer: | submit command-buffer to queue for execution | +| `piextEnqueueCommandBuffer` | submit command-buffer to queue for execution | ## Design From 6d956561c5bbdf0f60c587f902685e6d9eb4569c Mon Sep 17 00:00:00 2001 From: Pablo Reble Date: Tue, 23 May 2023 15:17:44 -0500 Subject: [PATCH 4/5] Update CommandGraph.md --- sycl/doc/design/CommandGraph.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/sycl/doc/design/CommandGraph.md b/sycl/doc/design/CommandGraph.md index d3f41c56e5b88..870ca7c9c74a2 100644 --- a/sycl/doc/design/CommandGraph.md +++ b/sycl/doc/design/CommandGraph.md @@ -28,10 +28,8 @@ command list multiple times. | `piextCommandBufferCreate` | create a command-buffer. | | `piextCommandBufferRetain` | incrementing reference count of command-buffer. 
| | `piextCommandBufferRelease` | decrementing reference count of command-buffer. | -| `piextCommandBufferFinalize` | no more commands can be appended, makes command - buffer ready to enqueue on command-queue. | -| `piextCommandBufferNDRangeKernel` | append a kernel execution command to command -buffer. | +| `piextCommandBufferFinalize` | no more commands can be appended, makes command buffer ready to enqueue on command-queue. | +| `piextCommandBufferNDRangeKernel` | append a kernel execution command to command buffer. | | `piextEnqueueCommandBuffer` | submit command-buffer to queue for execution | ## Design From 4112ace8770498c57f425c6d62247999e6e8146e Mon Sep 17 00:00:00 2001 From: Pablo Reble Date: Wed, 31 May 2023 10:01:08 -0500 Subject: [PATCH 5/5] Apply suggestions from code review Co-authored-by: Ewan Crawford Co-authored-by: Ben Tracy --- sycl/doc/design/CommandGraph.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/sycl/doc/design/CommandGraph.md b/sycl/doc/design/CommandGraph.md index 870ca7c9c74a2..a2005e6e53589 100644 --- a/sycl/doc/design/CommandGraph.md +++ b/sycl/doc/design/CommandGraph.md @@ -9,19 +9,20 @@ A related presentation can be found ## Requirements An efficient implementation of a lazy command graph execution and its replay -requires extensions to the PI layer. Such an extension is command buffers, +requires extensions to the UR layer. Such an extension is command buffers, where a command-buffer object represents a series of operations to be enqueued to the backend device and their dependencies. A single command graph can be -partitioned into more than one PI command-buffer by the runtime. +partitioned into more than one command-buffer by the runtime. We distinguish between backends that support command buffer extensions and those that do not. Currently command buffer extensions are only supported by -Level Zero. All other backends would fall back to an emulation mode. +Level Zero. 
All other backends would fall back to an emulation mode, or not +be reported as supported. The emulation mode targets support of functionality only, without potentially resulting performance improvements, i.e. execution of a closed Level Zero command list multiple times. -#### Command Buffer extension +### Command Buffer extension | Function | Description | | ------------------------- | ------------------------ | @@ -31,12 +32,15 @@ command list multiple times. | `piextCommandBufferFinalize` | no more commands can be appended, makes command buffer ready to enqueue on command-queue. | | `piextCommandBufferNDRangeKernel` | append a kernel execution command to command buffer. | | `piextEnqueueCommandBuffer` | submit command-buffer to queue for execution | +| `piextCommandBufferMemcpyUSM` | append a USM memcpy command to the command-buffer. | +| `piextCommandBufferMemBufferCopy` | append a mem buffer copy command to the command-buffer. | +| `piextCommandBufferMemBufferCopyRect` | append a rectangular mem buffer copy command to the command-buffer. | ## Design ![Basic architecture diagram.](images/SYCL-Graph-Architecture.svg) -There are two sets of user facing interfaces that are proposed to create a +There are two sets of user facing interfaces that can be used to create a command graph object: Explicit and Record & Replay API. Within the runtime they share a common infrastructure. @@ -59,4 +63,4 @@ are required to adapt buffers and their lifetime to a lazy work execution model: - Lifetime of a buffer with host data will be extended by copying the underlying data. - Host accessor on buffer that are used by a command graph are prohibited. -- Copy-back behavior on destruction of a data is prohibited. +- Copy-back behavior on destruction of a buffer is prohibited.
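
The command-buffer lifecycle listed in the table (append commands, finalize, then enqueue a closed buffer any number of times) can be illustrated with a minimal, self-contained sketch in plain C++. `MockCommandBuffer` and `replay_demo` are hypothetical names introduced only for this illustration and are not part of the PI or UR API. Reference counting (retain/release) and inter-command dependencies are left out, so commands simply run in the order they were appended.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a command-buffer. This is NOT the PI/UR API;
// it only models the append -> finalize -> enqueue ordering rules.
class MockCommandBuffer {
public:
  using Command = std::function<void()>;

  // Models "append a command": only legal while the buffer is still open.
  void append(Command cmd) {
    if (finalized_)
      throw std::logic_error("cannot append to a finalized command-buffer");
    commands_.push_back(std::move(cmd));
  }

  // Models "finalize": no more commands can be appended afterwards.
  void finalize() { finalized_ = true; }

  // Models "enqueue": replays the recorded commands in append order.
  // A finalized buffer may be enqueued repeatedly.
  void enqueue() const {
    if (!finalized_)
      throw std::logic_error("command-buffer must be finalized before enqueue");
    for (const Command &cmd : commands_)
      cmd();
  }

  std::size_t size() const { return commands_.size(); }

private:
  std::vector<Command> commands_;
  bool finalized_ = false;
};

// Records two commands once, then replays the closed buffer `replays` times
// and returns the resulting counter value.
int replay_demo(int replays) {
  int counter = 0;
  MockCommandBuffer buffer;
  buffer.append([&counter] { counter += 1; });
  buffer.append([&counter] { counter *= 2; });
  buffer.finalize();
  assert(buffer.size() == 2);
  for (int i = 0; i < replays; ++i)
    buffer.enqueue();
  return counter;
}
```

Calling `replay_demo(3)` records the two commands a single time and executes them three times. This is the replay pattern the design aims to make cheap on backends with native command-buffer support, while the emulation mode only needs to preserve the same observable behavior.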