Skip to content

Commit 2a97b05

Browse files
authored
[SYCL][Graph] Add testing for sycl_ext_oneapi_local_memory (#16379)
- Adds testing that verifies the interaction between sycl_ext_oneapi_local_memory and sycl_ext_oneapi_graph. - Reorder the extensions in the sycl graph spec to be listed in alphabetical order. - Explicitly state in the sycl graph spec that using sycl_ext_oneapi_local_memory is supported.
1 parent 839f0af commit 2a97b05

File tree

4 files changed

+168
-69
lines changed

4 files changed

+168
-69
lines changed

sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc

Lines changed: 75 additions & 69 deletions
Original file line numberDiff line numberDiff line change
@@ -2075,6 +2075,45 @@ recording mode, as opposed to throwing.
20752075
This section defines the interaction of `sycl_ext_oneapi_graph` with other
20762076
extensions.
20772077

2078+
==== sycl_ext_codeplay_enqueue_native_command
2079+
2080+
`ext_codeplay_enqueue_native_command`, defined in
2081+
link:../experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc[sycl_ext_codeplay_enqueue_native_command]
2082+
cannot be used in graph nodes. A synchronous exception will be thrown with error
2083+
code `invalid` if a user tries to add them to a graph.
2084+
2085+
Removing this restriction is something we may look at for future revisions of
2086+
`sycl_ext_oneapi_graph`.
2087+
2088+
==== sycl_ext_intel_queue_index
2089+
2090+
The compute index queue property defined by
2091+
link:../supported/sycl_ext_intel_queue_index.asciidoc[sycl_ext_intel_queue_index]
2092+
is ignored during queue recording.
2093+
2094+
Using this information is something we may look at for future revisions of
2095+
`sycl_ext_oneapi_graph`.
2096+
2097+
==== sycl_ext_oneapi_bindless_images
2098+
2099+
The new handler methods, and queue shortcuts, defined by
2100+
link:../experimental/sycl_ext_oneapi_bindless_images.asciidoc[sycl_ext_oneapi_bindless_images]
2101+
cannot be used in graph nodes. A synchronous exception will be thrown with error
2102+
code `invalid` if a user tries to add them to a graph.
2103+
2104+
Removing this restriction is something we may look at for future revisions of
2105+
`sycl_ext_oneapi_graph`.
2106+
2107+
==== sycl_ext_oneapi_device_global
2108+
2109+
The new handler methods, and queue shortcuts, defined by
2110+
link:../experimental/sycl_ext_oneapi_device_global.asciidoc[sycl_ext_oneapi_device_global].
2111+
cannot be used in graph nodes. A synchronous exception will be thrown with error
2112+
code `invalid` if a user tries to add them to a graph.
2113+
2114+
Removing this restriction is something we may look at for future revisions of
2115+
`sycl_ext_oneapi_graph`.
2116+
20782117
==== sycl_ext_oneapi_discard_queue_events
20792118

20802119
When recording a `sycl::queue` which has been created with the
@@ -2108,37 +2147,25 @@ nodes that are recorded from multiple queues and/or added by the explicit API:
21082147
* The only commands which have an implicit dependency on the barrier command
21092148
are those recorded from the same queue the barrier command was submitted to.
21102149

2111-
==== sycl_ext_oneapi_memcpy2d
2112-
2113-
The new handler methods, and queue shortcuts, defined by
2114-
link:../supported/sycl_ext_oneapi_memcpy2d.asciidoc[sycl_ext_oneapi_memcpy2d]
2115-
cannot be used in graph nodes. A synchronous exception will be thrown with
2116-
error code `invalid` if a user tries to add them to a graph.
2117-
2118-
Removing this restriction is something we may look at for future revisions of
2119-
`sycl_ext_oneapi_graph`.
2120-
2121-
==== sycl_ext_oneapi_queue_priority
2122-
2123-
The queue priority property defined by
2124-
link:../supported/sycl_ext_oneapi_queue_priority.asciidoc[sycl_ext_oneapi_queue_priority]
2125-
is ignored during queue recording.
2150+
==== sycl_ext_oneapi_enqueue_functions
21262151

2127-
==== sycl_ext_oneapi_queue_empty
2152+
The command submission functions defined in
2153+
link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[sycl_ext_oneapi_enqueue_functions]
2154+
can be used adding nodes to a graph when creating a graph from queue recording.
2155+
New methods are also defined that enable submitting an executable graph,
2156+
e.g. directly to a queue without returning an event.
21282157

2129-
The `queue::ext_oneapi_empty()` query defined by the
2130-
link:../supported/sycl_ext_oneapi_queue_empty.asciidoc[sycl_ext_oneapi_queue_empty]
2131-
extension behaves as normal during queue recording and is not captured to the graph.
2132-
Recorded commands are not counted as submitted for the purposes of this query.
2158+
==== sycl_ext_oneapi_free_function_kernels
21332159

2134-
==== sycl_ext_intel_queue_index
2160+
`sycl_ext_oneapi_free_function_kernels`, defined in
2161+
link:../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc[sycl_ext_oneapi_free_function_kernels]
2162+
can be used with SYCL Graphs.
21352163

2136-
The compute index queue property defined by
2137-
link:../supported/sycl_ext_intel_queue_index.asciidoc[sycl_ext_intel_queue_index]
2138-
is ignored during queue recording.
2164+
==== sycl_ext_oneapi_kernel_compiler_spirv
21392165

2140-
Using this information is something we may look at for future revisions of
2141-
`sycl_ext_oneapi_graph`.
2166+
The kernels loaded using
2167+
link:../experimental/sycl_ext_oneapi_kernel_compiler_spirv.asciidoc[sycl_ext_oneapi_kernel_compiler_spirv]
2168+
behave as normal when used in graph nodes.
21422169

21432170
==== sycl_ext_oneapi_kernel_properties
21442171

@@ -2147,62 +2174,41 @@ link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[sycl_ext_oneapi_
21472174
can be used in graph nodes in the same way as they are used in normal queue
21482175
submission.
21492176

2150-
==== sycl_ext_oneapi_prod
2177+
==== sycl_ext_oneapi_local_memory
21512178

2152-
The new `sycl::queue::ext_oneapi_prod()` method added by
2153-
link:../proposed/sycl_ext_oneapi_prod.asciidoc[sycl_ext_oneapi_prod]
2154-
behaves as normal during queue recording and is not captured to the graph.
2155-
Recorded commands are not counted as submitted for the purposes of its operation.
2179+
Allocating local memory inside a graph kernel node with `sycl::ext::oneapi::group_local_memory()` or
2180+
`sycl::ext::oneapi::group_local_memory_for_overwrite()` is supported. These methods are defined by
2181+
link:../supported/sycl_ext_oneapi_local_memory.asciidoc[sycl_ext_oneapi_local_memory.]
21562182

2157-
==== sycl_ext_oneapi_device_global
2158-
2159-
The new handler methods, and queue shortcuts, defined by
2160-
link:../experimental/sycl_ext_oneapi_device_global.asciidoc[sycl_ext_oneapi_device_global].
2161-
cannot be used in graph nodes. A synchronous exception will be thrown with error
2162-
code `invalid` if a user tries to add them to a graph.
2163-
2164-
Removing this restriction is something we may look at for future revisions of
2165-
`sycl_ext_oneapi_graph`.
2166-
2167-
==== sycl_ext_oneapi_bindless_images
2183+
==== sycl_ext_oneapi_memcpy2d
21682184

21692185
The new handler methods, and queue shortcuts, defined by
2170-
link:../experimental/sycl_ext_oneapi_bindless_images.asciidoc[sycl_ext_oneapi_bindless_images]
2171-
cannot be used in graph nodes. A synchronous exception will be thrown with error
2172-
code `invalid` if a user tries to add them to a graph.
2186+
link:../supported/sycl_ext_oneapi_memcpy2d.asciidoc[sycl_ext_oneapi_memcpy2d]
2187+
cannot be used in graph nodes. A synchronous exception will be thrown with
2188+
error code `invalid` if a user tries to add them to a graph.
21732189

21742190
Removing this restriction is something we may look at for future revisions of
21752191
`sycl_ext_oneapi_graph`.
21762192

2177-
==== sycl_ext_oneapi_kernel_compiler_spirv
2178-
2179-
The kernels loaded using
2180-
link:../experimental/sycl_ext_oneapi_kernel_compiler_spirv.asciidoc[sycl_ext_oneapi_kernel_compiler_spirv]
2181-
behave as normal when used in graph nodes.
2182-
2183-
==== sycl_ext_codeplay_enqueue_native_command
2184-
2185-
`ext_codeplay_enqueue_native_command`, defined in
2186-
link:../experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc[sycl_ext_codeplay_enqueue_native_command]
2187-
cannot be used in graph nodes. A synchronous exception will be thrown with error
2188-
code `invalid` if a user tries to add them to a graph.
2193+
==== sycl_ext_oneapi_prod
21892194

2190-
Removing this restriction is something we may look at for future revisions of
2191-
`sycl_ext_oneapi_graph`.
2195+
The new `sycl::queue::ext_oneapi_prod()` method added by
2196+
link:../proposed/sycl_ext_oneapi_prod.asciidoc[sycl_ext_oneapi_prod]
2197+
behaves as normal during queue recording and is not captured to the graph.
2198+
Recorded commands are not counted as submitted for the purposes of its operation.
21922199

2193-
==== sycl_ext_oneapi_enqueue_functions
2200+
==== sycl_ext_oneapi_queue_empty
21942201

2195-
The command submission functions defined in
2196-
link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[sycl_ext_oneapi_enqueue_functions]
2197-
can be used adding nodes to a graph when creating a graph from queue recording.
2198-
New methods are also defined that enable submitting an executable graph,
2199-
e.g. directly to a queue without returning an event.
2202+
The `queue::ext_oneapi_empty()` query defined by the
2203+
link:../supported/sycl_ext_oneapi_queue_empty.asciidoc[sycl_ext_oneapi_queue_empty]
2204+
extension behaves as normal during queue recording and is not captured to the graph.
2205+
Recorded commands are not counted as submitted for the purposes of this query.
22002206

2201-
==== sycl_ext_oneapi_free_function_kernels
2207+
==== sycl_ext_oneapi_queue_priority
22022208

2203-
`sycl_ext_oneapi_free_function_kernels`, defined in
2204-
link:../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc[sycl_ext_oneapi_free_function_kernels]
2205-
can be used with SYCL Graphs.
2209+
The queue priority property defined by
2210+
link:../supported/sycl_ext_oneapi_queue_priority.asciidoc[sycl_ext_oneapi_queue_priority]
2211+
is ignored during queue recording.
22062212

22072213
==== sycl_ext_oneapi_work_group_memory
22082214

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
// RUN: %{build} -o %t.out
2+
// RUN: %{run} %t.out
3+
// Extra run to check for leaks in Level Zero using UR_L0_LEAKS_DEBUG
4+
// RUN: %if level_zero %{env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0 %{l0_leak_check} %{run} %t.out 2>&1 | FileCheck %s --implicit-check-not=LEAK %}
5+
// Extra run to check for immediate-command-list in Level Zero
6+
// RUN: %if level_zero %{env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 %{l0_leak_check} %{run} %t.out 2>&1 | FileCheck %s --implicit-check-not=LEAK %}
7+
8+
#define GRAPH_E2E_EXPLICIT
9+
10+
#include "../Inputs/compile_time_local_memory.cpp"
Lines changed: 73 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,73 @@
1+
// Tests adding kernel nodes with local memory that is allocated using
2+
// the sycl_ext_oneapi_local_memory extension.
3+
4+
#include "../graph_common.hpp"
5+
#include <sycl/ext/oneapi/group_local_memory.hpp>
6+
7+
int main() {
8+
queue Queue{};
9+
10+
using T = int;
11+
constexpr size_t LocalSize = 128;
12+
13+
std::vector<T> HostData(Size);
14+
std::iota(HostData.begin(), HostData.end(), 10);
15+
16+
exp_ext::command_graph Graph{Queue.get_context(), Queue.get_device()};
17+
18+
T *PtrA = malloc_device<T>(Size, Queue);
19+
20+
Queue.copy(HostData.data(), PtrA, Size);
21+
Queue.wait_and_throw();
22+
23+
auto NodeA = add_node(Graph, Queue, [&](handler &CGH) {
24+
CGH.parallel_for(nd_range({Size}, {LocalSize}), [=](nd_item<1> Item) {
25+
multi_ptr<size_t[LocalSize], access::address_space::local_space>
26+
LocalMem = sycl::ext::oneapi::group_local_memory<size_t[LocalSize]>(
27+
Item.get_group());
28+
*LocalMem[Item.get_local_linear_id()] = Item.get_global_linear_id() * 2;
29+
PtrA[Item.get_global_linear_id()] +=
30+
*LocalMem[Item.get_local_linear_id()];
31+
});
32+
});
33+
34+
add_node(
35+
Graph, Queue,
36+
[&](handler &CGH) {
37+
depends_on_helper(CGH, NodeA);
38+
CGH.parallel_for(nd_range({Size}, {LocalSize}), [=](nd_item<1> Item) {
39+
multi_ptr<size_t[LocalSize], access::address_space::local_space>
40+
LocalMem = sycl::ext::oneapi::group_local_memory_for_overwrite<
41+
size_t[LocalSize]>(Item.get_group());
42+
*LocalMem[Item.get_local_linear_id()] =
43+
Item.get_global_linear_id() + 4;
44+
PtrA[Item.get_global_linear_id()] *=
45+
*LocalMem[Item.get_local_linear_id()];
46+
});
47+
},
48+
NodeA);
49+
50+
auto GraphExec = Graph.finalize();
51+
52+
for (unsigned n = 0; n < Iterations; n++) {
53+
Queue.submit([&](handler &CGH) { CGH.ext_oneapi_graph(GraphExec); });
54+
}
55+
56+
Queue.wait_and_throw();
57+
58+
Queue.copy(PtrA, HostData.data(), Size);
59+
Queue.wait_and_throw();
60+
61+
free(PtrA, Queue);
62+
63+
for (size_t i = 0; i < Size; i++) {
64+
T Ref = 10 + i;
65+
for (size_t iter = 0; iter < Iterations; ++iter) {
66+
Ref += (i * 2);
67+
Ref *= (i + 4);
68+
}
69+
assert(check_value(i, Ref, HostData[i], "PtrA"));
70+
}
71+
72+
return 0;
73+
}
Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
// RUN: %{build} -o %t.out
2+
// RUN: %{run} %t.out
3+
// Extra run to check for leaks in Level Zero using UR_L0_LEAKS_DEBUG
4+
// RUN: %if level_zero %{env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0 %{l0_leak_check} %{run} %t.out 2>&1 | FileCheck %s --implicit-check-not=LEAK %}
5+
// Extra run to check for immediate-command-list in Level Zero
6+
// RUN: %if level_zero %{env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 %{l0_leak_check} %{run} %t.out 2>&1 | FileCheck %s --implicit-check-not=LEAK %}
7+
8+
#define GRAPH_E2E_RECORD_REPLAY
9+
10+
#include "../Inputs/compile_time_local_memory.cpp"

0 commit comments

Comments
 (0)