Skip to content

Commit 9a1b908

Browse files
gmlueckmaarquitos14Pennycook
authored
[SYCL][Doc] Extension spec for composite devices (intel#11846)
Add an extension specification for new APIs that allow an application to access card-level devices on PVC. --------- Co-authored-by: Marcos Maronas <maarquitos14@users.noreply.github.com> Co-authored-by: John Pennycook <john.pennycook@intel.com>
1 parent 803fa48 commit 9a1b908

File tree

1 file changed

+290
-0
lines changed

1 file changed

+290
-0
lines changed
Lines changed: 290 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,290 @@
1+
= sycl_ext_oneapi_composite_device
2+
3+
:source-highlighter: coderay
4+
:coderay-linenums-mode: table
5+
6+
// This section needs to be after the document title.
7+
:doctype: book
8+
:toc2:
9+
:toc: left
10+
:encoding: utf-8
11+
:lang: en
12+
:dpcpp: pass:[DPC++]
13+
14+
// Set the default source code type in this document to C++,
15+
// for syntax highlighting purposes. This is needed because
16+
// docbook uses c++ and html5 uses cpp.
17+
:language: {basebackend@docbook:c++:cpp}
18+
19+
20+
== Notice
21+
22+
[%hardbreaks]
23+
Copyright (C) 2023-2023 Intel Corporation. All rights reserved.
24+
25+
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
26+
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
27+
permission by Khronos.
28+
29+
30+
== Contact
31+
32+
To report problems with this extension, please open a new issue at:
33+
34+
https://github.com/intel/llvm/issues
35+
36+
37+
== Dependencies
38+
39+
This extension is written against the SYCL 2020 revision 8 specification. All
40+
references below to the "core SYCL specification" or to section numbers in the
41+
SYCL specification refer to that revision.
42+
43+
44+
== Status
45+
46+
This is a proposed extension specification, intended to gather community
47+
feedback. Interfaces defined in this specification may not be implemented yet
48+
or may be in a preliminary state. The specification itself may also change in
49+
incompatible ways before it is finalized. *Shipping software products should
50+
not rely on APIs defined in this specification.*
51+
52+
53+
== Backend support status
54+
55+
The APIs defined in this extension are only useful when using the Level Zero
56+
backend, and they are only useful when the Level Zero environment variable
57+
`ZE_FLAT_DEVICE_HIERARCHY=COMBINED` is set. The APIs may be called even when
58+
using other backends, but they will return an empty list of composite devices.
59+
60+
61+
== Overview
62+
63+
Some Intel GPU architectures are structured with multiple tiles on a single
64+
card. Currently, this applies only to the Intel(R) Data Center GPU Max Series
65+
(aka PVC). By default, SYCL exposes each of these tiles as a separate root
66+
device, so each tile corresponds to a separate `device` object that is returned
67+
from `device::get_devices`. Applications generally perform better when each
68+
tile is treated as a separate device, but some advanced use cases can benefit
69+
by treating all tiles on a card as a single "composite" device. This extension
70+
provides APIs that enable these advanced use cases.
71+
72+
A composite device has the same semantics as any other SYCL device, though the
73+
performance characteristics might be different. The application may submit a
74+
kernel to a composite device, and the implementation automatically schedules
75+
work-items to each of the underlying tiles. Memory allocated on the composite
76+
device is accessible to any of these work-items, regardless of which underlying
77+
tile it runs on.
78+
79+
Applications that use this extension can access the same hardware through
80+
different `device` objects. For example, it is possible to get one `device`
81+
object that represents a card and another `device` object that represents a
82+
tile from that card. As noted previously, a device representing a card is
83+
known as a "composite" device. A device representing a tile is known as a
84+
"component" of that composite device. Since the tile-level devices are
85+
returned from `device::get_devices`, they are also considered "root" devices,
86+
which is a term defined in the core SYCL specification.
87+
88+
89+
== Specification
90+
91+
=== Feature test macro
92+
93+
This extension provides a feature-test macro as described in the core SYCL
94+
specification. An implementation supporting this extension must predefine the
95+
macro `SYCL_EXT_ONEAPI_COMPOSITE_DEVICE` to one of the values defined in the
96+
table below. Applications can test for the existence of this macro to
97+
determine if the implementation supports this feature, or applications can test
98+
the macro's value to determine which of the extension's features the
99+
implementation supports.
100+
101+
[%header,cols="1,5"]
102+
|===
103+
|Value
104+
|Description
105+
106+
|1
107+
|The APIs of this experimental extension are not versioned, so the
108+
feature-test macro always has this value.
109+
|===
110+
111+
=== Enumerating composite devices
112+
113+
This extension adds two new functions for enumerating the available composite
114+
devices. One is a free function and the other is a new member function of the
115+
`platform` class:
116+
117+
```
118+
namespace sycl {
119+
namespace ext::oneapi::experimental {
120+
121+
std::vector<device> get_composite_devices();
122+
123+
} // namespace ext::oneapi::experimental
124+
125+
class platform {
126+
std::vector<device> ext_oneapi_get_composite_devices();
127+
};
128+
129+
} // namespace sycl
130+
```
131+
132+
The free function `get_composite_devices` returns all of the composite devices
133+
across all platforms. The member function
134+
`platform::ext_oneapi_get_composite_devices` returns the composite devices
135+
within the given platform.
136+
137+
The execution environment for a SYCL application has a fixed number of
138+
composite devices which does not vary as the application executes. As a
139+
result, each call to these functions returns the same set of `device` objects,
140+
and the order of those objects does not vary between calls.
141+
142+
=== New device information descriptors
143+
144+
This extension adds two new device information descriptors:
145+
146+
```
147+
namespace sycl::ext::oneapi::experimental::info::device {
148+
149+
struct component_devices;
150+
struct composite_device;
151+
152+
} // namespace sycl::ext::oneapi::experimental::info::device
153+
```
154+
155+
[width="100%",%header,cols="37%,19%,44%"]
156+
|===
157+
|Device descriptor
158+
|Return type
159+
|Description
160+
161+
|`component_devices`
162+
|`std::vector<device>`
163+
|Returns the set of component devices that are contained by a composite device.
164+
If this device is not a composite device, an empty vector is returned.
165+
166+
|`composite_device`
167+
|`device`
168+
|This query may only be applied to a device that has
169+
`aspect::ext_oneapi_is_component`. Returns the composite device which
170+
contains this component device. Since the set of composite devices is fixed,
171+
the `device` object returned from this query is a copy of one of the `device`
172+
objects returned by `get_composite_devices`.
173+
174+
If this device does not have `aspect::ext_oneapi_is_component`, the
175+
`device::get_info` function throws a synchronous `exception` with the
176+
`errc::invalid` error code.
177+
|===
178+
179+
=== New device aspects
180+
181+
This extension adds two new device aspects:
182+
183+
```
184+
namespace sycl {
185+
186+
enum class aspect : /*unspecified*/ {
187+
ext_oneapi_is_composite,
188+
ext_oneapi_is_component
189+
};
190+
191+
} // namespace sycl
192+
```
193+
194+
[width="100%",%header,cols="50%,50%"]
195+
|===
196+
|Aspect
197+
|Description
198+
199+
|`ext_oneapi_is_composite`
200+
|Indicates that the device is a composite device. Any device with this aspect
201+
will have at least two constituent component devices.
202+
203+
|`ext_oneapi_is_component`
204+
|Indicates that the device is a component device of some other composite
205+
device. This applies only to a root device that is a direct component of
206+
some composite device. A sub-device will not have this aspect even if its
207+
parent is a component device.
208+
|===
209+
210+
=== Impact on "descendent device"
211+
212+
This extension augments the definition of the term "descendent device" from the
213+
core SYCL specification as follows. Given some composite device _C_ and some
214+
component of that composite device _R_, device _R_ is a descendent device of
215+
_C_ and all descendent devices of _R_ are also descendent devices of _C_.
216+
217+
This definition means that a SYCL context that contains a composite device is
218+
compatible with any of its component devices, even if those component devices
219+
are not contained by the context. See the core SYCL specification for details.
220+
221+
222+
== Impact to the ONEAPI_DEVICE_SELECTOR
223+
224+
The `ONEAPI_DEVICE_SELECTOR` is an environment variable that is specific to the
225+
{dpcpp} implementation. Therefore, this section that describes the interaction
226+
between this extension and that environment variable is non-normative and does
227+
not apply to other SYCL implementations that may support this extension.
228+
229+
The `ONEAPI_DEVICE_SELECTOR` environment variable determines the list of root
230+
devices that are returned from `device::get_devices`: _R0_, _R1_, _R2_, etc.
231+
The devices returned from `get_composite_devices` are computed from this list
232+
by iterating over the elements _Ri_ in order:
233+
234+
* Start with an empty list of composite devices.
235+
* If device _Ri_ is a component of some composite device _C_ and if all other
236+
components of _C_ are also in the list of root devices returned from
237+
`device::get_devices`, then _C_ is appended to the list of composite devices
238+
unless _C_ is already in that list.
239+
240+
This algorithm ensures that a composite device is made visible to the
241+
application only if all of its components are also visible. This is important
242+
for two reasons. The first reason is philosophical. We do not want to expose
243+
a composite device unless it has at least two component devices. This is
244+
similar to our decision in the core SYCL spec to disallow partitioning into
245+
sub-devices which results in only a single sub-device. We don't want to expose
246+
partitioning when the parent and child represent exactly the same hardware.
247+
The second reason is practical. The {dpcpp} implementation will associate
248+
each composite device with a single Level Zero native device handle, which
249+
represents the entire card. There is no way to get a Level Zero handle that
250+
represents a subset of the tiles. Therefore, we do not expose a composite
251+
device unless it represents all of the tiles on a card.
252+
253+
The algorithm also imposes a sensible order on the composite devices, which
254+
corresponds to the order of the root devices. Thus, users who set the order of
255+
root devices via `ONEAPI_DEVICE_SELECTOR` will also get a corresponding order
256+
for the composite devices.
257+
258+
259+
== Issues
260+
261+
* What is the impact of this extension on
262+
link:../supported/sycl_ext_oneapi_default_context.asciidoc[
263+
sycl_ext_oneapi_default_context]? I think there are two reasonable options.
264+
One option is to say that the default context is not affected by this
265+
extension. As a result, applications that use composite devices will need to
266+
create their own context which contains the set of composite devices that
267+
they care about. The other option is to change the default context to
268+
include all of the composite devices. This would be easier to use because
269+
the default context continues to reflect all devices in the system. As a
270+
result, the default context can be used in all scenarios, and an application
271+
only needs to create a custom one in order to gain some optimization. It's
272+
not clear at this point if redefining the default context like this would
273+
result in a performance penalty for applications that don't use composite
274+
devices.
275+
276+
* What guarantees do we make (if any) about the accessibility of USM allocated
277+
on a composite device? Do we guarantee that this memory is also accessible
278+
on the component devices? Is the answer different for "device" USM vs.
279+
"shared" USM? One option is to say that the application needs to
280+
specifically enable P2P access between a composite device and its component
281+
devices in order to guarantee accessibility. There is a related question in
282+
the core SYCL spec about accessibility of USM w.r.t. sub-devices, which we
283+
have not yet resolved.
284+
285+
* Should the `ONEAPI_DEVICE_SELECTOR` be extended to allow selection of
286+
composite devices? For example, syntax like
287+
`ONEAPI_DEVICE_SELECTOR=level_zero:C1` could mean "select the second Level
288+
Zero card device and expose it as a root device from
289+
``device::get_devices``". Our current thinking is that we should not allow
290+
syntax like this.

0 commit comments

Comments
 (0)