|
| 1 | += sycl_ext_oneapi_composite_device |
| 2 | + |
| 3 | +:source-highlighter: coderay |
| 4 | +:coderay-linenums-mode: table |
| 5 | + |
| 6 | +// This section needs to be after the document title. |
| 7 | +:doctype: book |
| 8 | +:toc2: |
| 9 | +:toc: left |
| 10 | +:encoding: utf-8 |
| 11 | +:lang: en |
| 12 | +:dpcpp: pass:[DPC++] |
| 13 | + |
| 14 | +// Set the default source code type in this document to C++, |
| 15 | +// for syntax highlighting purposes. This is needed because |
| 16 | +// docbook uses c++ and html5 uses cpp. |
| 17 | +:language: {basebackend@docbook:c++:cpp} |
| 18 | + |
| 19 | + |
| 20 | +== Notice |
| 21 | + |
| 22 | +[%hardbreaks] |
| 23 | +Copyright (C) 2023-2023 Intel Corporation. All rights reserved. |
| 24 | + |
| 25 | +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks |
| 26 | +of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by |
| 27 | +permission by Khronos. |
| 28 | + |
| 29 | + |
| 30 | +== Contact |
| 31 | + |
| 32 | +To report problems with this extension, please open a new issue at: |
| 33 | + |
| 34 | +https://github.com/intel/llvm/issues |
| 35 | + |
| 36 | + |
| 37 | +== Dependencies |
| 38 | + |
| 39 | +This extension is written against the SYCL 2020 revision 8 specification. All |
| 40 | +references below to the "core SYCL specification" or to section numbers in the |
| 41 | +SYCL specification refer to that revision. |
| 42 | + |
| 43 | + |
| 44 | +== Status |
| 45 | + |
| 46 | +This is a proposed extension specification, intended to gather community |
| 47 | +feedback. Interfaces defined in this specification may not be implemented yet |
| 48 | +or may be in a preliminary state. The specification itself may also change in |
| 49 | +incompatible ways before it is finalized. *Shipping software products should |
| 50 | +not rely on APIs defined in this specification.* |
| 51 | + |
| 52 | + |
| 53 | +== Backend support status |
| 54 | + |
| 55 | +The APIs defined in this extension are only useful when using the Level Zero |
| 56 | +backend, and they are only useful when the Level Zero environment variable |
| 57 | +`ZE_FLAT_DEVICE_HIERARCHY=COMBINED` is set. The APIs may be called even when |
| 58 | +using other backends, but they will return an empty list of composite devices. |
| 59 | + |
| 60 | + |
| 61 | +== Overview |
| 62 | + |
| 63 | +Some Intel GPU architectures are structured with multiple tiles on a single |
| 64 | +card. Currently, this applies only to the Intel(R) Data Center GPU Max Series |
| 65 | +(aka PVC). By default, SYCL exposes each of these tiles as a separate root |
| 66 | +device, so each tile corresponds to a separate `device` object that is returned |
| 67 | +from `device::get_devices`. Applications generally perform better when each |
| 68 | +tile is treated as a separate device, but some advanced use cases can benefit |
| 69 | +by treating all tiles on a card as a single "composite" device. This extension |
| 70 | +provides APIs that enable these advanced use cases. |
| 71 | + |
| 72 | +A composite device has the same semantics as any other SYCL device, though the |
| 73 | +performance characteristics might be different. The application may submit a |
| 74 | +kernel to a composite device, and the implementation automatically schedules |
| 75 | +work-items to each of the underlying tiles. Memory allocated on the composite |
| 76 | +device is accessible to any of these work-items, regardless of which underlying |
| 77 | +tile it runs on. |
| 78 | + |
| 79 | +Applications that use this extension can access the same hardware through |
| 80 | +different `device` objects. For example, it is possible to get one `device` |
| 81 | +object that represents a card and another `device` object that represents a |
| 82 | +tile from that card. As noted previously, a device representing a card is |
| 83 | +known as a "composite" device. A device representing a tile is known as a |
| 84 | +"component" of that composite device. Since the tile-level devices are |
| 85 | +returned from `device::get_devices`, they are also considered "root" devices, |
| 86 | +which is a term defined in the core SYCL specification. |
| 87 | + |
| 88 | + |
| 89 | +== Specification |
| 90 | + |
| 91 | +=== Feature test macro |
| 92 | + |
| 93 | +This extension provides a feature-test macro as described in the core SYCL |
| 94 | +specification. An implementation supporting this extension must predefine the |
| 95 | +macro `SYCL_EXT_ONEAPI_COMPOSITE_DEVICE` to one of the values defined in the |
| 96 | +table below. Applications can test for the existence of this macro to |
| 97 | +determine if the implementation supports this feature, or applications can test |
| 98 | +the macro's value to determine which of the extension's features the |
| 99 | +implementation supports. |
| 100 | + |
| 101 | +[%header,cols="1,5"] |
| 102 | +|=== |
| 103 | +|Value |
| 104 | +|Description |
| 105 | + |
| 106 | +|1 |
| 107 | +|The APIs of this experimental extension are not versioned, so the |
| 108 | + feature-test macro always has this value. |
| 109 | +|=== |
| 110 | + |
| 111 | +=== Enumerating composite devices |
| 112 | + |
| 113 | +This extension adds two new functions for enumerating the available composite |
| 114 | +devices. One is a free function and the other is a new member function of the |
| 115 | +`platform` class: |
| 116 | + |
| 117 | +``` |
| 118 | +namespace sycl { |
| 119 | +namespace ext::oneapi::experimental { |
| 120 | + |
| 121 | +std::vector<device> get_composite_devices(); |
| 122 | + |
| 123 | +} // namespace ext::oneapi::experimental |
| 124 | + |
| 125 | +class platform { |
| 126 | + std::vector<device> ext_oneapi_get_composite_devices(); |
| 127 | +}; |
| 128 | + |
| 129 | +} // namespace sycl |
| 130 | +``` |
| 131 | + |
| 132 | +The free function `get_composite_devices` returns all of the composite devices |
| 133 | +across all platforms. The member function |
| 134 | +`platform::ext_oneapi_get_composite_devices` returns the composite devices |
| 135 | +within the given platform. |
| 136 | + |
| 137 | +The execution environment for a SYCL application has a fixed number of |
| 138 | +composite devices which does not vary as the application executes. As a |
| 139 | +result, each call to these functions returns the same set of `device` objects, |
| 140 | +and the order of those objects does not vary between calls. |
| 141 | + |
| 142 | +=== New device information descriptors |
| 143 | + |
| 144 | +This extension adds two new device information descriptors: |
| 145 | + |
| 146 | +``` |
| 147 | +namespace sycl::ext::oneapi::experimental::info::device { |
| 148 | + |
| 149 | +struct component_devices; |
| 150 | +struct composite_device; |
| 151 | + |
| 152 | +} // namespace sycl::ext::oneapi::experimental::info::device |
| 153 | +``` |
| 154 | + |
| 155 | +[width="100%",%header,cols="37%,19%,44%"] |
| 156 | +|=== |
| 157 | +|Device descriptor |
| 158 | +|Return type |
| 159 | +|Description |
| 160 | + |
| 161 | +|`component_devices` |
| 162 | +|`std::vector<device>` |
| 163 | +|Returns the set of component devices that are contained by a composite device. |
| 164 | + If this device is not a composite device, an empty vector is returned. |
| 165 | + |
| 166 | +|`composite_device` |
| 167 | +|`device` |
| 168 | +|This query may only be applied to a device that has |
| 169 | + `aspect::ext_oneapi_is_component`. Returns the composite device which |
| 170 | + contains this component device. Since the set of composite devices is fixed, |
| 171 | + the `device` object returned from this query is a copy of one of the `device` |
| 172 | + objects returned by `get_composite_devices`. |
| 173 | + |
| 174 | +If this device does not have `aspect::ext_oneapi_is_component`, the |
| 175 | +`device::get_info` function throws a synchronous `exception` with the |
| 176 | +`errc::invalid` error code. |
| 177 | +|=== |
| 178 | + |
| 179 | +=== New device aspects |
| 180 | + |
| 181 | +This extension adds two new device aspects: |
| 182 | + |
| 183 | +``` |
| 184 | +namespace sycl { |
| 185 | + |
| 186 | +enum class aspect : /*unspecified*/ { |
| 187 | + ext_oneapi_is_composite, |
| 188 | + ext_oneapi_is_component |
| 189 | +}; |
| 190 | + |
| 191 | +} // namespace sycl |
| 192 | +``` |
| 193 | + |
| 194 | +[width="100%",%header,cols="50%,50%"] |
| 195 | +|=== |
| 196 | +|Aspect |
| 197 | +|Description |
| 198 | + |
| 199 | +|`ext_oneapi_is_composite` |
| 200 | +|Indicates that the device is a composite device. Any device with this aspect |
| 201 | + will have at least two constituent component devices. |
| 202 | + |
| 203 | +|`ext_oneapi_is_component` |
| 204 | +|Indicates that the device is a component device of some other composite |
| 205 | + device. This applies only to a root device that is a direct component of |
| 206 | + some composite device. A sub-device will not have this aspect even if its |
| 207 | + parent is a component device. |
| 208 | +|=== |
| 209 | + |
| 210 | +=== Impact on "descendent device" |
| 211 | + |
| 212 | +This extension augments the definition of the term "descendent device" from the |
| 213 | +core SYCL specification as follows. Given some composite device _C_ and some |
| 214 | +component of that composite device _R_, device _R_ is a descendent device of |
| 215 | +_C_ and all descendent devices of _R_ are also descendent devices of _C_. |
| 216 | + |
| 217 | +This definition means that a SYCL context that contains a composite device is |
| 218 | +compatible with any of its component devices, even if those component devices |
| 219 | +are not contained by the context. See the core SYCL specification for details. |
| 220 | + |
| 221 | + |
| 222 | +== Impact to the ONEAPI_DEVICE_SELECTOR |
| 223 | + |
| 224 | +The `ONEAPI_DEVICE_SELECTOR` is an environment variable that is specific to the |
| 225 | +{dpcpp} implementation. Therefore, this section that describes the interaction |
| 226 | +between this extension and that environment variable is non-normative and does |
| 227 | +not apply to other SYCL implementations that may support this extension. |
| 228 | + |
| 229 | +The `ONEAPI_DEVICE_SELECTOR` environment variable determines the list of root |
| 230 | +devices that are returned from `device::get_devices`: _R0_, _R1_, _R2_, etc. |
| 231 | +The devices returned from `get_composite_devices` are computed from this list |
| 232 | +by iterating over the elements _Ri_ in order: |
| 233 | + |
| 234 | +* Start with an empty list of composite devices. |
| 235 | +* If device _Ri_ is a component of some composite device _C_ and if all other |
| 236 | + components of _C_ are also in the list of root devices returned from |
| 237 | + `device::get_devices`, then _C_ is appended to the list of composite devices |
| 238 | + unless _C_ is already in that list. |
| 239 | + |
| 240 | +This algorithm ensures that a composite device is made visible to the |
| 241 | +application only if all of its components are also visible. This is important |
| 242 | +for two reasons. The first reason is philosophical. We do not want to expose |
| 243 | +a composite device unless it has at least two component devices. This is |
| 244 | +similar to our decision in the core SYCL spec to disallow partitioning into |
| 245 | +sub-devices which results in only a single sub-device. We don't want to expose |
| 246 | +partitioning when the parent and child represent exactly the same hardware. |
| 247 | +The second reason is practical. The {dpcpp} implementation will associate |
| 248 | +each composite device with a single Level Zero native device handle, which |
| 249 | +represents the entire card. There is no way to get a Level Zero handle that |
| 250 | +represents a subset of the tiles. Therefore, we do not expose a composite |
| 251 | +device unless it represents all of the tiles on a card. |
| 252 | + |
| 253 | +The algorithm also imposes a sensible order on the composite devices, which |
| 254 | +corresponds to the order of the root devices. Thus, users who set the order of |
| 255 | +root devices via `ONEAPI_DEVICE_SELECTOR` will also get a corresponding order |
| 256 | +for the composite devices. |
| 257 | + |
| 258 | + |
| 259 | +== Issues |
| 260 | + |
| 261 | +* What is the impact of this extension on |
| 262 | + link:../supported/sycl_ext_oneapi_default_context.asciidoc[ |
| 263 | + sycl_ext_oneapi_default_context]? I think there are two reasonable options. |
| 264 | + One option is to say that the default context is not affected by this |
| 265 | + extension. As a result, applications that use composite devices will need to |
| 266 | + create their own context which contains the set of composite devices that |
| 267 | + they care about. The other option is to change the default context to |
| 268 | + include all of the composite devices. This would be easier to use because |
| 269 | + the default context continues to reflect all devices in the system. As a |
| 270 | + result, the default context can be used in all scenarios, and an application |
| 271 | + only needs to create a custom one in order to gain some optimization. It's |
| 272 | + not clear at this point if redefining the default context like this would |
| 273 | + result in a performance penalty for applications that don't use composite |
| 274 | + devices. |
| 275 | + |
| 276 | +* What guarantees do we make (if any) about the accessibility of USM allocated |
| 277 | + on a composite device? Do we guarantee that this memory is also accessible |
| 278 | + on the component devices? Is the answer different for "device" USM vs. |
| 279 | + "shared" USM? One option is to say that the application needs to |
| 280 | + specifically enable P2P access between a composite device and its component |
| 281 | + devices in order to guarantee accessibility. There is a related question in |
| 282 | + the core SYCL spec about accessibility of USM w.r.t. sub-devices, which we |
| 283 | + have not yet resolved. |
| 284 | + |
| 285 | +* Should the `ONEAPI_DEVICE_SELECTOR` be extended to allow selection of |
| 286 | + composite devices? For example, syntax like |
| 287 | + `ONEAPI_DEVICE_SELECTOR=level_zero:C1` could mean "select the second Level |
| 288 | + Zero card device and expose it as a root device from |
| 289 | + ``device::get_devices``". Our current thinking is that we should not allow |
| 290 | + syntax like this. |
0 commit comments