Skip to content

Commit a4d1cad

Browse files
author
iclsrc
committed
Merge from 'sycl' to 'sycl-web'
2 parents d88c086 + 2fdf940 commit a4d1cad

File tree

18 files changed

+501
-154
lines changed

18 files changed

+501
-154
lines changed

llvm/tools/sycl-post-link/sycl-post-link.cpp

Lines changed: 144 additions & 144 deletions
Large diffs are not rendered by default.
Lines changed: 84 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,84 @@
1+
# SYCL_EXT_ONEAPI_MAX_WORK_GROUP_QUERY
2+
3+
## Notice
4+
5+
This document describes an **experimental** API that applications can use to try
6+
out a new feature. Future versions of this API may change in ways that are
7+
incompatible with this experimental version.
8+
9+
10+
## Introduction
11+
12+
This extension adds functionally two new device information descriptors. They provide the ability to query a device for the maximum numbers of work-groups that can be submitted in each dimension as well as globally (across all dimensions).
13+
14+
OpenCL never offered such query - which is probably why it is absent from SYCL. Now that SYCL supports back-ends where the maximum number of work-groups in each dimension can be different, having the ability to query that limit is crucial in writing safe and portable code.
15+
16+
## Feature test macro
17+
18+
As encouraged by the SYCL specification, a feature-test macro, `SYCL_EXT_ONEAPI_MAX_WORK_GROUP_QUERY`, is provided to determine whether this extension is implemented.
19+
20+
## New device descriptors
21+
22+
| Device descriptors | Return type | Description |
23+
| ------------------------------------------------------ | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
24+
| info::device::ext_oneapi_max_work_groups_1d |  id<1> | Returns the maximum number of work-groups that can be submitted in each dimension of the `globalSize` of a `nd_range<1>`. The minimum value is `(1)` if the device is different than `info::device_type::custom`. |
25+
| info::device::ext_oneapi_max_work_groups_2d |  id<2> | Returns the maximum number of work-groups that can be submitted in each dimension of the `globalSize` of a `nd_range<2>`. The minimum value is `(1, 1)` if the device is different than `info::device_type::custom`. |
26+
| info::device::ext_oneapi_max_work_groups_3d |  id<3> | Returns the maximum number of work-groups that can be submitted in each dimension of the `globalSize` of a `nd_range<3>`. The minimum value is `(1, 1, 1)` if the device is different than `info::device_type::custom`. |
27+
| info::device::ext_oneapi_max_global_work_groups |  size_t | Returns the maximum number of work-groups that can be submitted across all the dimensions. The minimum value is `1`. |
28+
29+
### Note
30+
31+
- The returned values have the same ordering as the `nd_range` arguments.
32+
- The implementation does not guarantee that the user could select all the maximum numbers returned by `ext_oneapi_max_work_groups` at the same time. Thus the user should also check that the selected number of work-groups across all dimensions is smaller than the maximum global number returned by `ext_oneapi_max_global_work_groups`.
33+
34+
## Examples
35+
36+
```c++
37+
sycl::device gpu = sycl::device{sycl::gpu_selector{}};
38+
std::cout << gpu.get_info<sycl::info::device::name>() << '\n';
39+
40+
#ifdef SYCL_EXT_ONEAPI_MAX_WORK_GROUP_QUERY
41+
sycl::id<3> groups = gpu.get_info<sycl::info::device::ext_oneapi_max_work_groups_3d>();
42+
size_t global_groups = gpu.get_info<sycl::info::device::ext_oneapi_max_global_work_groups>();
43+
std::cout << "Max number groups: x_max: " << groups[2] << " y_max: " << groups[1] << " z_max: " << groups[0] << '\n';
44+
std::cout << "Max global number groups: " << global_groups << '\n';
45+
#endif
46+
```
47+
48+
Ouputs to the console:
49+
50+
```
51+
NVIDIA ...
52+
Max number groups: x_max: 2147483647 y_max: 65535 z_max: 65535
53+
Max global number groups: 2147483647
54+
```
55+
56+
See: [CUDA Toolkit Documentation](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities)
57+
58+
Then the following assertions should be satisfied at kernel submission:
59+
60+
```C++
61+
sycl::nd_range<3> work_range(global_size, local_size);
62+
63+
assert(global_size[2] <= groups[2]
64+
&& global_size[1] <= groups[1]
65+
&& global_size[0] <= groups[0]);
66+
67+
assert(global_size[2] * global_size[1] * global_size[0] <= global_groups); //Make sure not to exceed integer representation size in the multiplication.
68+
69+
gpu_queue.submit(work_range, ...);
70+
```
71+
72+
## Implementation
73+
74+
### Templated queries
75+
76+
Right now, DPC++ does not support templated device descriptors as they are defined in the SYCL specification section 4.6.4.2 "Device information descriptors". When the implementation supports this syntax, `ext_oneapi_max_work_groups_[1,2,3]d` should be replaced by the templated syntax: `ext_oneapi_max_work_groups<[1,2,3]>`.
77+
### Consistency with existing checks
78+
79+
The implementation already checks when enqueuing a kernel that the global and per dimension work-group number is smaller than `std::numeric_limits<int>::max`. This check is implemented in `sycl/include/CL/sycl/handler.hpp`. For consistency, values returned by the two device descriptors are bound by this limit.
80+
81+
### Example of returned values
82+
83+
- If the device is the host or has an OpenCL back-end, the values returned - as they are not applicable - are the maximum values accepted at kernel submission (see `sycl/include/CL/sycl/handler.hpp`) which are currently `std::numeric_limits<int>::max`.
84+
- CUDA: Back-end query using `CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_[X,Y,Z]`.

sycl/doc/extensions/README.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -41,9 +41,10 @@ DPC++ extensions status:
4141
| [SYCL_INTEL_group_sort](GroupAlgorithms/SYCL_INTEL_group_sort.asciidoc) | Proposal | |
4242
| [Invoke SIMD](InvokeSIMD/InvokeSIMD.asciidoc) | Proposal | |
4343
| [Uniform](Uniform/Uniform.asciidoc) | Proposal | |
44-
| [Assert](Assert/SYCL_ONEAPI_ASSERT.asciidoc) | Proposal | |
45-
| [Matrix](Matrix/dpcpp-joint-matrix.asciidoc) | Partially supported(AMX AOT) | Not supported: dynamic-extent, wg and wi scopes, layouts other than packed|
44+
| [Assert](Assert/SYCL_ONEAPI_ASSERT.asciidoc) | Proposal | |
45+
| [Matrix](Matrix/dpcpp-joint-matrix.asciidoc) | Partially supported(AMX AOT) | Not supported: dynamic-extent, wg and wi scopes, layouts other than packed|
4646
| [SYCL_INTEL_free_function_queries](FreeFunctionQueries/SYCL_INTEL_free_function_queries.asciidoc) | Supported (experimental) | |
47+
| [EXT_ONEAPI_max_work_groups](MaxWorkGroupQueries/max_work_group_query.md) | Supported | |
4748
| [SYCL_EXT_ONEAPI_DEVICE_GLOBAL](DeviceGlobal/SYCL_INTEL_device_global.asciidoc) | Proposal | |
4849
| [SYCL_INTEL_bf16_conversion](Bf16Conversion/SYCL_INTEL_bf16_conversion.asciidoc) | Partially supported (Level Zero: GPU) | Currently available only on Xe HP GPU. ext_intel_bf16_conversion aspect is not supported. |
4950
| [Property List](PropertyList/SYCL_EXT_ONEAPI_property_list.asciidoc) | Proposal | |

sycl/include/CL/sycl/detail/pi.h

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -300,7 +300,12 @@ typedef enum {
300300
PI_DEVICE_INFO_MAX_MEM_BANDWIDTH = 0x10026,
301301
PI_DEVICE_INFO_IMAGE_SRGB = 0x10027,
302302
PI_DEVICE_INFO_ATOMIC_64 = 0x10110,
303-
PI_DEVICE_INFO_ATOMIC_MEMORY_ORDER_CAPABILITIES = 0x10111
303+
PI_DEVICE_INFO_ATOMIC_MEMORY_ORDER_CAPABILITIES = 0x10111,
304+
PI_EXT_ONEAPI_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS = 0x20000,
305+
PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_1D = 0x20001,
306+
PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_2D = 0x20002,
307+
PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_3D = 0x20003
308+
304309
} _pi_device_info;
305310

306311
typedef enum {

sycl/include/CL/sycl/feature_test.hpp.in

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,7 @@ namespace sycl {
4242
#if SYCL_BUILD_PI_HIP
4343
#define SYCL_EXT_ONEAPI_BACKEND_HIP 1
4444
#endif
45+
#define SYCL_EXT_ONEAPI_MAX_WORK_GROUP_QUERY 1
4546

4647
} // namespace sycl
4748
} // __SYCL_INLINE_NAMESPACE(cl)

sycl/include/CL/sycl/info/device_traits.def

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -98,3 +98,7 @@ __SYCL_PARAM_TRAITS_SPEC(device, ext_intel_max_mem_bandwidth, pi_uint64)
9898
__SYCL_PARAM_TRAITS_SPEC(device, ext_intel_mem_channel, bool)
9999
__SYCL_PARAM_TRAITS_SPEC(device, ext_oneapi_srgb, bool)
100100
__SYCL_PARAM_TRAITS_SPEC(device, ext_intel_device_info_uuid, detail::uuid_type)
101+
__SYCL_PARAM_TRAITS_SPEC(device, ext_oneapi_max_global_work_groups, size_t)
102+
__SYCL_PARAM_TRAITS_SPEC(device, ext_oneapi_max_work_groups_1d, id<1>)
103+
__SYCL_PARAM_TRAITS_SPEC(device, ext_oneapi_max_work_groups_2d, id<2>)
104+
__SYCL_PARAM_TRAITS_SPEC(device, ext_oneapi_max_work_groups_3d, id<3>)

sycl/include/CL/sycl/info/info_desc.hpp

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -160,6 +160,11 @@ enum class device : cl_device_info {
160160
atomic64 = PI_DEVICE_INFO_ATOMIC_64,
161161
atomic_memory_order_capabilities =
162162
PI_DEVICE_INFO_ATOMIC_MEMORY_ORDER_CAPABILITIES,
163+
ext_oneapi_max_global_work_groups =
164+
PI_EXT_ONEAPI_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS,
165+
ext_oneapi_max_work_groups_1d = PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_1D,
166+
ext_oneapi_max_work_groups_2d = PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_2D,
167+
ext_oneapi_max_work_groups_3d = PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_3D
163168
};
164169

165170
enum class device_type : pi_uint64 {

sycl/include/sycl/ext/oneapi/sub_group_mask.hpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -142,7 +142,7 @@ struct sub_group_mask {
142142

143143
template <typename Type, size_t Size,
144144
typename = sycl::detail::enable_if_t<std::is_integral<Type>::value>>
145-
void extract_bits(marray<Type, Size> &bits, id<1> pos = 0) {
145+
void extract_bits(marray<Type, Size> &bits, id<1> pos = 0) const {
146146
size_t cur_pos = pos.get(0);
147147
for (auto &elem : bits) {
148148
if (cur_pos < size()) {

sycl/plugins/cuda/pi_cuda.cpp

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -991,6 +991,32 @@ pi_result cuda_piDeviceGetInfo(pi_device device, pi_device_info param_name,
991991
return getInfoArray(max_work_item_dimensions, param_value_size, param_value,
992992
param_value_size_ret, return_sizes);
993993
}
994+
995+
case PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_3D: {
996+
size_t return_sizes[max_work_item_dimensions];
997+
int max_x = 0, max_y = 0, max_z = 0;
998+
cl::sycl::detail::pi::assertion(
999+
cuDeviceGetAttribute(&max_x, CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X,
1000+
device->get()) == CUDA_SUCCESS);
1001+
cl::sycl::detail::pi::assertion(max_x >= 0);
1002+
1003+
cl::sycl::detail::pi::assertion(
1004+
cuDeviceGetAttribute(&max_y, CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Y,
1005+
device->get()) == CUDA_SUCCESS);
1006+
cl::sycl::detail::pi::assertion(max_y >= 0);
1007+
1008+
cl::sycl::detail::pi::assertion(
1009+
cuDeviceGetAttribute(&max_z, CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Z,
1010+
device->get()) == CUDA_SUCCESS);
1011+
cl::sycl::detail::pi::assertion(max_z >= 0);
1012+
1013+
return_sizes[0] = size_t(max_x);
1014+
return_sizes[1] = size_t(max_y);
1015+
return_sizes[2] = size_t(max_z);
1016+
return getInfoArray(max_work_item_dimensions, param_value_size, param_value,
1017+
param_value_size_ret, return_sizes);
1018+
}
1019+
9941020
case PI_DEVICE_INFO_MAX_WORK_GROUP_SIZE: {
9951021
int max_work_group_size = 0;
9961022
cl::sycl::detail::pi::assertion(

sycl/plugins/hip/pi_hip.cpp

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -980,6 +980,32 @@ pi_result hip_piDeviceGetInfo(pi_device device, pi_device_info param_name,
980980
return getInfoArray(max_work_item_dimensions, param_value_size, param_value,
981981
param_value_size_ret, return_sizes);
982982
}
983+
984+
case PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_3D: {
985+
size_t return_sizes[max_work_item_dimensions];
986+
int max_x = 0, max_y = 0, max_z = 0;
987+
cl::sycl::detail::pi::assertion(
988+
hipDeviceGetAttribute(&max_x, hipDeviceAttributeMaxGridDimX,
989+
device->get()) == hipSuccess);
990+
cl::sycl::detail::pi::assertion(max_x >= 0);
991+
992+
cl::sycl::detail::pi::assertion(
993+
hipDeviceGetAttribute(&max_y, hipDeviceAttributeMaxGridDimY,
994+
device->get()) == hipSuccess);
995+
cl::sycl::detail::pi::assertion(max_y >= 0);
996+
997+
cl::sycl::detail::pi::assertion(
998+
hipDeviceGetAttribute(&max_z, hipDeviceAttributeMaxGridDimZ,
999+
device->get()) == hipSuccess);
1000+
cl::sycl::detail::pi::assertion(max_z >= 0);
1001+
1002+
return_sizes[0] = size_t(max_x);
1003+
return_sizes[1] = size_t(max_y);
1004+
return_sizes[2] = size_t(max_z);
1005+
return getInfoArray(max_work_item_dimensions, param_value_size, param_value,
1006+
param_value_size_ret, return_sizes);
1007+
}
1008+
9831009
case PI_DEVICE_INFO_MAX_WORK_GROUP_SIZE: {
9841010
int max_work_group_size = 0;
9851011
cl::sycl::detail::pi::assertion(

0 commit comments

Comments
 (0)