Skip to content

Commit 4a39b68

Browse files
author
iclsrc
committed
Merge from 'sycl' to 'sycl-web'
2 parents 1722084 + 3fe77b9 commit 4a39b68

22 files changed

+216
-139
lines changed

clang/lib/Driver/ToolChains/Clang.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5257,7 +5257,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
52575257
options::OPT_fno_sycl_early_optimizations,
52585258
!IsFPGASYCLOffloadDevice))
52595259
CmdArgs.push_back("-fno-sycl-early-optimizations");
5260-
else if (RawTriple.isSPIR() || IsSYCLNativeCPU) {
5260+
else if (RawTriple.isSPIR()) {
52615261
// Set `sycl-opt` option to configure LLVM passes for SPIR target
52625262
CmdArgs.push_back("-mllvm");
52635263
CmdArgs.push_back("-sycl-opt");

clang/test/Driver/sycl-native-cpu.cpp

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,3 +17,8 @@
1717
// CHECK-WIN: {{.*}}"-fsycl-is-device"{{.*}}"-gcodeview"
1818
// CHECK-WIN-DAG: {{.*}}"-fsycl-is-host"{{.*}}"-gcodeview"
1919
// CHECK-WIN-NOT: dwarf
20+
21+
// checks that -sycl-opt is not enabled by default on NativeCPU so that the full llvm optimization is enabled
22+
// RUN: %clang -fsycl -fsycl-targets=native_cpu -### %s 2>&1 | FileCheck -check-prefix=CHECK-OPTS %s
23+
// CHECK-OPTS-NOT: -sycl-opt
24+

sycl/doc/EnvironmentVariables.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ compiler and runtime.
99
| -------------------- | ------ | ----------- |
1010
| `ONEAPI_DEVICE_SELECTOR` | [See below.](#oneapi_device_selector) | This device selection environment variable can be used to limit the choice of devices available when the SYCL-using application is run. Useful for limiting devices to a certain type (like GPUs or accelerators) or backends (like Level Zero or OpenCL). This device selection mechanism is replacing `SYCL_DEVICE_FILTER` . The `ONEAPI_DEVICE_SELECTOR` syntax is shared with OpenMP and also allows sub-devices to be chosen. [See below.](#oneapi_device_selector) for a full description. |
1111
| `SYCL_DEVICE_FILTER` (deprecated) | `backend:device_type:device_num` | Please use `ONEAPI_DEVICE_SELECTOR` environment variable instead. See section [`SYCL_DEVICE_FILTER`](#sycl_device_filter) below for `SYCL_DEVICE_FILTER` description. |
12-
| `SYCL_DEVICE_ALLOWLIST` | See [below](#sycl_device_allowlist) | Filter out devices that do not match the pattern specified. `BackendName` accepts `host`, `opencl`, `level_zero` or `cuda`. `DeviceType` accepts `host`, `cpu`, `gpu` or `acc`. `DeviceVendorId` accepts uint32_t in hex form (`0xXYZW`). `DriverVersion`, `PlatformVersion`, `DeviceName` and `PlatformName` accept regular expression. Special characters, such as parenthesis, must be escaped. DPC++ runtime will select only those devices which satisfy provided values above and regex. More than one device can be specified using the piping symbol "\|".|
12+
| `SYCL_DEVICE_ALLOWLIST` | See [below](#sycl_device_allowlist) | Filter out devices that do not match the pattern specified. `BackendName` accepts `host`, `opencl`, `level_zero`, `native_cpu` or `cuda`. `DeviceType` accepts `host`, `cpu`, `gpu` or `acc`. `DeviceVendorId` accepts uint32_t in hex form (`0xXYZW`). `DriverVersion`, `PlatformVersion`, `DeviceName` and `PlatformName` accept regular expression. Special characters, such as parenthesis, must be escaped. DPC++ runtime will select only those devices which satisfy provided values above and regex. More than one device can be specified using the piping symbol "\|".|
1313
| `SYCL_DISABLE_PARALLEL_FOR_RANGE_ROUNDING` | Any(\*) | Disables automatic rounding-up of `parallel_for` invocation ranges. |
1414
| `SYCL_CACHE_DIR` | Path | Path to persistent cache root directory. Default values are `%AppData%\libsycl_cache` for Windows and `$XDG_CACHE_HOME/libsycl_cache` on Linux, if `XDG_CACHE_HOME` is not set then `$HOME/.cache/libsycl_cache`. When none of the environment variables are set SYCL persistent cache is disabled. |
1515
| `SYCL_CACHE_DISABLE_PERSISTENT (deprecated)` | Any(\*) | Has no effect. |
@@ -42,7 +42,7 @@ ONEAPI_DEVICE_SELECTOR = <selector-string>
4242
<accept-filter> ::= <term>
4343
<discard-filter> ::= !<term>
4444
<term> ::= <backend>:<devices>
45-
<backend> ::= { * | level_zero | opencl | cuda | hip } // case insensitive
45+
<backend> ::= { * | level_zero | opencl | cuda | hip | native_cpu } // case insensitive
4646
<devices> ::= <device>[,<device>...]
4747
<device> ::= { * | cpu | gpu | fpga | <num> | <num>.<num> | <num>.* | *.* | <num>.<num>.<num> | <num>.<num>.* | <num>.*.* | *.*.* } // case insensitive
4848
```
@@ -58,13 +58,13 @@ The device indices are zero-based and are unique only within a backend. Therefor
5858

5959
Additionally, if a sub-device is chosen (via numeric index or wildcard), then an additional layer of partitioning can be specified. In other words, a sub-sub-device can be selected. Like sub-devices, this is done with a period ( `.` ) and a sub-sub-device specifier which is a wildcard symbol ( `*` ) or a numeric index. Example `ONEAPI_DEVICE_SELECTOR=level_zero:0.*.*` would partition device 0 into sub-devices and then partition each of those into sub-sub-devices. The range of grandchild sub-sub-devices would be the final devices available to the app, neither device 0, nor its child partitions would be in that list.
6060

61-
Lastly, a filter in the grammar can be thought of as a term in conjuction with an action that is taken on all devices that are selected by the term. The action can be an accept action or a discard action. Based on the action, a filter can be an accept filter or a discard filter.
61+
Lastly, a filter in the grammar can be thought of as a term in conjunction with an action that is taken on all devices that are selected by the term. The action can be an accept action or a discard action. Based on the action, a filter can be an accept filter or a discard filter.
6262
The string `<term>` represents an accept filter and the string `!<term>` represents a discard filter. The underlying term is the same but they perform different actions on the matching devices list.
6363
For example, `!opencl:*` discards all devices of the opencl backend from the list of available devices. The discarding filters, if there are any, must all appear at the end of the selector string.
6464
When one or more filters accept a device and one or more filters discard the device, the latter have priority and the device is ultimately not made available to the user. This allows the user to provide selector strings such as `*:gpu;!cuda:*` that accepts all gpu devices except those with a CUDA backend.
6565
Furthermore, if the value of this environment variable only has discarding filters, an accepting filter that matches all devices, but not sub-devices and sub-sub-devices, will be implicitly included in the
6666
environment variable to allow the user to specify only the list of devices that must not be made available. Therefore, `!*:cpu` will accept all devices except those that are of the cpu type and `opencl:*;!*:cpu`
67-
will accept all devices of the opencl backend exept those that are of the opencl backend and of the cpu type. It is legal to have a rejection filter even if it specifies devices have already been omitted by previous filters in the selection string. Doing so has no effect; the rejected devices are still omitted.
67+
will accept all devices of the opencl backend except those that are of the opencl backend and of the cpu type. It is legal to have a rejection filter even if it specifies devices have already been omitted by previous filters in the selection string. Doing so has no effect; the rejected devices are still omitted.
6868

6969
The following examples further illustrate the usage of this environment variable:
7070

@@ -290,7 +290,7 @@ variables in production code.</span>
290290
| `SYCL_PI_LEVEL_ZERO_EXPOSE_CSLICE_IN_AFFINITY_PARTITIONING` (Deprecated) | Integer | When set to non-zero value exposes compute slices as sub-sub-devices in `sycl::info::partition_property::partition_by_affinity_domain` partitioning scheme. Default is zero meaning that they are only exposed when partitioning by `sycl::info::partition_property::ext_intel_partition_by_cslice`. This option is introduced for compatibility reasons and is immediately deprecated. New code must not rely on this behavior. Also note that even if sub-sub-device was created using `partition_by_affinity_domain` it would still be reported as created via partitioning by compute slices. |
291291
| `SYCL_PI_LEVEL_ZERO_COMMANDLISTS_CLEANUP_THRESHOLD` | Integer | If non-negative then the threshold is set to this value. If negative, the threshold is set to INT_MAX. Whenever the number of command lists in a queue exceeds this threshold, an attempt is made to cleanup completed command lists for their subsequent reuse. The default is 20. |
292292
| `SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD` | Integer | If non-negative then the threshold is set to this value. If negative, the threshold is set to INT_MAX. Whenever the number of events associated with an immediate command list exceeds this threshold, a check is made for signaled events and these events are recycled. Setting this threshold low causes events to be checked more often, which could result in unneeded events being recycled sooner. However, more frequent event status checks may cost time. The default is 1000. |
293-
| `SYCL_PI_LEVEL_ZERO_USM_RESIDENT` | Integer | Bit-mask controls if/where to make USM allocations resident at the time of allocation. Input value is of the form 0xHSD, where 4-bits of D control device allocations, 4-bits of S control shared allocations, and 4-bits of H control host allocations. Each 4-bit componenet is holding one of the following values: "0" - then no special residency is forced, "1" - then allocation is made resident at the device of allocation, or "2" - then allocation is made resident on all devices in the context of allocation that have P2P access to the device of allocation. Default is 0x002, i.e. force full residency for device allocations only. |
293+
| `SYCL_PI_LEVEL_ZERO_USM_RESIDENT` | Integer | Bit-mask controls if/where to make USM allocations resident at the time of allocation. Input value is of the form 0xHSD, where 4-bits of D control device allocations, 4-bits of S control shared allocations, and 4-bits of H control host allocations. Each 4-bit component is holding one of the following values: "0" - then no special residency is forced, "1" - then allocation is made resident at the device of allocation, or "2" - then allocation is made resident on all devices in the context of allocation that have P2P access to the device of allocation. Default is 0x002, i.e. force full residency for device allocations only. |
294294
| `SYCL_PI_LEVEL_ZERO_USE_NATIVE_USM_MEMCPY2D` | Integer | When set to a positive value enables the use of Level Zero USM 2D memory copy operations. Default is 0. |
295295

296296
## Debugging variables for CUDA Plugin

sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -915,9 +915,11 @@ architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
915915
`matrix_type::fp32` .2+| +<=+ 8 | 16 .2+| 16 |
916916
`architecture::intel_gpu_pvc`|8| `architecture::intel_gpu_dg2_g10,
917917
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
918-
.2+| `matrix_type::bf16` .2+| `matrix_type::bf16` .2+|
919-
`matrix_type::fp32` .2+| +<=+ 8 | 16 .2+| 16 |
920-
`architecture::intel_gpu_pvc` |8| `architecture::intel_gpu_dg2_g10,
918+
.4+| `matrix_type::bf16` .4+| `matrix_type::bf16` .4+|
919+
`matrix_type::fp32` | 16 | 16 | 16 .3+|`architecture::intel_gpu_pvc` |
920+
32 | 64 | 16
921+
.2+| +<=+ 8 | 16 .2+| 16
922+
|8 | `architecture::intel_gpu_dg2_g10,
921923
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
922924
| `matrix_type::tf32` | `matrix_type::tf32` |
923925
`matrix_type::fp32` | +<=+ 8 | 16 | 8 |

sycl/include/sycl/accessor.hpp

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -982,9 +982,9 @@ class image_accessor
982982

983983
device Device = getDeviceFromHandler(CommandGroupHandlerRef);
984984
if (!Device.has(aspect::ext_intel_legacy_image))
985-
throw feature_not_supported(
986-
"SYCL 1.2.1 images are not supported by this device.",
987-
PI_ERROR_INVALID_OPERATION);
985+
throw sycl::exception(
986+
sycl::errc::feature_not_supported,
987+
"SYCL 1.2.1 images are not supported by this device.");
988988
}
989989
#endif
990990

sycl/include/sycl/ext/oneapi/group_local_memory.hpp

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@
1111
#include <sycl/detail/defines_elementary.hpp> // for __SYCL_ALWAYS_INLINE
1212
#include <sycl/detail/pi.h> // for PI_ERROR_INVALID_OPERA...
1313
#include <sycl/detail/type_traits.hpp> // for is_group
14-
#include <sycl/exception.hpp> // for feature_not_supported
14+
#include <sycl/exception.hpp> // for exception
1515
#include <sycl/ext/intel/usm_pointers.hpp> // for multi_ptr
1616

1717
#include <type_traits> // for enable_if_t
@@ -42,9 +42,9 @@ std::enable_if_t<
4242
}
4343
return reinterpret_cast<__attribute__((opencl_local)) T *>(AllocatedMem);
4444
#else
45-
throw feature_not_supported(
46-
"sycl_ext_oneapi_local_memory extension is not supported on host device",
47-
PI_ERROR_INVALID_OPERATION);
45+
throw sycl::exception(
46+
sycl::errc::feature_not_supported,
47+
"sycl_ext_oneapi_local_memory extension is not supported on host");
4848
#endif
4949
}
5050

@@ -64,9 +64,9 @@ std::enable_if_t<
6464
// Silence unused variable warning
6565
(void)g;
6666
[&args...] {}();
67-
throw feature_not_supported(
68-
"sycl_ext_oneapi_local_memory extension is not supported on host device",
69-
PI_ERROR_INVALID_OPERATION);
67+
throw sycl::exception(
68+
sycl::errc::feature_not_supported,
69+
"sycl_ext_oneapi_local_memory extension is not supported on host");
7070
#endif
7171
}
7272
} // namespace ext::oneapi

sycl/include/sycl/handler.hpp

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1261,9 +1261,10 @@ class __SYCL_EXPORT handler {
12611261
"first argument of sycl::item type, or of a type which is "
12621262
"implicitly convertible from sycl::item");
12631263

1264+
using RefLambdaArgType = std::add_lvalue_reference_t<LambdaArgType>;
12641265
static_assert(
1265-
(std::is_invocable_v<KernelType, LambdaArgType> ||
1266-
std::is_invocable_v<KernelType, LambdaArgType, kernel_handler>),
1266+
(std::is_invocable_v<KernelType, RefLambdaArgType> ||
1267+
std::is_invocable_v<KernelType, RefLambdaArgType, kernel_handler>),
12671268
"SYCL kernel lambda/functor has an unexpected signature, it should be "
12681269
"invocable with sycl::item and optionally sycl::kernel_handler");
12691270
#endif

sycl/source/detail/device_info.hpp

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -763,6 +763,10 @@ struct get_device_info_impl<
763763
matrix_type::fp32, matrix_type::fp32},
764764
{8, 0, 0, 0, 16, 16, matrix_type::bf16, matrix_type::bf16,
765765
matrix_type::fp32, matrix_type::fp32},
766+
{0, 0, 0, 16, 16, 16, matrix_type::bf16, matrix_type::bf16,
767+
matrix_type::fp32, matrix_type::fp32},
768+
{0, 0, 0, 32, 64, 16, matrix_type::bf16, matrix_type::bf16,
769+
matrix_type::fp32, matrix_type::fp32},
766770
{8, 0, 0, 0, 16, 8, matrix_type::tf32, matrix_type::tf32,
767771
matrix_type::fp32, matrix_type::fp32},
768772
};

sycl/source/detail/event_impl.cpp

Lines changed: 7 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -464,11 +464,13 @@ void event_impl::setSubmissionTime() {
464464
if (QueueImplPtr Queue = MQueue.lock()) {
465465
try {
466466
MSubmitTime = Queue->getDeviceImplPtr()->getCurrentDeviceTime();
467-
} catch (feature_not_supported &e) {
468-
throw sycl::exception(
469-
make_error_code(errc::profiling),
470-
std::string("Unable to get command group submission time: ") +
471-
e.what());
467+
} catch (sycl::exception &e) {
468+
if (e.code() == sycl::errc::feature_not_supported)
469+
throw sycl::exception(
470+
make_error_code(errc::profiling),
471+
std::string("Unable to get command group submission time: ") +
472+
e.what());
473+
std::rethrow_exception(std::current_exception());
472474
}
473475
}
474476
} else {

sycl/source/detail/program_impl.cpp

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -36,10 +36,10 @@ program_impl::program_impl(ContextImplPtr Context,
3636
const property_list &PropList)
3737
: MContext(Context), MDevices(DeviceList), MPropList(PropList) {
3838
if (Context->getDevices().size() > 1) {
39-
throw feature_not_supported(
39+
throw sycl::exception(
40+
sycl::errc::feature_not_supported,
4041
"multiple devices within a context are not supported with "
41-
"sycl::program and sycl::kernel",
42-
PI_ERROR_INVALID_OPERATION);
42+
"sycl::program and sycl::kernel");
4343
}
4444
}
4545

@@ -65,10 +65,10 @@ program_impl::program_impl(
6565

6666
MContext = ProgramList[0]->MContext;
6767
if (MContext->getDevices().size() > 1) {
68-
throw feature_not_supported(
68+
throw sycl::exception(
69+
sycl::errc::feature_not_supported,
6970
"multiple devices within a context are not supported with "
70-
"sycl::program and sycl::kernel",
71-
PI_ERROR_INVALID_OPERATION);
71+
"sycl::program and sycl::kernel");
7272
}
7373
MDevices = ProgramList[0]->MDevices;
7474
std::vector<device> DevicesSorted;

0 commit comments

Comments
 (0)