
Commit 3323da6

[SYCL] Improve range reduction performance on CPU (#6164)

The performance improvement is the result of two complementary changes:

- Using an alternative heuristic to select the work-group size on the CPU. Keeping work-groups small simplifies the combination of partial results and reduces the number of temporary variables.
- Adjusting the mapping of the range to an ND-range. Breaking the range into contiguous chunks that are assigned to each work-group results in streaming patterns that are better suited to prefetching hardware.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
1 parent 991e3fa commit 3323da6

File tree

9 files changed: +366 −22 lines changed

sycl/doc/EnvironmentVariables.md
Lines changed: 27 additions & 0 deletions

```diff
@@ -24,6 +24,7 @@ compiler and runtime.
 | `SYCL_RT_WARNING_LEVEL` | Positive integer | The higher warning level is used the more warnings and performance hints the runtime library may print. Default value is '0', which means no warning/hint messages from the runtime library are allowed. The value '1' enables performance warnings from device runtime/codegen. The values greater than 1 are reserved for future use. |
 | `SYCL_USM_HOSTPTR_IMPORT` | Integer | Enable by specifying non-zero value. Buffers created with a host pointer will result in host data promotion to USM, improving data transfer performance. To use this feature, also set SYCL_HOST_UNIFIED_MEMORY=1. |
 | `SYCL_EAGER_INIT` | Integer | Enable by specifying non-zero value. Tells the SYCL runtime to do as much as possible initialization at objects construction as opposed to doing lazy initialization on the fly. This may mean doing some redundant work at warmup but ensures fastest possible execution on the following hot and reportable paths. It also instructs PI plugins to do the same. Default is "0". |
+| `SYCL_REDUCTION_PREFERRED_WORKGROUP_SIZE` | See [below](#sycl_reduction_preferred_workgroup_size) | Controls the preferred work-group size of reductions. |

 `(*) Note: Any means this environment variable is effective when set to any non-null value.`

@@ -60,6 +61,32 @@ Assuming a filter has all three elements of the triple, it selects only those de

 Note that all device selectors will throw an exception if the filtered list of devices does not include a device that satisfies the selector. For instance, `SYCL_DEVICE_FILTER=cpu,level_zero` will cause `host_selector()` to throw an exception. `SYCL_DEVICE_FILTER` also limits loading only specified plugins into the SYCL RT. In particular, `SYCL_DEVICE_FILTER=level_zero` will cause the `cpu_selector` to throw an exception since SYCL RT will only load the `level_zero` backend which does not support any CPU devices at this time. When multiple devices satisfy the filter (e.g., `SYCL_DEVICE_FILTER=gpu`), only one of them will be selected.

+## `SYCL_REDUCTION_PREFERRED_WORKGROUP_SIZE`
+
+This environment variable controls the preferred work-group size for reductions on specified device types. Setting this will affect all reductions without an explicitly specified work-group size on devices of types in the value of the environment variable.
+
+The value of this environment variable is a comma separated list of one or more configurations, where each configuration is a pair of the form "`device_type`:`size`" (without the quotes). Possible values of `device_type` are:
+- `cpu`
+- `gpu`
+- `acc`
+- `*`
+
+`size` is a positive integer larger than 0.
+
+For a configuration `device_type`:`size` the `device_type` element specifies the type of device the configuration applies to, that is `cpu` is for CPU devices, `gpu` is for GPU devices, and `acc` is for accelerator devices. If `device_type` is `*` the configuration applies to all applicable device types. `size` denotes the preferred work-group size to be used for devices of types specified by `device_type`.
+
+If `info::device::max_work_group_size` on a device on which a reduction is being enqueued is less than the value specified by a configuration in this environment variable, the value of `info::device::max_work_group_size` on that device is used instead.
+
+A `sycl::exception` with `sycl::errc::invalid` is thrown during submission of a reduction kernel in the following cases:
+- If the specified device type in any configuration is not one of the valid values.
+- If the specified preferred work-group size in any configuration is not a valid integer.
+- If the specified preferred work-group size in any configuration is not an integer value larger than 0.
+- If any configuration does not have the `:` delimiter.
+
+If this environment variable is not set, the preferred work-group size for reductions is implementation defined.
+
+Note that conflicting configuration tuples in the same list will favor the last entry. For example, a list `cpu:32,gpu:32,cpu:16` will set the preferred work-group size of reductions to 32 for GPUs and 16 for CPUs. This also applies to `*`, for example `cpu:32,*:16` sets the preferred work-group size of reductions on all devices to 16, while `*:16,cpu:32` sets the preferred work-group size of reductions to 32 on CPUs and to 16 on all other devices.

 ## Controlling DPC++ Level Zero Plugin

 | Environment variable | Values | Description |
```

sycl/include/sycl/ext/oneapi/reduction.hpp
Lines changed: 27 additions & 9 deletions

```diff
@@ -147,6 +147,8 @@ __SYCL_EXPORT size_t reduGetMaxWGSize(std::shared_ptr<queue_impl> Queue,
                                       size_t LocalMemBytesPerWorkItem);
 __SYCL_EXPORT size_t reduComputeWGSize(size_t NWorkItems, size_t MaxWGSize,
                                        size_t &NWorkGroups);
+__SYCL_EXPORT size_t reduGetPreferredWGSize(std::shared_ptr<queue_impl> &Queue,
+                                            size_t LocalMemBytesPerWorkItem);

 /// Class that is used to represent objects that are passed to user's lambda
 /// functions and representing users' reduction variable.
@@ -890,16 +892,28 @@ using __sycl_reduction_kernel =
     sycl::detail::auto_name, Namer<KernelName, Ts...>>;

 /// Called in device code. This function iterates through the index space
-/// \p Range using stride equal to the global range specified in \p NdId,
+/// \p Range by assigning contiguous chunks to each work-group, then iterating
+/// through each chunk using a stride equal to the work-group's local range,
 /// which gives much better performance than using stride equal to 1.
 /// For each of the index the given \p F function/functor is called and
 /// the reduction value hold in \p Reducer is accumulated in those calls.
 template <typename KernelFunc, int Dims, typename ReducerT>
-void reductionLoop(const range<Dims> &Range, ReducerT &Reducer,
-                   const nd_item<1> &NdId, KernelFunc &F) {
-  size_t Start = NdId.get_global_id(0);
-  size_t End = Range.size();
-  size_t Stride = NdId.get_global_range(0);
+void reductionLoop(const range<Dims> &Range, const size_t PerGroup,
+                   ReducerT &Reducer, const nd_item<1> &NdId, KernelFunc &F) {
+  // Divide into contiguous chunks and assign each chunk to a Group
+  // Rely on precomputed division to avoid repeating expensive operations
+  // TODO: Some devices may prefer alternative remainder handling
+  auto Group = NdId.get_group();
+  size_t GroupId = Group.get_group_linear_id();
+  size_t NumGroups = Group.get_group_linear_range();
+  bool LastGroup = (GroupId == NumGroups - 1);
+  size_t GroupStart = GroupId * PerGroup;
+  size_t GroupEnd = LastGroup ? Range.size() : (GroupStart + PerGroup);
+
+  // Loop over the contiguous chunk
+  size_t Start = GroupStart + NdId.get_local_id(0);
+  size_t End = GroupEnd;
+  size_t Stride = NdId.get_local_range(0);
   for (size_t I = Start; I < End; I += Stride)
     F(sycl::detail::getDelinearizedId(Range, I), Reducer);
 }
@@ -919,10 +933,12 @@ bool reduCGFuncForRangeFastAtomics(handler &CGH, KernelType KernelFunc,
   auto GroupSum = Reduction::getReadWriteLocalAcc(NElements, CGH);
   using Name = __sycl_reduction_kernel<reduction::main_krn::RangeFastAtomics,
                                        KernelName>;
+  size_t NWorkGroups = NDRange.get_group_range().size();
+  size_t PerGroup = Range.size() / NWorkGroups;
   CGH.parallel_for<Name>(NDRange, [=](nd_item<1> NDId) {
     // Call user's functions. Reducer.MValue gets initialized there.
     typename Reduction::reducer_type Reducer;
-    reductionLoop(Range, Reducer, NDId, KernelFunc);
+    reductionLoop(Range, PerGroup, Reducer, NDId, KernelFunc);

     // Work-group cooperates to initialize multiple reduction variables
     auto LID = NDId.get_local_id(0);
@@ -987,10 +1003,11 @@ bool reduCGFuncForRangeFastReduce(handler &CGH, KernelType KernelFunc,

   using Name =
       __sycl_reduction_kernel<reduction::main_krn::RangeFastReduce, KernelName>;
+  size_t PerGroup = Range.size() / NWorkGroups;
   CGH.parallel_for<Name>(NDRange, [=](nd_item<1> NDId) {
     // Call user's functions. Reducer.MValue gets initialized there.
     typename Reduction::reducer_type Reducer;
-    reductionLoop(Range, Reducer, NDId, KernelFunc);
+    reductionLoop(Range, PerGroup, Reducer, NDId, KernelFunc);

     typename Reduction::binary_operation BOp;
     auto Group = NDId.get_group();
@@ -1081,10 +1098,11 @@ bool reduCGFuncForRangeBasic(handler &CGH, KernelType KernelFunc,
   auto BOp = Redu.getBinaryOperation();
   using Name =
       __sycl_reduction_kernel<reduction::main_krn::RangeBasic, KernelName>;
+  size_t PerGroup = Range.size() / NWorkGroups;
   CGH.parallel_for<Name>(NDRange, [=](nd_item<1> NDId) {
     // Call user's functions. Reducer.MValue gets initialized there.
     typename Reduction::reducer_type Reducer(Identity, BOp);
-    reductionLoop(Range, Reducer, NDId, KernelFunc);
+    reductionLoop(Range, PerGroup, Reducer, NDId, KernelFunc);

     // If there are multiple values, reduce each separately
     // This prevents local memory from scaling with elements
```

sycl/include/sycl/handler.hpp
Lines changed: 7 additions & 4 deletions

```diff
@@ -303,6 +303,9 @@ reduGetMaxNumConcurrentWorkGroups(std::shared_ptr<queue_impl> Queue);
 __SYCL_EXPORT size_t reduGetMaxWGSize(std::shared_ptr<queue_impl> Queue,
                                       size_t LocalMemBytesPerWorkItem);

+__SYCL_EXPORT size_t reduGetPreferredWGSize(std::shared_ptr<queue_impl> &Queue,
+                                            size_t LocalMemBytesPerWorkItem);
+
 template <typename... ReductionT, size_t... Is>
 size_t reduGetMemPerWorkItem(std::tuple<ReductionT...> &ReduTuple,
                              std::index_sequence<Is...>);
@@ -1618,13 +1621,13 @@ class __SYCL_EXPORT handler {
 #else
         ext::oneapi::detail::reduGetMaxNumConcurrentWorkGroups(MQueue);
 #endif
-    // TODO: currently the maximal work group size is determined for the given
+    // TODO: currently the preferred work group size is determined for the given
     // queue/device, while it is safer to use queries to the kernel pre-compiled
     // for the device.
-    size_t MaxWGSize =
-        ext::oneapi::detail::reduGetMaxWGSize(MQueue, OneElemSize);
+    size_t PrefWGSize =
+        ext::oneapi::detail::reduGetPreferredWGSize(MQueue, OneElemSize);
     if (ext::oneapi::detail::reduCGFuncForRange<KernelName>(
-            *this, KernelFunc, Range, MaxWGSize, NumConcurrentWorkGroups,
+            *this, KernelFunc, Range, PrefWGSize, NumConcurrentWorkGroups,
             Redu)) {
       this->finalize();
       MLastEvent = withAuxHandler(QueueCopy, [&](handler &CopyHandler) {
```

sycl/source/detail/config.def
Lines changed: 1 addition & 0 deletions

```diff
@@ -38,3 +38,4 @@ CONFIG(INTEL_ENABLE_OFFLOAD_ANNOTATIONS, 1, __SYCL_INTEL_ENABLE_OFFLOAD_ANNOTATI
 CONFIG(SYCL_ENABLE_DEFAULT_CONTEXTS, 1, __SYCL_ENABLE_DEFAULT_CONTEXTS)
 CONFIG(SYCL_QUEUE_THREAD_POOL_SIZE, 4, __SYCL_QUEUE_THREAD_POOL_SIZE)
 CONFIG(SYCL_RT_WARNING_LEVEL, 4, __SYCL_RT_WARNING_LEVEL)
+CONFIG(SYCL_REDUCTION_PREFERRED_WORKGROUP_SIZE, 16, __SYCL_REDUCTION_PREFERRED_WORKGROUP_SIZE)
```

sycl/source/detail/config.hpp
Lines changed: 126 additions & 0 deletions

```diff
@@ -102,6 +102,11 @@ template <ConfigID Config> class SYCLConfigBase;
 #include "config.def"
 #undef CONFIG

+#define INVALID_CONFIG_EXCEPTION(BASE, MSG)                                    \
+  sycl::exception(sycl::make_error_code(sycl::errc::invalid),                  \
+                  "Invalid value for " + std::string{BASE::MConfigName} +      \
+                      " environment variable: " + MSG)
+
 template <ConfigID Config> class SYCLConfig {
   using BaseT = SYCLConfigBase<Config>;

@@ -467,6 +472,127 @@ template <> class SYCLConfig<SYCL_CACHE_DIR> {
   }
 };

+template <> class SYCLConfig<SYCL_REDUCTION_PREFERRED_WORKGROUP_SIZE> {
+  using BaseT = SYCLConfigBase<SYCL_REDUCTION_PREFERRED_WORKGROUP_SIZE>;
+
+  struct ParsedValue {
+    size_t CPU = 0;
+    size_t GPU = 0;
+    size_t Accelerator = 0;
+  };
+
+public:
+  static size_t get(info::device_type DeviceType) {
+    ParsedValue Value = getCachedValue();
+    return getRefByDeviceType(Value, DeviceType);
+  }
+
+  static void reset() { (void)getCachedValue(/*ResetCache=*/true); }
+
+  static const char *getName() { return BaseT::MConfigName; }
+
+private:
+  static size_t &getRefByDeviceType(ParsedValue &Value,
+                                    info::device_type DeviceType) {
+    switch (DeviceType) {
+    case info::device_type::cpu:
+      return Value.CPU;
+    case info::device_type::gpu:
+      return Value.GPU;
+    case info::device_type::accelerator:
+      return Value.Accelerator;
+    default:
+      // Expect to get here if user used wrong device type. Include wildcard
+      // in the message even though it's handled in the caller.
+      throw INVALID_CONFIG_EXCEPTION(
+          BaseT, "Device types must be \"cpu\", \"gpu\", \"acc\", or \"*\".");
+    }
+  }
+
+  static ParsedValue parseValue() {
+    const char *ValueRaw = BaseT::getRawValue();
+    ParsedValue Result{};
+
+    // Default to 0 to signify an unset value.
+    if (!ValueRaw)
+      return Result;
+
+    std::string ValueStr{ValueRaw};
+    auto DeviceTypeMap = getSyclDeviceTypeMap();
+
+    // Iterate over all configurations.
+    size_t Start = 0, End = 0;
+    do {
+      End = ValueStr.find(',', Start);
+      if (End == std::string::npos)
+        End = ValueStr.size();
+
+      // Get a substring of the current configuration pair.
+      std::string DeviceConfigStr = ValueStr.substr(Start, End - Start);
+
+      // Find the delimiter in the configuration pair.
+      size_t ConfigDelimLoc = DeviceConfigStr.find(':');
+      if (ConfigDelimLoc == std::string::npos)
+        throw INVALID_CONFIG_EXCEPTION(
+            BaseT, "Device-value pair \"" + DeviceConfigStr +
+                       "\" does not contain the ':' delimiter.");
+
+      // Split configuration pair into its constituents.
+      std::string DeviceConfigTypeStr =
+          DeviceConfigStr.substr(0, ConfigDelimLoc);
+      std::string DeviceConfigValueStr = DeviceConfigStr.substr(
+          ConfigDelimLoc + 1, DeviceConfigStr.size() - ConfigDelimLoc - 1);
+
+      // Find the device type in the "device type map".
+      auto DeviceTypeIter = std::find_if(
+          std::begin(DeviceTypeMap), std::end(DeviceTypeMap),
+          [&](auto Element) { return DeviceConfigTypeStr == Element.first; });
+      if (DeviceTypeIter == DeviceTypeMap.end())
+        throw INVALID_CONFIG_EXCEPTION(
+            BaseT,
+            "\"" + DeviceConfigTypeStr + "\" is not a recognized device type.");
+
+      // Parse the configuration value.
+      int DeviceConfigValue = 1;
+      try {
+        DeviceConfigValue = std::stoi(DeviceConfigValueStr);
+      } catch (...) {
+        throw INVALID_CONFIG_EXCEPTION(
+            BaseT, "Value \"" + DeviceConfigValueStr + "\" must be a number");
+      }
+
+      if (DeviceConfigValue < 1)
+        throw INVALID_CONFIG_EXCEPTION(BaseT,
+                                       "Value \"" + DeviceConfigValueStr +
+                                           "\" must be larger than zero");
+
+      if (DeviceTypeIter->second == info::device_type::all) {
+        // Set all configuration values if we got the device-type wildcard.
+        Result.GPU = DeviceConfigValue;
+        Result.CPU = DeviceConfigValue;
+        Result.Accelerator = DeviceConfigValue;
+      } else {
+        // Try setting the corresponding configuration.
+        getRefByDeviceType(Result, DeviceTypeIter->second) = DeviceConfigValue;
+      }
+
+      // Move to the start of the next configuration. If the start is outside
+      // the full value string we are done.
+      Start = End + 1;
+    } while (Start < ValueStr.size());
+    return Result;
+  }
+
+  static ParsedValue getCachedValue(bool ResetCache = false) {
+    static ParsedValue Val = parseValue();
+    if (ResetCache)
+      Val = parseValue();
+    return Val;
+  }
+};
+
+#undef INVALID_CONFIG_EXCEPTION
+
 } // namespace detail
 } // __SYCL_INLINE_VER_NAMESPACE(_V1)
 } // namespace sycl
```

sycl/source/detail/reduction.cpp
Lines changed: 44 additions & 9 deletions

```diff
@@ -6,6 +6,7 @@
 //
 //===----------------------------------------------------------------------===//

+#include <detail/config.hpp>
 #include <detail/queue_impl.hpp>
 #include <sycl/ext/oneapi/reduction.hpp>

@@ -67,6 +68,7 @@ reduGetMaxWGSize(std::shared_ptr<sycl::detail::queue_impl> Queue,
                  size_t LocalMemBytesPerWorkItem) {
   device Dev = Queue->get_device();
   size_t MaxWGSize = Dev.get_info<info::device::max_work_group_size>();
+
   size_t WGSizePerMem = MaxWGSize * 2;
   size_t WGSize = MaxWGSize;
   if (LocalMemBytesPerWorkItem != 0) {
@@ -93,21 +95,54 @@ reduGetMaxWGSize(std::shared_ptr<sycl::detail::queue_impl> Queue,
   // the local memory assigned to one work-group by code in another work-group.
   // It seems the only good solution for this work-group detection problem is
   // kernel precompilation and querying the kernel properties.
-  if (WGSize >= 4) {
+  if (WGSize >= 4 && WGSizePerMem < MaxWGSize * 2) {
     // Let's return a twice smaller number, but... do that only if the kernel
-    // is limited by memory, or the kernel uses opencl:cpu backend, which
-    // surprisingly uses lots of resources to run the kernels with reductions
-    // and often causes CL_OUT_OF_RESOURCES error even when reduction
-    // does not use local accessors.
-    if (WGSizePerMem < MaxWGSize * 2 ||
-        (Queue->get_device().is_cpu() &&
-         Queue->get_device().get_platform().get_backend() == backend::opencl))
-      WGSize /= 2;
+    // is limited by memory.
+    WGSize /= 2;
   }

   return WGSize;
 }

+__SYCL_EXPORT size_t reduGetPreferredWGSize(std::shared_ptr<queue_impl> &Queue,
+                                            size_t LocalMemBytesPerWorkItem) {
+  device Dev = Queue->get_device();
+
+  // The maximum WGSize returned by CPU devices is very large and does not
+  // help the reduction implementation: since all work associated with a
+  // work-group is typically assigned to one CPU thread, selecting a large
+  // work-group size unnecessarily increases the number of accumulators.
+  // The default of 16 was chosen based on empirical benchmarking results;
+  // an environment variable is provided to allow users to override this
+  // behavior.
+  using PrefWGConfig = sycl::detail::SYCLConfig<
+      sycl::detail::SYCL_REDUCTION_PREFERRED_WORKGROUP_SIZE>;
+  if (Dev.is_cpu()) {
+    size_t CPUMaxWGSize = PrefWGConfig::get(info::device_type::cpu);
+    if (CPUMaxWGSize == 0)
+      return 16;
+    size_t DevMaxWGSize = Dev.get_info<info::device::max_work_group_size>();
+    return std::min(CPUMaxWGSize, DevMaxWGSize);
+  }
+
+  // If the user has specified an explicit preferred work-group size we use
+  // that.
+  if (Dev.is_gpu() && PrefWGConfig::get(info::device_type::gpu)) {
+    size_t DevMaxWGSize = Dev.get_info<info::device::max_work_group_size>();
+    return std::min(PrefWGConfig::get(info::device_type::gpu), DevMaxWGSize);
+  }
+
+  if (Dev.is_accelerator() &&
+      PrefWGConfig::get(info::device_type::accelerator)) {
+    size_t DevMaxWGSize = Dev.get_info<info::device::max_work_group_size>();
+    return std::min(PrefWGConfig::get(info::device_type::accelerator),
+                    DevMaxWGSize);
+  }
+
+  // Use the maximum work-group size otherwise.
+  return reduGetMaxWGSize(Queue, LocalMemBytesPerWorkItem);
+}
+
 } // namespace detail
 } // namespace oneapi
 } // namespace ext
```

sycl/test/abi/sycl_symbols_linux.dump
Lines changed: 1 addition & 0 deletions

```diff
@@ -3666,6 +3666,7 @@ _ZN4sycl3_V13ext6oneapi15filter_selectorC1ERKNSt7__cxx1112basic_stringIcSt11char
 _ZN4sycl3_V13ext6oneapi15filter_selectorC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
 _ZN4sycl3_V13ext6oneapi6detail16reduGetMaxWGSizeESt10shared_ptrINS0_6detail10queue_implEEm
 _ZN4sycl3_V13ext6oneapi6detail17reduComputeWGSizeEmmRm
+_ZN4sycl3_V13ext6oneapi6detail22reduGetPreferredWGSizeERSt10shared_ptrINS0_6detail10queue_implEEm
 _ZN4sycl3_V13ext6oneapi6detail33reduGetMaxNumConcurrentWorkGroupsESt10shared_ptrINS0_6detail10queue_implEE
 _ZN4sycl3_V14freeEPvRKNS0_5queueERKNS0_6detail13code_locationE
 _ZN4sycl3_V14freeEPvRKNS0_7contextERKNS0_6detail13code_locationE
```

sycl/unittests/config/CMakeLists.txt
Lines changed: 1 addition & 0 deletions

```diff
@@ -2,4 +2,5 @@ set(CMAKE_CXX_EXTENSIONS OFF)

 add_sycl_unittest(ConfigTests OBJECT
   ConfigTests.cpp
+  PreferredWGSizeConfigTests.cpp
 )
```
