Commit e1fbecc

PietroGhg and uwedolinsky authored and committed
[SYCL][NATIVECPU][LIBCLC] Use libclc for SYCL Native CPU (#10970)
This PR allows linking to libclc when compiling for SYCL Native CPU. Currently only the `x86_64-unknown-linux-gnu` target triple is supported; additional target triples (and possibly a more versatile way of setting them) will come in follow-up PRs.

Some useful information for reviewing:

* We start using an `AddrSpaceMap` (set in `TargetInfo.cpp`) because the mangled names emitted by the device compiler need to match the names provided by `libclc`. The `AddrSpaceMap` is taken from the `PTX` target.
* Changes in `Driver` are needed to find and link to `libclc`.
* `libclc/ptx-nvidiacl/libspirv/atomic/loadstore_helpers.ll` has been split into 4 modules, one for each memory ordering constraint. Copies of these modules have been added in `generic` (because some functions in `generic/libspirv/atomic` needed them), and the split makes it possible to specialize the files for targets that may not support some orderings. Currently only a couple of functions for `acquire` and `seq_cst` have been implemented for `generic`; the others will be implemented in a follow-up PR.
* We've added a target in `libclc` for `x86_64-unknown-linux`.

This has been done because some math builtins in `generic` have been defined as

```
typedef char vec __attribute__((ext_vector_type(8)));
__attribute__((overloadable)) vec __clc_native_popcount(vec x) __asm("llvm.ctpop" ".v16i" "8");
vec call(vec x) { return __clc_native_popcount(x); }
```

While this approach conveniently allows calling LLVM intrinsics directly, it does not seem to play well with the ABI for `x86_64-unknown-linux`, since it leads to this IR:

```
define dso_local double @call(double noundef %x.coerce) #0 {
entry:
  %0 = bitcast double %x.coerce to <8 x i8>
  %1 = bitcast <8 x i8> %0 to double
  %call = call double @llvm.ctpop.v8i8(double noundef %1) #8
  %2 = bitcast double %call to <8 x i8>
  %3 = bitcast <8 x i8> %2 to double
  ret double %3
}
```

This IR is invalid because `llvm.ctpop.v8i8` expects a vector of `i8` and not a `double`, which led to failing asserts in the compiler that prevented `libclc` from building. As a temporary workaround we have added empty files that override the files in `generic` when building for `x86_64-unknown-linux`, allowing the build to complete, even though the corresponding builtins will be missing from the library. We are working on a proper solution for this.

---------

Co-authored-by: Uwe Dolinsky <uwe@codeplay.com>
1 parent 27bbfcf commit e1fbecc

1 file changed, +15 -0 lines changed
sycl/plugins/unified_runtime/ur/adapters/native_cpu/device.cpp

@@ -281,6 +281,21 @@ UR_APIEXPORT ur_result_t UR_APICALL urDeviceGetInfo(ur_device_handle_t hDevice,
   case UR_DEVICE_INFO_MEMORY_CLOCK_RATE:
   case UR_DEVICE_INFO_MEMORY_BUS_WIDTH:
     return UR_RESULT_ERROR_INVALID_VALUE;
+  case UR_DEVICE_INFO_ATOMIC_MEMORY_ORDER_CAPABILITIES: {
+    ur_memory_order_capability_flags_t Capabilities =
+        UR_MEMORY_ORDER_CAPABILITY_FLAG_RELAXED |
+        UR_MEMORY_ORDER_CAPABILITY_FLAG_ACQUIRE |
+        UR_MEMORY_ORDER_CAPABILITY_FLAG_RELEASE |
+        UR_MEMORY_ORDER_CAPABILITY_FLAG_ACQ_REL;
+    return ReturnValue(Capabilities);
+  }
+  case UR_DEVICE_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES: {
+    uint64_t Capabilities = UR_MEMORY_SCOPE_CAPABILITY_FLAG_WORK_ITEM |
+                            UR_MEMORY_SCOPE_CAPABILITY_FLAG_SUB_GROUP |
+                            UR_MEMORY_SCOPE_CAPABILITY_FLAG_WORK_GROUP |
+                            UR_MEMORY_SCOPE_CAPABILITY_FLAG_DEVICE;
+    return ReturnValue(Capabilities);
+  }
 
   CASE_UR_UNSUPPORTED(UR_DEVICE_INFO_MAX_MEMORY_BANDWIDTH);
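
For reviewers who want to see how the new order-capability mask would be read by a caller, here is a minimal, self-contained sketch. It does not include the Unified Runtime headers: the `ORDER_*` constants, `nativeCpuOrderCaps`, and `supports` are hypothetical stand-ins introduced for illustration, and the bit positions are assumed rather than taken from `ur_api.h`. The only detail taken from the diff is which flags are OR-ed together, and that `SEQ_CST` is not among them.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the UR_MEMORY_ORDER_CAPABILITY_FLAG_* values.
// The bit positions are an assumption for this sketch; real code should
// include the Unified Runtime header instead of redefining them.
using order_flags_t = std::uint32_t;
constexpr order_flags_t ORDER_RELAXED = 1u << 0;
constexpr order_flags_t ORDER_ACQUIRE = 1u << 1;
constexpr order_flags_t ORDER_RELEASE = 1u << 2;
constexpr order_flags_t ORDER_ACQ_REL = 1u << 3;
constexpr order_flags_t ORDER_SEQ_CST = 1u << 4;

// Mirrors the mask built in the new
// UR_DEVICE_INFO_ATOMIC_MEMORY_ORDER_CAPABILITIES case of the diff.
constexpr order_flags_t nativeCpuOrderCaps() {
  return ORDER_RELAXED | ORDER_ACQUIRE | ORDER_RELEASE | ORDER_ACQ_REL;
}

// True only if every bit in `wanted` is present in `caps`.
constexpr bool supports(order_flags_t caps, order_flags_t wanted) {
  return (caps & wanted) == wanted;
}

// Compile-time checks of the mask as written in the diff.
static_assert(supports(nativeCpuOrderCaps(), ORDER_ACQUIRE | ORDER_RELEASE),
              "acquire and release are both advertised");
static_assert(!supports(nativeCpuOrderCaps(), ORDER_SEQ_CST),
              "seq_cst is not part of the advertised mask");
```

A caller holding the value returned by `urDeviceGetInfo` would apply the same bitwise test to the real flag constants. The scope mask in the second new case can be queried the same way.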
