[SYCLomatic] Fix auto pulldown conflict from SYCL to SYCLomatic #2656

tomflinda · 2025-02-10T08:59:28Z

The pulldown includes four PRs below to fix IP Leak issues
[AVX10.2] Update convert chapter intrinsic and mnemonics names (#123656)
[X86][AVX10.2-SATCVT][NFC] Remove NE from intrinsic and instruction name (#123275)
[X86][AVX10.2-MINMAX][NFC] Remove NE[P] from intrinsic and instruction (#123272)
[X86][AMX-AVX512][NFC] Remove P from intrinsic and instruction name (#123270)

Signed-off-by: chenwei.sun chenwei.sun@intel.com

Summary: This PR fixes bugreport llvm/llvm-project#122493 The root problem is the same as before lambda function and DeclRefExpr references a variable that does not belong to the same module as the enclosing function body. Therefore iteration over the function body doesn’t visit the VarDecl. Before this change RelatedDeclsMap was created only for canonical decl but in reality it has to be done for the definition of the function that does not always match the canonical decl. Test Plan: check-clang

…730) oneapi-src/unified-runtime#2193 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

…… (#16737) …ions. These functions were wrongly defined to take addrspace(1). They take generic pointers which are not annotated with any explicit address space.

Intrinsics are available for the 'cpSize' variants also. So, this patch migrates the Op to lower to the intrinsics for all cases. * Update the existing tests to check the lowering to intrinsics. * Add newer cp_async_zfill tests to verify the lowering for the 'cpSize' variants. * Tidy-up CHECK lines in cp_async() function in nvvmir.mlir (NFC) PTX spec link: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async Signed-off-by: Durgadoss R <durgadossr@nvidia.com>

We use variable locations such as DBG_VALUE $xmm0 as shorthand to refer to "the low lane of $xmm0", and this is reflected in how DWARF is interpreted too. However InstrRefBasedLDV tries to be smart and interprets such a DBG_VALUE as a 128-bit reference. We then issue a DW_OP_deref_size of 128 bits to the stack, which isn't permitted by DWARF (it's larger than a pointer). Solve this for now by not using DW_OP_deref_size if it would be illegal. Instead we'll use DW_OP_deref, and the consumer will load the variable type from the stack, which should be correct. There's still a risk of imprecision when LLVM decides to use smaller or larger value types than the source-variable type, which manifests as too-little or too-much memory being read from the stack. However we can't solve that without putting more type information in debug-info. fixes #64093

… object parameters (#124096) LLDB deduces the CV-qualifiers and storage class of a C++ method from the object parameter. Currently it assumes that parameter is implicit (and is a pointer type with the name "this"). This isn't true anymore in C++23 with explicit object parameters. To support those we can simply check the `DW_AT_object_pointer` of the subprogram DIE (works for both declarations and definitions) when searching for the object parameter. We can also remove the check for `eEncodingIsPointerUID`, because in C++ an artificial parameter called `this` is only ever the implicit object parameter (at least for all the major compilers).

This patch adds NVVM intrinsics and NVPTX codegen for: - cp.async.bulk.prefetch.L2.* variants - These intrinsics optionally support cache_hints as indicated by the boolean flag argument. - Lit tests are added for all combinations of these intrinsics in cp-async-bulk.ll. - The generated PTX is verified with a 12.3 ptxas executable. - Added docs for these intrinsics in NVPTXUsage.rst file. PTX Spec reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk-prefetch Co-authored-by: abmajumder <abmajumder@nvidia.com>

…explicit object parameters" (#124100) Reverts llvm/llvm-project#124096 Broke linux CI: ``` Note: This is test shard 7 of 42. [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from DWARFASTParserClangTests [ RUN ] DWARFASTParserClangTests.TestParseSubroutine_ExplicitObjectParameter Expected<T> must be checked before access or destruction. Expected<T> value was in success state. (Note: Expected<T> values in success mode must still be checked prior to being destroyed). Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it): 0 SymbolFileDWARFTests 0x0000560271ee5ba7 1 SymbolFileDWARFTests 0x0000560271ee3a2c 2 SymbolFileDWARFTests 0x0000560271ee63ea 3 libc.so.6 0x00007f3e54e5b050 4 libc.so.6 0x00007f3e54ea9e2c 5 libc.so.6 0x00007f3e54e5afb2 gsignal + 18 6 libc.so.6 0x00007f3e54e45472 abort + 211 7 SymbolFileDWARFTests 0x0000560271e79d51 8 SymbolFileDWARFTests 0x0000560271e724f7 9 SymbolFileDWARFTests 0x0000560271f39e2c 10 SymbolFileDWARFTests 0x0000560271f3b368 11 SymbolFileDWARFTests 0x0000560271f3c053 12 SymbolFileDWARFTests 0x0000560271f4cf67 13 SymbolFileDWARFTests 0x0000560271f4c18a 14 SymbolFileDWARFTests 0x0000560271f2561c 15 libc.so.6 0x00007f3e54e4624a 16 libc.so.6 0x00007f3e54e46305 __libc_start_main + 133 17 SymbolFileDWARFTests 0x0000560271e65161 ```

Nothing in VPlan.h directly uses VPBlockUtils.h. Move it out to the more appropriate VPlanUtils.h to reduce the size of the widely included VPlan.h.

…24101) Reverts llvm/llvm-project#123393 This is causing `TestVectorOfVectorsFromStdModule.py` to fail on the the macOS clang-15 matrix bot.

For #123280

Prevents avoidable memory leaks. Looks like exchange added in aa1333a didn't take "continue" into account. ``` ==llc==2150782==ERROR: LeakSanitizer: detected memory leaks Direct leak of 10 byte(s) in 1 object(s) allocated from: #0 0x5f1b0f9ac14a in strdup llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:593:3 oneapi-src#1 0x5f1b1768428d in FileToRemoveList llvm-project/llvm/lib/Support/Unix/Signals.inc:105:55 ```

…explicit object parameters" (#124100)" This reverts commit a802093. Relands original commit but fixing the unit-test to consume the `llvm::Expected` error object.

Using a "random" name for an "anonymous" pipe seems to be the state of the art on windows (according to stack overflow, new windows versions may have something better, but it involves calling kernel APIs directly and generally a lot of dark magic). The problem with the current method was that is does not produce unique names if one has two copies of the pipe code in the same process, which is what happened with #120457 (because liblldb only exposes the public api, and we've started using the pipe code in lldb-dap as well). This patch works around the problem by adding the address of the counter variable to the pipe name. Replicating the multiple-copies setup in a test would be very difficult, which is why I'm not adding a test for this scenario.

.. by changing the signal stop reason format 🤦 The reason this did not work is because the code in `StopInfo::GetCrashingDereference` was looking for the string "address=" to extract the address of the crash. Macos stop reason strings have the form ``` EXC_BAD_ACCESS (code=1, address=0xdead) ``` while on linux they look like: ``` signal SIGSEGV: address not mapped to object (fault address: 0xdead) ``` Extracting the address from a string sounds like a bad idea, but I suppose there's some value in using a consistent format across platforms, so this patch changes the signal format to use the equals sign as well. All of the diagnose tests pass except one, which appears to fail due to something similar #115453 (disassembler reports unrelocated call targets). I've left the tests disabled on windows, as the stop reason reporting code works very differently there, and I suspect it won't work out of the box. If I'm wrong -- the XFAIL will let us know.

UR PR: oneapi-src/unified-runtime#2605 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

…pecializations (#118167) Some templates in the standard library are illegal to specialize for users (even if the specialization contains user-defined types). The [[clang::no_specializations]] attribute allows marking such base templates so that the compiler will diagnose if users try adding a specialization.

There were two implementations of this - one that implemented nextafter in software, and another that called a clang builtin. No in-tree targets called the builtin, so all targets build the software version. The builtin version has been removed, and the software version has been renamed to be the "default". This commit also optimizes nextafter, to avoid scalarization as much as possible. Note however that the (CLC) relational builtins still scalarize; those will be optimized in a separate commit. Since nextafter is used by some convert_type builtins, the diff to IR codegen is not limited to the builtin itself.

The code that checks a predicate against a swapped predicate in isImpliedCondBalancedTypes is not covered by any existing test, within any Analysis or Transform. Fix this by adding a test to SCEV.

Support reductions in SCFToGPU: `scf.parallel` and `scf.reduce` op combination is now converted to a `gpu.all_reduce` op.

Intel docs have been updated to be similar to AMD and now describe BSF/BSR as not changing the destination register if the input value was zero, which allows us to support CTTZ/CTLZ zero-input cases by setting the destination to support a NumBits result (BSR is a bit messy as it has to be XOR'd to create a CTLZ result). VIA/Zhaoxin x86_64 CPUs have also been confirmed to match this behaviour. This patch adjusts the X86ISD::BSF/BSR nodes to take a "pass through" argument for zero-input cases, by default this is set to UNDEF to match existing behaviour, but it can be set to a suitable value if supported. There are still some limits to this - its only supported for x86_64 capable processors (and I've only enabled it for x86_64 codegen), and Intel CPUs sometimes zero the upper 32-bits of a pass through register when used for BSR32/BSF32 with a zero source value (i.e. the whole 64bits may not get passed through). Fixes #122004

Split target under LLVMIR/Transforms to avoid deps loop.

This code was using a pre-move-semantics trick of using std::swap to avoid expensive vector copies.

This is meant as a short-term workaround for an invalid conversion in this pass that occurs because existing SDWA selections are not correctly taken into account during the conversion. See the draft PR #123221 for an attempt to fix the actual issue. --------- Co-authored-by: Frederik Harwath <fharwath@amd.com>

This header has been removed in C++20 and causes a large amount of deprecation spam when building against libstdc++ 15 in C++17 mode. As far as I understand, we just need to include *some* STL header to get access to the version macros, and as this header also includes <optional> nowadays we can just drop the <cstd646> include entirely.

…124089) Fixes #123800 Extends LDS lowering by allowing it to discover transitive indirect/escpaing references to LDS GVs. For example, given the following input: ```llvm @lds_item_to_indirectly_load = internal addrspace(3) global ptr undef, align 8 %store_type = type { i32, ptr } @place_to_store_indirect_caller = internal addrspace(3) global %store_type undef, align 8 define amdgpu_kernel void @offloading_kernel() { store ptr @indirectly_load_lds, ptr addrspace(3) getelementptr inbounds nuw (i8, ptr addrspace(3) @place_to_store_indirect_caller, i32 0), align 8 call void @call_unknown() ret void } define void @call_unknown() { %1 = alloca ptr, align 8 %2 = call i32 %1() ret void } define void @indirectly_load_lds() { call void @directly_load_lds() ret void } define void @directly_load_lds() { %2 = load ptr, ptr addrspace(3) @lds_item_to_indirectly_load, align 8 ret void } ``` With the above input, prior to this patch, LDS lowering failed to lower the reference to `@lds_item_to_indirectly_load` because: 1. it is indirectly called by a function whose address is taken in the kernel. 2. we did not check if the kernel indirectly makes any calls to unknown functions (we only checked the direct calls). Co-authored-by: Jon Chesterfield <jonathan.chesterfield@amd.com>

…tizer tests (#16886)

Matching PR: oneapi-src/unified-runtime#2655 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

…s (#16792) SYCL_ENABLE_PCI is deprecated here intel/llvm@2d12863. Remove it as requirement for device info query. Remove statement that some device info is supported on L0 only if there is at least one another backend that could return it. (mostly it is L0 +hip/cuda). --------- Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>

Implement [sycl_ext_oneapi_device_image_backend_content](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_image_backend_content.asciidoc).

Add autodetection of the CUDA SDK using CMake's builtin `find_package`. I didn't remove the LIT params or handling in lit.py as to not break anyone who may want to use a custom install. Confirmed working [here](https://github.com/intel/llvm/actions/runs/13166519697/job/36748743079?pr=16896) as `Adapters/cuda_queue_priority.cpp` passed. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

…904) Main reason is to avoid having partial specializations for swizzles in the implementation of the traits as we're going to have big changes in that area. Between this change and having to update the traits this approach seemed better.

Also, make kernel_bundle_impl member variable naming consistent.

Follow-up from intel/llvm#16896 Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

…erties (#16728) As the title says, added deprecation messages for such overloads in the `sycl_ext_oneapi_kernel_properties extension`, suggesting users to use `single_task/parallel_for` overloads provided in the `sycl_ext_oneapi_enqueue_functions` extension instead. (As these overloads are to be removed later as mentioned in intel/llvm#14785) Also fixed an affected test case. --------- Signed-off-by: Hu, Peisen <peisen.hu@intel.com>

Add `XFAIL: accelerator` to `OCL_interop_test.cpp` because of [this](https://github.com/intel/llvm/actions/runs/13187415776/job/36814982904) and remove a rogue print statement meant for debugging purposes during development which is most likely unrelated to the failure. An issue for the failure has also been created intel/llvm#16914

Avoid using partial specialization for swizzles as it depends on the swizzles implementation and we'll be making significant changes there to align with the proposed specification changes.

Simplify handling of multiple address spaces and alignment checks. Additional improvement regarding alignment checks is being done here (to perform compile-time alignment check instead of expensive dynamic check): intel/llvm#16882 Also this PR fixes alignment requirement for local address space: 16-byte alignment is required for both load and store.

…913) That would match with pre-existing `check_type_in_v`.

fixes intel/llvm#16693 --------- Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>

…some Matrix tests (#16855) Adds `UNSUPPORTED: target-nvidia, target-amd` to matrix tests that also require `aspect-ext_intel_matrix` since these tests are not yet supported for those triples.

1. not treating not having two GPU devices as an error, but skip the test instead. 2. If the two GPU devices are not the same model, skip the test(not our targeted setup).

Signed-off-by: chenwei.sun <chenwei.sun@intel.com>

tomflinda · 2025-02-27T08:11:23Z

This pulldown PR is out of date, so close it.

dmpolukhin and others added 30 commits January 23, 2025 10:35

[UR] Make adapters check native properties before dereferencing. (#16…

73f54e5

…730) oneapi-src/unified-runtime#2193 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

[NativeCPU] Fix __clc__sync_fetch_and_{,u}{min,max}_generic_8 definit…

2ea15fb

…… (#16737) …ions. These functions were wrongly defined to take addrspace(1). They take generic pointers which are not annotated with any explicit address space.

[Clang] [Release Notes] Implicit lifetimes are a C++23 feature

17756aa

[VPlan] Move VPBlockUtils to VPlanUtils.h (NFC)

05fbc38

Nothing in VPlan.h directly uses VPBlockUtils.h. Move it out to the more appropriate VPlanUtils.h to reduce the size of the widely included VPlan.h.

Revert "[lldb][test] Remove compiler version check and use regex" (#1…

4bcdb26

…24101) Reverts llvm/llvm-project#123393 This is causing `TestVectorOfVectorsFromStdModule.py` to fail on the the macOS clang-15 matrix bot.

[NFC][YAML] Add IO::error() (#123475)

4f26edd

For #123280

Reland "[lldb][DWARFASTParserClang] Make C++ method parsing aware of …

636bc72

…explicit object parameters" (#124100)" This reverts commit a802093. Relands original commit but fixing the unit-test to consume the `llvm::Expected` error object.

[UR][DeviceASAN] Bugfix for GetDeviceType (#16745)

cc6148d

UR PR: oneapi-src/unified-runtime#2605 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

SCEV: cover a codepath in isImpliedCondBalancedTypes (#123070)

e069518

The code that checks a predicate against a swapped predicate in isImpliedCondBalancedTypes is not covered by any existing test, within any Analysis or Transform. Fix this by adding a test to SCEV.

[SCFToGPU] Convert scf.parallel+scf.reduce to gpu.all_reduce (#122782)

0e944a3

Support reductions in SCFToGPU: `scf.parallel` and `scf.reduce` op combination is now converted to a `gpu.all_reduce` op.

[bazel]Fix bazel build past 2e6cc79 (#124112)

0c66644

Split target under LLVMIR/Transforms to avoid deps loop.

[M68k] Fix llc pass test after 3630d9e

590e5e2

[FileCheck] Use move semantics instead of std::swap. NFC. (#123304)

d3d605b

This code was using a pre-move-semantics trick of using std::swap to avoid expensive vector copies.

MachineRegisterInfo: Use variable for TRI

fb3fa41

[gn] fix mistake in ee99c4d

92b839e

ayylol and others added 22 commits February 6, 2025 09:54

[SYCL][E2E] Mark non spir build targets as unsupported in AddressSani…

d264f5a

…tizer tests (#16886)

[SYCL][NativeCPU] Handle null phEvent. (#16868)

1fe1862

Matching PR: oneapi-src/unified-runtime#2655 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

[SYCL] Implement backend content extension (#16633)

7e2f297

Implement [sycl_ext_oneapi_device_image_backend_content](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_image_backend_content.asciidoc).

[NFC][SYCL] Inline is_valid_elem_type_v (#16908)

a49e92e

[SYCL][NFC] Fix Coverity hits (#16752)

37a9023

Also, make kernel_bundle_impl member variable naming consistent.

[SYCL][E2E] Don't override CUDA CMake vars if set (#16911)

498af15

Follow-up from intel/llvm#16896 Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

[CI] Add Coverity workflow (#16894)

15111b4

[SYCL] Fix dangling reference (#16925)

c2dc01d

[NFC][SYCL] Refactor EnableIfSwizzleVec in sycl/stream.hpp (#16906)

b48f41a

[NFC][SYCL] Drop unused traits (#16912)

14d5771

[NFCI][SYCL] Refactor sycl::detail::make_unsigned (#16905)

af6aa41

Avoid using partial specialization for swizzles as it depends on the swizzles implementation and we'll be making significant changes there to align with the proposed specification changes.

[SYCL][E2E] Print features detected by lit (#16866)

7ed4491

[NFC][SYCL] Substitute is_valid_size_v with check_value_in_v (#16…

15e3e9d

…913) That would match with pre-existing `check_type_in_v`.

[SYCL]E2E] Fix fpga_pipes.cpp verification bug (#16796)

cf6f2c4

fixes intel/llvm#16693 --------- Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>

[SYCL][Matrix][E2E] Mark AMD/NVIDIA build targets as unsupported for …

1e3940b

…some Matrix tests (#16855) Adds `UNSUPPORTED: target-nvidia, target-amd` to matrix tests that also require `aspect-ext_intel_matrix` since these tests are not yet supported for those triples.

[DeviceASAN][test] Update shadow-virtual-mem.cpp (#16917)

f791cd1

1. not treating not having two GPU devices as an error, but skip the test instead. 2. If the two GPU devices are not the same model, skip the test(not our targeted setup).

tomflinda requested a review from a team as a code owner February 10, 2025 08:59

tomflinda requested review from zhimingwang36 and ziranzha February 10, 2025 08:59

tomflinda added 3 commits February 11, 2025 08:02

Merge remote-tracking branch 'origin/sycl' into SYCLomatic_pulldown_0210

c36c5ca

Signed-off-by: chenwei.sun <chenwei.sun@intel.com>

Manual fix the conflicting files

68d9868

Signed-off-by: chenwei.sun <chenwei.sun@intel.com>

Fix lit test run failed in CI test

bf11c0a

Signed-off-by: chenwei.sun <chenwei.sun@intel.com>

tomflinda closed this Feb 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SYCLomatic] Fix auto pulldown conflict from SYCL to SYCLomatic #2656

[SYCLomatic] Fix auto pulldown conflict from SYCL to SYCLomatic #2656

Uh oh!

tomflinda commented Feb 10, 2025 •

edited

Loading

Uh oh!

tomflinda commented Feb 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

96 participants

[SYCLomatic] Fix auto pulldown conflict from SYCL to SYCLomatic #2656

[SYCLomatic] Fix auto pulldown conflict from SYCL to SYCLomatic #2656

Uh oh!

Conversation

tomflinda commented Feb 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tomflinda commented Feb 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

96 participants

tomflinda commented Feb 10, 2025 •

edited

Loading