-
Notifications
You must be signed in to change notification settings - Fork 97
[SYCLomatic] Fix auto pulldown conflict from SYCL to SYCLomatic #2656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
tomflinda
wants to merge
10,000
commits into
oneapi-src:SYCLomatic
from
tomflinda:SYCLomatic_pulldown_0210
Closed
[SYCLomatic] Fix auto pulldown conflict from SYCL to SYCLomatic #2656
tomflinda
wants to merge
10,000
commits into
oneapi-src:SYCLomatic
from
tomflinda:SYCLomatic_pulldown_0210
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Summary: This PR fixes bugreport llvm/llvm-project#122493 The root problem is the same as before lambda function and DeclRefExpr references a variable that does not belong to the same module as the enclosing function body. Therefore iteration over the function body doesn’t visit the VarDecl. Before this change RelatedDeclsMap was created only for canonical decl but in reality it has to be done for the definition of the function that does not always match the canonical decl. Test Plan: check-clang
…730) oneapi-src/unified-runtime#2193 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
…… (#16737) …ions. These functions were wrongly defined to take addrspace(1). They take generic pointers which are not annotated with any explicit address space.
Intrinsics are available for the 'cpSize' variants also. So, this patch migrates the Op to lower to the intrinsics for all cases. * Update the existing tests to check the lowering to intrinsics. * Add newer cp_async_zfill tests to verify the lowering for the 'cpSize' variants. * Tidy-up CHECK lines in cp_async() function in nvvmir.mlir (NFC) PTX spec link: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
We use variable locations such as DBG_VALUE $xmm0 as shorthand to refer to "the low lane of $xmm0", and this is reflected in how DWARF is interpreted too. However InstrRefBasedLDV tries to be smart and interprets such a DBG_VALUE as a 128-bit reference. We then issue a DW_OP_deref_size of 128 bits to the stack, which isn't permitted by DWARF (it's larger than a pointer). Solve this for now by not using DW_OP_deref_size if it would be illegal. Instead we'll use DW_OP_deref, and the consumer will load the variable type from the stack, which should be correct. There's still a risk of imprecision when LLVM decides to use smaller or larger value types than the source-variable type, which manifests as too-little or too-much memory being read from the stack. However we can't solve that without putting more type information in debug-info. fixes #64093
… object parameters (#124096) LLDB deduces the CV-qualifiers and storage class of a C++ method from the object parameter. Currently it assumes that parameter is implicit (and is a pointer type with the name "this"). This isn't true anymore in C++23 with explicit object parameters. To support those we can simply check the `DW_AT_object_pointer` of the subprogram DIE (works for both declarations and definitions) when searching for the object parameter. We can also remove the check for `eEncodingIsPointerUID`, because in C++ an artificial parameter called `this` is only ever the implicit object parameter (at least for all the major compilers).
This patch adds NVVM intrinsics and NVPTX codegen for: - cp.async.bulk.prefetch.L2.* variants - These intrinsics optionally support cache_hints as indicated by the boolean flag argument. - Lit tests are added for all combinations of these intrinsics in cp-async-bulk.ll. - The generated PTX is verified with a 12.3 ptxas executable. - Added docs for these intrinsics in NVPTXUsage.rst file. PTX Spec reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk-prefetch Co-authored-by: abmajumder <abmajumder@nvidia.com>
…explicit object parameters" (#124100) Reverts llvm/llvm-project#124096 Broke linux CI: ``` Note: This is test shard 7 of 42. [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from DWARFASTParserClangTests [ RUN ] DWARFASTParserClangTests.TestParseSubroutine_ExplicitObjectParameter Expected<T> must be checked before access or destruction. Expected<T> value was in success state. (Note: Expected<T> values in success mode must still be checked prior to being destroyed). Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it): 0 SymbolFileDWARFTests 0x0000560271ee5ba7 1 SymbolFileDWARFTests 0x0000560271ee3a2c 2 SymbolFileDWARFTests 0x0000560271ee63ea 3 libc.so.6 0x00007f3e54e5b050 4 libc.so.6 0x00007f3e54ea9e2c 5 libc.so.6 0x00007f3e54e5afb2 gsignal + 18 6 libc.so.6 0x00007f3e54e45472 abort + 211 7 SymbolFileDWARFTests 0x0000560271e79d51 8 SymbolFileDWARFTests 0x0000560271e724f7 9 SymbolFileDWARFTests 0x0000560271f39e2c 10 SymbolFileDWARFTests 0x0000560271f3b368 11 SymbolFileDWARFTests 0x0000560271f3c053 12 SymbolFileDWARFTests 0x0000560271f4cf67 13 SymbolFileDWARFTests 0x0000560271f4c18a 14 SymbolFileDWARFTests 0x0000560271f2561c 15 libc.so.6 0x00007f3e54e4624a 16 libc.so.6 0x00007f3e54e46305 __libc_start_main + 133 17 SymbolFileDWARFTests 0x0000560271e65161 ```
Nothing in VPlan.h directly uses VPBlockUtils.h. Move it out to the more appropriate VPlanUtils.h to reduce the size of the widely included VPlan.h.
…24101) Reverts llvm/llvm-project#123393 This is causing `TestVectorOfVectorsFromStdModule.py` to fail on the the macOS clang-15 matrix bot.
For #123280
Prevents avoidable memory leaks. Looks like exchange added in aa1333a didn't take "continue" into account. ``` ==llc==2150782==ERROR: LeakSanitizer: detected memory leaks Direct leak of 10 byte(s) in 1 object(s) allocated from: #0 0x5f1b0f9ac14a in strdup llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:593:3 oneapi-src#1 0x5f1b1768428d in FileToRemoveList llvm-project/llvm/lib/Support/Unix/Signals.inc:105:55 ```
…explicit object parameters" (#124100)" This reverts commit a802093. Relands original commit but fixing the unit-test to consume the `llvm::Expected` error object.
Using a "random" name for an "anonymous" pipe seems to be the state of the art on windows (according to stack overflow, new windows versions may have something better, but it involves calling kernel APIs directly and generally a lot of dark magic). The problem with the current method was that is does not produce unique names if one has two copies of the pipe code in the same process, which is what happened with #120457 (because liblldb only exposes the public api, and we've started using the pipe code in lldb-dap as well). This patch works around the problem by adding the address of the counter variable to the pipe name. Replicating the multiple-copies setup in a test would be very difficult, which is why I'm not adding a test for this scenario.
.. by changing the signal stop reason format 🤦 The reason this did not work is because the code in `StopInfo::GetCrashingDereference` was looking for the string "address=" to extract the address of the crash. Macos stop reason strings have the form ``` EXC_BAD_ACCESS (code=1, address=0xdead) ``` while on linux they look like: ``` signal SIGSEGV: address not mapped to object (fault address: 0xdead) ``` Extracting the address from a string sounds like a bad idea, but I suppose there's some value in using a consistent format across platforms, so this patch changes the signal format to use the equals sign as well. All of the diagnose tests pass except one, which appears to fail due to something similar #115453 (disassembler reports unrelocated call targets). I've left the tests disabled on windows, as the stop reason reporting code works very differently there, and I suspect it won't work out of the box. If I'm wrong -- the XFAIL will let us know.
UR PR: oneapi-src/unified-runtime#2605 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
…pecializations (#118167) Some templates in the standard library are illegal to specialize for users (even if the specialization contains user-defined types). The [[clang::no_specializations]] attribute allows marking such base templates so that the compiler will diagnose if users try adding a specialization.
There were two implementations of this - one that implemented nextafter in software, and another that called a clang builtin. No in-tree targets called the builtin, so all targets build the software version. The builtin version has been removed, and the software version has been renamed to be the "default". This commit also optimizes nextafter, to avoid scalarization as much as possible. Note however that the (CLC) relational builtins still scalarize; those will be optimized in a separate commit. Since nextafter is used by some convert_type builtins, the diff to IR codegen is not limited to the builtin itself.
The code that checks a predicate against a swapped predicate in isImpliedCondBalancedTypes is not covered by any existing test, within any Analysis or Transform. Fix this by adding a test to SCEV.
Support reductions in SCFToGPU: `scf.parallel` and `scf.reduce` op combination is now converted to a `gpu.all_reduce` op.
Intel docs have been updated to be similar to AMD and now describe BSF/BSR as not changing the destination register if the input value was zero, which allows us to support CTTZ/CTLZ zero-input cases by setting the destination to support a NumBits result (BSR is a bit messy as it has to be XOR'd to create a CTLZ result). VIA/Zhaoxin x86_64 CPUs have also been confirmed to match this behaviour. This patch adjusts the X86ISD::BSF/BSR nodes to take a "pass through" argument for zero-input cases, by default this is set to UNDEF to match existing behaviour, but it can be set to a suitable value if supported. There are still some limits to this - its only supported for x86_64 capable processors (and I've only enabled it for x86_64 codegen), and Intel CPUs sometimes zero the upper 32-bits of a pass through register when used for BSR32/BSF32 with a zero source value (i.e. the whole 64bits may not get passed through). Fixes #122004
Split target under LLVMIR/Transforms to avoid deps loop.
This code was using a pre-move-semantics trick of using std::swap to avoid expensive vector copies.
This is meant as a short-term workaround for an invalid conversion in this pass that occurs because existing SDWA selections are not correctly taken into account during the conversion. See the draft PR #123221 for an attempt to fix the actual issue. --------- Co-authored-by: Frederik Harwath <fharwath@amd.com>
This header has been removed in C++20 and causes a large amount of deprecation spam when building against libstdc++ 15 in C++17 mode. As far as I understand, we just need to include *some* STL header to get access to the version macros, and as this header also includes <optional> nowadays we can just drop the <cstd646> include entirely.
…124089) Fixes #123800 Extends LDS lowering by allowing it to discover transitive indirect/escpaing references to LDS GVs. For example, given the following input: ```llvm @lds_item_to_indirectly_load = internal addrspace(3) global ptr undef, align 8 %store_type = type { i32, ptr } @place_to_store_indirect_caller = internal addrspace(3) global %store_type undef, align 8 define amdgpu_kernel void @offloading_kernel() { store ptr @indirectly_load_lds, ptr addrspace(3) getelementptr inbounds nuw (i8, ptr addrspace(3) @place_to_store_indirect_caller, i32 0), align 8 call void @call_unknown() ret void } define void @call_unknown() { %1 = alloca ptr, align 8 %2 = call i32 %1() ret void } define void @indirectly_load_lds() { call void @directly_load_lds() ret void } define void @directly_load_lds() { %2 = load ptr, ptr addrspace(3) @lds_item_to_indirectly_load, align 8 ret void } ``` With the above input, prior to this patch, LDS lowering failed to lower the reference to `@lds_item_to_indirectly_load` because: 1. it is indirectly called by a function whose address is taken in the kernel. 2. we did not check if the kernel indirectly makes any calls to unknown functions (we only checked the direct calls). Co-authored-by: Jon Chesterfield <jonathan.chesterfield@amd.com>
…tizer tests (#16886)
Matching PR: oneapi-src/unified-runtime#2655 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
…s (#16792) SYCL_ENABLE_PCI is deprecated here intel/llvm@2d12863. Remove it as requirement for device info query. Remove statement that some device info is supported on L0 only if there is at least one another backend that could return it. (mostly it is L0 +hip/cuda). --------- Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>
Implement [sycl_ext_oneapi_device_image_backend_content](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_image_backend_content.asciidoc).
Add autodetection of the CUDA SDK using CMake's builtin `find_package`. I didn't remove the LIT params or handling in lit.py as to not break anyone who may want to use a custom install. Confirmed working [here](https://github.com/intel/llvm/actions/runs/13166519697/job/36748743079?pr=16896) as `Adapters/cuda_queue_priority.cpp` passed. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
…904) Main reason is to avoid having partial specializations for swizzles in the implementation of the traits as we're going to have big changes in that area. Between this change and having to update the traits this approach seemed better.
Also, make kernel_bundle_impl member variable naming consistent.
Follow-up from intel/llvm#16896 Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
…erties (#16728) As the title says, added deprecation messages for such overloads in the `sycl_ext_oneapi_kernel_properties extension`, suggesting users to use `single_task/parallel_for` overloads provided in the `sycl_ext_oneapi_enqueue_functions` extension instead. (As these overloads are to be removed later as mentioned in intel/llvm#14785) Also fixed an affected test case. --------- Signed-off-by: Hu, Peisen <peisen.hu@intel.com>
Add `XFAIL: accelerator` to `OCL_interop_test.cpp` because of [this](https://github.com/intel/llvm/actions/runs/13187415776/job/36814982904) and remove a rogue print statement meant for debugging purposes during development which is most likely unrelated to the failure. An issue for the failure has also been created intel/llvm#16914
Avoid using partial specialization for swizzles as it depends on the swizzles implementation and we'll be making significant changes there to align with the proposed specification changes.
Simplify handling of multiple address spaces and alignment checks. Additional improvement regarding alignment checks is being done here (to perform compile-time alignment check instead of expensive dynamic check): intel/llvm#16882 Also this PR fixes alignment requirement for local address space: 16-byte alignment is required for both load and store.
…913) That would match with pre-existing `check_type_in_v`.
fixes intel/llvm#16693 --------- Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>
…some Matrix tests (#16855) Adds `UNSUPPORTED: target-nvidia, target-amd` to matrix tests that also require `aspect-ext_intel_matrix` since these tests are not yet supported for those triples.
1. not treating not having two GPU devices as an error, but skip the test instead. 2. If the two GPU devices are not the same model, skip the test(not our targeted setup).
Signed-off-by: chenwei.sun <chenwei.sun@intel.com>
Signed-off-by: chenwei.sun <chenwei.sun@intel.com>
Signed-off-by: chenwei.sun <chenwei.sun@intel.com>
This pulldown PR is out of date, so close it. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The pulldown includes four PRs below to fix IP Leak issues
[AVX10.2] Update convert chapter intrinsic and mnemonics names (#123656)
[X86][AVX10.2-SATCVT][NFC] Remove NE from intrinsic and instruction name (#123275)
[X86][AVX10.2-MINMAX][NFC] Remove NE[P] from intrinsic and instruction (#123272)
[X86][AMX-AVX512][NFC] Remove P from intrinsic and instruction name (#123270)
Signed-off-by: chenwei.sun chenwei.sun@intel.com