Fix condition to ignore failed LLVM GPU tests #3819

Merged (1 commit) on Jul 2, 2025

Flamefire
Contributor

@Flamefire Flamefire commented Jul 2, 2025

The tests should always be ignored if `ptxas` or the ROCm dependency is not present, regardless of whether the corresponding target is among the build targets.
At worst this excludes too many tests, but the condition for running those tests appears to be independent of whether this specific target is built.

For 18.x, `LIBOMPTARGET_BUILD_CUDA_PLUGIN` defaults to true and the tests are enabled based on the presence of GPUs (determined by running `nvptx-arch`).

For 19+ it seems to be similar, but based on `LIBOMPTARGET_PLUGINS_TO_BUILD`, which wasn't set for 18 until #3755 and hence defaulted to "all", enabling CUDA and hence the tests.

@Thyre @Crivella Instead of removing `self.nvptx_target_cond`, should we rather check for `not self.nvptx_target_cond or ...`?
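
To make the difference between the conditions discussed above concrete, here is a minimal sketch for the NVPTX case. It is not the easyblock code: the function names and the boolean parameters `target_built` and `ptxas_found` are hypothetical, the "before" condition is an assumption inferred from the description, and the "alternative" fills in the elided part of the question with one possible reading.

```python
from itertools import product


def ignore_before(target_built: bool, ptxas_found: bool) -> bool:
    """ASSUMED earlier condition: ignore only if the target is built and ptxas is missing."""
    return target_built and not ptxas_found


def ignore_merged(target_built: bool, ptxas_found: bool) -> bool:
    """Condition as described in this PR: only the presence of ptxas matters."""
    return not ptxas_found


def ignore_alternative(target_built: bool, ptxas_found: bool) -> bool:
    """One possible reading of the alternative: also ignore when the target is not built."""
    return (not target_built) or (not ptxas_found)


def truth_table():
    """Return one row per input combination: (target_built, ptxas_found, before, merged, alternative)."""
    rows = []
    for target_built, ptxas_found in product([True, False], repeat=2):
        rows.append((
            target_built,
            ptxas_found,
            ignore_before(target_built, ptxas_found),
            ignore_merged(target_built, ptxas_found),
            ignore_alternative(target_built, ptxas_found),
        ))
    return rows


for row in truth_table():
    print(row)
```

Under these assumptions, the case that matters is `target_built=False, ptxas_found=False`: the assumed earlier condition would not ignore the GPU tests, while both the merged condition and the alternative would. The alternative additionally ignores the tests when the target is not built but `ptxas` is available.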

@Flamefire Flamefire marked this pull request as ready for review July 2, 2025 07:55
@Thyre
Contributor

Thyre commented Jul 2, 2025

I'm fine with just always adding the ignore patterns if we don't find `ptxas` / `rocr-runtime`.
We could extend the former to always ignore the patterns if CUDA is not an explicit dependency, but a system CUDA might work fine, whereas chances are higher that a system ROCm does not.

We need to change the rocr-runtime check at some point once we start integrating ROCm into EasyBuild, but we're not there yet.

@Flamefire
Contributor Author

Test report by @Flamefire

Overview of tested easyconfigs (in order)

  • SUCCESS LLVM-19.1.7-GCCcore-13.3.0.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
c49 - Linux AlmaLinux 9.4, x86_64, AMD EPYC 9334 32-Core Processor (zen4), 4 x NVIDIA NVIDIA H100, 560.35.03, Python 3.9.18
See https://gist.github.com/Flamefire/3d428f3ca70885056d05cff719b5f4af for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire

Overview of tested easyconfigs (in order)

Build succeeded for 0 out of 1 (1 easyconfigs in total)
n1450 - Linux RHEL 8.9 (Ootpa), x86_64, Intel(R) Xeon(R) Platinum 8470 (sapphirerapids), Python 3.9.18
See https://gist.github.com/Flamefire/bee8016c7b8b96cd348ab4c1707fb70f for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire

Overview of tested easyconfigs (in order)

Build succeeded for 0 out of 1 (1 easyconfigs in total)
n1682 - Linux RHEL 8.9 (Ootpa), x86_64, Intel(R) Xeon(R) Platinum 8470 (sapphirerapids), Python 3.9.18
See https://gist.github.com/Flamefire/d039481fa34205479e60cc793416019a for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire

Overview of tested easyconfigs (in order)

  • SUCCESS LLVM-18.1.8-GCCcore-13.3.0.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
c38 - Linux AlmaLinux 9.4, x86_64, AMD EPYC 9334 32-Core Processor (zen4), 4 x NVIDIA NVIDIA H100, 560.35.03, Python 3.9.18
See https://gist.github.com/Flamefire/3b3736225ef7495b2cc49a373aa2374e for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire

Overview of tested easyconfigs (in order)

  • SUCCESS LLVM-20.1.5-GCCcore-13.3.0.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
c51 - Linux AlmaLinux 9.4, x86_64, AMD EPYC 9334 32-Core Processor (zen4), 4 x NVIDIA NVIDIA H100, 560.35.03, Python 3.9.18
See https://gist.github.com/Flamefire/0819457f1f3a3f95a02b8198530ff6ea for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire

Overview of tested easyconfigs (in order)

  • SUCCESS LLVM-19.1.7-GCCcore-13.3.0.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
i7087 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.9.18
See https://gist.github.com/Flamefire/414ab1f63b31a615c1668c250d3315d5 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire

Overview of tested easyconfigs (in order)

  • SUCCESS LLVM-18.1.8-GCCcore-13.3.0.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
i7090 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.9.18
See https://gist.github.com/Flamefire/44ba95c71495853c26ee2008aa355b0e for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire

Overview of tested easyconfigs (in order)

  • SUCCESS LLVM-20.1.5-GCCcore-13.3.0.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
i7088 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.9.18
See https://gist.github.com/Flamefire/f4141939e9e5ac3ba83fb816e8638b42 for a full test report.

@boegel boegel added the bug fix label Jul 2, 2025
@boegel boegel added this to the 5.1.1 milestone Jul 2, 2025
@boegel boegel merged commit 0df8bf4 into easybuilders:develop Jul 2, 2025
17 checks passed
@Flamefire Flamefire deleted the llvm-test-exclusions branch July 2, 2025 13:12