Skip to content

Silence MLIRTargetLLVMTests failures pre-ROCm6.4 #147610

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

slinder1
Copy link
Contributor

@slinder1 slinder1 commented Jul 8, 2025

COV6 was made default in 6.4, so check that the ROCm we find isn't older before assuming the tests should pass.

Not sure this is worth the code, and it may make more sense to fail open if e.g. the .info/version file isn't present or doesn't have the version in the expected format. I had already written the code before noting this, so figured I'd just see what others think.

COV6 was made default in 6.4, so check that the ROCm we find isn't older
before assuming the tests should pass.
@llvmbot
Copy link
Member

llvmbot commented Jul 8, 2025

@llvm/pr-subscribers-mlir-llvm

Author: Scott Linder (slinder1)

Changes

COV6 was made default in 6.4, so check that the ROCm we find isn't older before assuming the tests should pass.

Not sure this is worth the code, and it may make more sense to fail open if e.g. the .info/version file isn't present or doesn't have the version in the expected format. I had already written the code before noting this, so figured I'd just see what others think.


Full diff: https://github.com/llvm/llvm-project/pull/147610.diff

1 Files Affected:

  • (modified) mlir/unittests/Target/LLVM/SerializeROCDLTarget.cpp (+24-2)
diff --git a/mlir/unittests/Target/LLVM/SerializeROCDLTarget.cpp b/mlir/unittests/Target/LLVM/SerializeROCDLTarget.cpp
index a015e1d7dde62..ef314623956d0 100644
--- a/mlir/unittests/Target/LLVM/SerializeROCDLTarget.cpp
+++ b/mlir/unittests/Target/LLVM/SerializeROCDLTarget.cpp
@@ -20,6 +20,7 @@
 
 #include "llvm/IRReader/IRReader.h"
 #include "llvm/Support/FileSystem.h"
+#include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/MemoryBufferRef.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/TargetSelect.h"
@@ -52,6 +53,27 @@ class MLIRTargetLLVMROCDL : public ::testing::Test {
     StringRef rocmPath = ROCDL::getROCMPath();
     if (rocmPath.empty())
       return false;
+    llvm::SmallString<128> rocmVerPath(rocmPath);
+    llvm::sys::path::append(rocmVerPath, ".info", "version");
+    auto bufOrErr = llvm::MemoryBuffer::getFile(rocmVerPath, /*IsText=*/true);
+    if (!bufOrErr)
+      return false;
+    SmallVector<StringRef, 2> majorMinorRest;
+    bufOrErr.get()->getBuffer().split(majorMinorRest, '.', /*MaxSplit=*/2);
+    if (majorMinorRest.size() != 3)
+      return false;
+    auto asInt = [](StringRef s) -> int {
+      unsigned i;
+      if (s.getAsInteger(/*Radix=*/10, i))
+        return -1;
+      return i;
+    };
+    int major = asInt(majorMinorRest[0]);
+    int minor = asInt(majorMinorRest[1]);
+    if (major < 6)
+      return false;
+    if (major == 6 && minor < 4)
+      return false;
     llvm::SmallString<128> lldPath(rocmPath);
     llvm::sys::path::append(lldPath, "llvm", "bin", "ld.lld");
     return llvm::sys::fs::can_execute(lldPath);
@@ -185,7 +207,7 @@ TEST_F(MLIRTargetLLVMROCDL, SKIP_WITHOUT_AMDGPU(SerializeROCDLToPTX)) {
 // Test ROCDL serialization to Binary.
 TEST_F(MLIRTargetLLVMROCDL, SKIP_WITHOUT_AMDGPU(SerializeROCDLToBinary)) {
   if (!hasROCMTools())
-    GTEST_SKIP() << "ROCm installation not found, skipping test.";
+    GTEST_SKIP() << "Compatible ROCm installation not found, skipping test.";
 
   MLIRContext context(registry);
 
@@ -212,7 +234,7 @@ TEST_F(MLIRTargetLLVMROCDL, SKIP_WITHOUT_AMDGPU(SerializeROCDLToBinary)) {
 // Test ROCDL metadata.
 TEST_F(MLIRTargetLLVMROCDL, SKIP_WITHOUT_AMDGPU(GetELFMetadata)) {
   if (!hasROCMTools())
-    GTEST_SKIP() << "ROCm installation not found, skipping test.";
+    GTEST_SKIP() << "Compatible ROCm installation not found, skipping test.";
 
   MLIRContext context(registry);
 

@llvmbot
Copy link
Member

llvmbot commented Jul 8, 2025

@llvm/pr-subscribers-mlir

Author: Scott Linder (slinder1)

Changes

COV6 was made default in 6.4, so check that the ROCm we find isn't older before assuming the tests should pass.

Not sure this is worth the code, and it may make more sense to fail open if e.g. the .info/version file isn't present or doesn't have the version in the expected format. I had already written the code before noting this, so figured I'd just see what others think.


Full diff: https://github.com/llvm/llvm-project/pull/147610.diff

1 Files Affected:

  • (modified) mlir/unittests/Target/LLVM/SerializeROCDLTarget.cpp (+24-2)
diff --git a/mlir/unittests/Target/LLVM/SerializeROCDLTarget.cpp b/mlir/unittests/Target/LLVM/SerializeROCDLTarget.cpp
index a015e1d7dde62..ef314623956d0 100644
--- a/mlir/unittests/Target/LLVM/SerializeROCDLTarget.cpp
+++ b/mlir/unittests/Target/LLVM/SerializeROCDLTarget.cpp
@@ -20,6 +20,7 @@
 
 #include "llvm/IRReader/IRReader.h"
 #include "llvm/Support/FileSystem.h"
+#include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/MemoryBufferRef.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/TargetSelect.h"
@@ -52,6 +53,27 @@ class MLIRTargetLLVMROCDL : public ::testing::Test {
     StringRef rocmPath = ROCDL::getROCMPath();
     if (rocmPath.empty())
       return false;
+    llvm::SmallString<128> rocmVerPath(rocmPath);
+    llvm::sys::path::append(rocmVerPath, ".info", "version");
+    auto bufOrErr = llvm::MemoryBuffer::getFile(rocmVerPath, /*IsText=*/true);
+    if (!bufOrErr)
+      return false;
+    SmallVector<StringRef, 2> majorMinorRest;
+    bufOrErr.get()->getBuffer().split(majorMinorRest, '.', /*MaxSplit=*/2);
+    if (majorMinorRest.size() != 3)
+      return false;
+    auto asInt = [](StringRef s) -> int {
+      unsigned i;
+      if (s.getAsInteger(/*Radix=*/10, i))
+        return -1;
+      return i;
+    };
+    int major = asInt(majorMinorRest[0]);
+    int minor = asInt(majorMinorRest[1]);
+    if (major < 6)
+      return false;
+    if (major == 6 && minor < 4)
+      return false;
     llvm::SmallString<128> lldPath(rocmPath);
     llvm::sys::path::append(lldPath, "llvm", "bin", "ld.lld");
     return llvm::sys::fs::can_execute(lldPath);
@@ -185,7 +207,7 @@ TEST_F(MLIRTargetLLVMROCDL, SKIP_WITHOUT_AMDGPU(SerializeROCDLToPTX)) {
 // Test ROCDL serialization to Binary.
 TEST_F(MLIRTargetLLVMROCDL, SKIP_WITHOUT_AMDGPU(SerializeROCDLToBinary)) {
   if (!hasROCMTools())
-    GTEST_SKIP() << "ROCm installation not found, skipping test.";
+    GTEST_SKIP() << "Compatible ROCm installation not found, skipping test.";
 
   MLIRContext context(registry);
 
@@ -212,7 +234,7 @@ TEST_F(MLIRTargetLLVMROCDL, SKIP_WITHOUT_AMDGPU(SerializeROCDLToBinary)) {
 // Test ROCDL metadata.
 TEST_F(MLIRTargetLLVMROCDL, SKIP_WITHOUT_AMDGPU(GetELFMetadata)) {
   if (!hasROCMTools())
-    GTEST_SKIP() << "ROCm installation not found, skipping test.";
+    GTEST_SKIP() << "Compatible ROCm installation not found, skipping test.";
 
   MLIRContext context(registry);
 

int minor = asInt(majorMinorRest[1]);
if (major < 6)
return false;
if (major == 6 && minor < 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please add inline comments about what happens here and maybe move this to a separate function. Otherwise, it will be very unclear what's going on here, I fear.

Copy link
Contributor

@fabianmcg fabianmcg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking on the merits of the patch. Just preventing merging until we agree on the path forward.

Can you expand a bit more on the issue? For example, what happens if you compile with ROCm 6.4 or earlier?

What about adding a CMake flag like MLIR_DEFAULT_ROCM_ABI=600 controlled by the detected ROCm version and substituting it here rengolin@84cb08e#diff-3da7b8747032bb581f119fd49037d06706443c3c5db56d4d28cc74053a9b754dL189 ?

@slinder1
Copy link
Contributor Author

slinder1 commented Jul 9, 2025

Not blocking on the merits of the patch. Just preventing merging until we agree on the path forward.

Makes sense, and sorry for the half-baked patch, I just realized the path I was on probably wasn't ideal so wanted to get input before continuing

Can you expand a bit more on the issue? For example, what happens if you compile with ROCm 6.4 or earlier?

lld just balks at the ABI version it doesn't understand:

$ cat /opt/rocm/.info/version
6.0.2-115
$ build/tools/mlir/unittests/Target/LLVM/MLIRTargetLLVMTests
[ RUN      ] MLIRTargetLLVMROCDL.SerializeROCDLToBinary
ld.lld: error: unknown abi version:
loc("-":2:7): error: lld invocation failed
/home/slinder1/llvm-project/main/mlir/unittests/Target/LLVM/SerializeROCDLTarget.cpp:207: Failure
Value of: object != std::nullopt
  Actual: false
Expected: true

[  FAILED  ] MLIRTargetLLVMROCDL.SerializeROCDLToBinary (25 ms)

Note: my patch mistakenly checks for the version where the default switched, not the version when support was added to lld.

What about adding a CMake flag like MLIR_DEFAULT_ROCM_ABI=600 controlled by the detected ROCm version and substituting it here rengolin@84cb08e#diff-3da7b8747032bb581f119fd49037d06706443c3c5db56d4d28cc74053a9b754dL189 ?

Something like this seems reasonable, but technically there are ways to point ROCDL at another ROCM at runtime, right? I suppose in those cases the user just has to set the cmake cache var themselves?

To handle this generally might add more complexity than the use-case warrants (or put another way, I'm not sure there is any use-case to begin with, I just wanted my tests clean on an older box). The failing test could also be desirable considering the dependency on the installed ROCM?

I'm fine with just dropping this after thinking through it more

@fabianmcg
Copy link
Contributor

fabianmcg commented Jul 9, 2025

Something like this seems reasonable, but technically there are ways to point ROCDL at another ROCM at runtime, right?

Yes, but:

Option<"abiVersion", "abi", "std::string",
           /*default=*/"\"600\"",
           "ABI version.">,

is also an option as well, so the user could set abi=500 or abi=600 as a pass option if they need a different value at runtime. The idea of a MLIR_DEFAULT_ROCM_ABI option is to have something that works for CIs and the common case.

I think the patch through CMake should be around a 50 line change or so. And in that case we would end up increasing the support matrix which is always nice.

The failing test could also be desirable considering the dependency on the installed ROCM?

The proper fix in this case is just to pass the correct value of the ABI, not deactivating the test. Which again, shouldn't be that hard, as the ABI value is a configurable value either via a pass otpion or by setting a different value in the attribute.

Just in case, nice catch! This is something that definitely needs a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants