-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[LLVM] Fix an ASAN error in lookupLLVMIntrinsicByName
#147444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[LLVM] Fix an ASAN error in lookupLLVMIntrinsicByName
#147444
Conversation
Fix unnecessary conversion of C-String to StringRef in the `Cmp` lambda inside `lookupLLVMIntrinsicByName`. This both fixes an ASAN error in the code that happens when the `Name` StringRef passed in is not a Null terminated StringRef, and additionally should potentially speed up the code as well by eliminating the unnecessary computation of string length every time a C String is converted to StringRef in this code (It seems practically this computation is eliminated in optimized builds, but this will avoid it at O0 builds as well). Added a unit test that demomstrates this issue by building LLVM with these options: ``` -DCMAKE_BUILD_TYPE=Debug -DLLVM_USE_SANITIZER=Address` -DLLVM_OPTIMIZE_SANITIZED_BUILDS=OFF ``` The error reported is as follows: ``` ==462665==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5030000391a2 at pc 0x56525cc30bbf bp 0x7fff9e4ccc60 sp 0x7fff9e4cc428 READ of size 19 at 0x5030000391a2 thread T0 #0 0x56525cc30bbe in strlen (upstream-llvm-second/llvm-project/build/unittests/IR/IRTests+0x713bbe) (BuildId: 0651acf1e582a4d2) llvm#1 0x7f8ff22ad334 in std::char_traits<char>::length(char const*) /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/char_traits.h:399:9 llvm#2 0x7f8ff22a34a0 in llvm::StringRef::StringRef(char const*) /home/rjoshi/upstream-llvm-second/llvm-project/llvm/include/llvm/ADT/StringRef.h:96:33 llvm#3 0x7f8ff28ca184 in _ZZL25lookupLLVMIntrinsicByNameN4llvm8ArrayRefIjEENS_9StringRefES2_ENK3$_0clIjPKcEEDaT_T0_ upstream-llvm-second/llvm-project/llvm/lib/IR/Intrinsics.cpp:673:18 ```
Note, this is a refined version of #145629 |
@llvm/pr-subscribers-llvm-adt @llvm/pr-subscribers-llvm-ir Author: Rahul Joshi (jurahul) ChangesFix unnecessary conversion of C-String to StringRef in the Added a unit test that demonstrates this issue by building LLVM with these options:
The error reported is as follows:
Full diff: https://github.com/llvm/llvm-project/pull/147444.diff 3 Files Affected:
diff --git a/llvm/include/llvm/ADT/StringTable.h b/llvm/include/llvm/ADT/StringTable.h
index b3c4a414ed6b4..c089a070d4b57 100644
--- a/llvm/include/llvm/ADT/StringTable.h
+++ b/llvm/include/llvm/ADT/StringTable.h
@@ -85,14 +85,18 @@ class StringTable {
"Last byte must be a null byte.");
}
- // Get a string from the table starting with the provided offset. The returned
- // `StringRef` is in fact null terminated, and so can be converted safely to a
- // C-string if necessary for a system API.
- constexpr StringRef operator[](Offset O) const {
+ // Returns the raw C string from the table starting with the provided offset.
+ // The returned string is null terminated.
+ constexpr const char *getCString(Offset O) const {
assert(O.value() < Table.size() && "Out of bounds offset!");
return Table.data() + O.value();
}
+ // Get a string from the table starting with the provided offset. The returned
+ // `StringRef` is in fact null terminated, and so can be converted safely to a
+ // C-string if necessary for a system API.
+ constexpr StringRef operator[](Offset O) const { return getCString(O); }
+
/// Returns the byte size of the table.
constexpr size_t size() const { return Table.size(); }
diff --git a/llvm/lib/IR/Intrinsics.cpp b/llvm/lib/IR/Intrinsics.cpp
index e631419d5e1c2..68ab15eab1aea 100644
--- a/llvm/lib/IR/Intrinsics.cpp
+++ b/llvm/lib/IR/Intrinsics.cpp
@@ -660,20 +660,20 @@ static int lookupLLVMIntrinsicByName(ArrayRef<unsigned> NameOffsetTable,
// `equal_range` requires the comparison to work with either side being an
// offset or the value. Detect which kind each side is to set up the
// compared strings.
- StringRef LHSStr;
- if constexpr (std::is_integral_v<decltype(LHS)>) {
- LHSStr = IntrinsicNameTable[LHS];
- } else {
+ const char *LHSStr;
+ if constexpr (std::is_integral_v<decltype(LHS)>)
+ LHSStr = IntrinsicNameTable.getCString(LHS);
+ else
LHSStr = LHS;
- }
- StringRef RHSStr;
- if constexpr (std::is_integral_v<decltype(RHS)>) {
- RHSStr = IntrinsicNameTable[RHS];
- } else {
+
+ const char *RHSStr;
+ if constexpr (std::is_integral_v<decltype(RHS)>)
+ RHSStr = IntrinsicNameTable.getCString(RHS);
+ else
RHSStr = RHS;
- }
- return strncmp(LHSStr.data() + CmpStart, RHSStr.data() + CmpStart,
- CmpEnd - CmpStart) < 0;
+
+ return strncmp(LHSStr + CmpStart, RHSStr + CmpStart, CmpEnd - CmpStart) <
+ 0;
};
LastLow = Low;
std::tie(Low, High) = std::equal_range(Low, High, Name.data(), Cmp);
diff --git a/llvm/unittests/IR/IntrinsicsTest.cpp b/llvm/unittests/IR/IntrinsicsTest.cpp
index ad9af2075e371..fff13d51897f8 100644
--- a/llvm/unittests/IR/IntrinsicsTest.cpp
+++ b/llvm/unittests/IR/IntrinsicsTest.cpp
@@ -80,6 +80,28 @@ TEST(IntrinsicNameLookup, Basic) {
EXPECT_EQ(memcpy_inline, lookupIntrinsicID("llvm.memcpy.inline.p0.p0.i1024"));
}
+TEST(IntrinsicNameLookup, NonNullterminatedStringRef) {
+ using namespace Intrinsic;
+ // This reproduces an issue where lookupIntrinsicID() can access memory beyond
+ // the bounds of the passed in StringRef. For ASAN to catch this as an error,
+ // create a StringRef using heap allocated memory and make it not null
+ // terminated.
+
+ // ASAN will report a "AddressSanitizer: heap-buffer-overflow" error in
+ // `lookupLLVMIntrinsicByName` when LLVM is built with these options:
+ // -DCMAKE_BUILD_TYPE=Debug
+ // -DLLVM_USE_SANITIZER=Address
+ // -DLLVM_OPTIMIZE_SANITIZED_BUILDS=OFF
+
+ // Make an intrinsic name "llvm.memcpy.inline" on the heap.
+ std::string Name = "llvm.memcpy.inline";
+ assert(Name.size() == 18);
+ std::unique_ptr<char[]> Data = std::make_unique<char[]>(Name.size());
+ std::strncpy(Data.get(), Name.data(), Name.size());
+ StringRef S(Data.get(), Name.size());
+ EXPECT_EQ(memcpy_inline, lookupIntrinsicID(S));
+}
+
// Tests to verify getIntrinsicForClangBuiltin.
TEST(IntrinsicNameLookup, ClangBuiltinLookup) {
using namespace Intrinsic;
|
@jfurtek fyi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, LG... not a huge fan of the getCSTring
API, but no strong opinion here.
But good to wait for the other reviewer as well.
I can also do:
without adding the |
Fix unnecessary conversion of C-String to StringRef in the
Cmp
lambda insidelookupLLVMIntrinsicByName
. This both fixes an ASAN error in the code that happens when theName
StringRef passed in is not a Null terminated StringRef, and additionally can potentially speed up the code as well by eliminating the unnecessary computation of string length every time a C String is converted to StringRef in this code (It seems practically this computation is eliminated in optimized builds, but this will avoid it in O0 builds as well).Added a unit test that demonstrates this issue by building LLVM with these options:
The error reported is as follows: