Commit 1fb415f

[AMDGPU][FIX] Proper load-store-vectorizer result with opaque pointers
The original code relied on the fact that we needed a bitcast instruction (for non-constant base objects). With opaque pointers there might not be a bitcast. Always check if reordering is required instead.

Fixes: llvm/llvm-project#54896

Differential Revision: https://reviews.llvm.org/D123694
1 parent 9a8bb4b commit 1fb415f

2 files changed: +34 -4 lines changed

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp

Lines changed: 10 additions & 4 deletions
@@ -1297,10 +1297,16 @@ bool Vectorizer::vectorizeLoadChain(
     CV->replaceAllUsesWith(V);
   }
 
-  // Bitcast might not be an Instruction, if the value being loaded is a
-  // constant. In that case, no need to reorder anything.
-  if (Instruction *BitcastInst = dyn_cast<Instruction>(Bitcast))
-    reorder(BitcastInst);
+  // Since we might have opaque pointers we might end up using the pointer
+  // operand of the first load (wrt. memory loaded) for the vector load. Since
+  // this first load might not be the first in the block we potentially need to
+  // reorder the pointer operand (and its operands). If we have a bitcast though
+  // it might be before the load and should be the reorder start instruction.
+  // "Might" because for opaque pointers the "bitcast" is just the first load's
+  // pointer operand, as opposed to something we inserted at the right position
+  // ourselves.
+  Instruction *BCInst = dyn_cast<Instruction>(Bitcast);
+  reorder((BCInst && BCInst != L0->getPointerOperand()) ? BCInst : LI);
 
   eraseInstructions(Chain);
 
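
The ordering problem this hunk addresses can be illustrated with a small standalone model. The sketch below is not LLVM code: `Inst`, `Block`, `hoistOperands`, `reorderStart`, and `blockBeforeReorder` are hypothetical names, and the hoisting routine is only loosely modeled on what `reorder` is described as doing in the comment above (moving a pointer operand, and its operands, ahead of the instruction that uses them). It assumes the vector load is placed where the first scalar load in the block used to be, while its pointer operand is defined later.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an IR instruction: a name plus the names of its operands.
struct Inst {
  std::string name;
  std::vector<std::string> operands;
};
using Block = std::vector<Inst>;

// Position of `name` in the block, or -1 if it is not defined there
// (for example a global such as @S).
int indexOf(const Block &b, const std::string &name) {
  for (int i = 0; i < static_cast<int>(b.size()); ++i)
    if (b[i].name == name)
      return i;
  return -1;
}

// True if every operand defined in the block appears before its user.
bool defsPrecedeUses(const Block &b) {
  for (int i = 0; i < static_cast<int>(b.size()); ++i)
    for (const auto &op : b[i].operands)
      if (indexOf(b, op) > i)
        return false;
  return true;
}

// Move operands of `user` that are defined later in the block to just
// before `user`, then repeat for the operands of each moved instruction.
void hoistOperands(Block &b, const std::string &user) {
  std::vector<std::string> ops = b[indexOf(b, user)].operands; // copy: b changes
  for (const auto &op : ops) {
    int d = indexOf(b, op);
    int u = indexOf(b, user);
    if (d > u) {
      Inst moved = b[d];
      b.erase(b.begin() + d);
      b.insert(b.begin() + u, moved);
      hoistOperands(b, op);
    }
  }
}

// Mirrors the selection in the patch: start at the cast only when it is a
// separate instruction; otherwise start at the new vector load.
std::string reorderStart(bool castIsInstruction, const std::string &cast,
                         const std::string &firstLoadPtr,
                         const std::string &vecLoad) {
  return (castIsInstruction && cast != firstLoadPtr) ? cast : vecLoad;
}

// Assumed state of the test function after vectorizing, before reordering.
Block blockBeforeReorder() {
  return {
      {"idx1", {}},          // address of the second field
      {"vec", {"idx0"}},     // vector load, placed at the first load in the block
      {"l0", {"vec"}},       // extractelement 0
      {"l1", {"vec"}},       // extractelement 1
      {"idx0", {}},          // address of the first field, defined too late
      {"add", {"l0", "l1"}},
  };
}
```

With opaque pointers the "bitcast" is `idx0` itself, so `reorderStart(true, "idx0", "idx0", "vec")` selects the vector load, and `hoistOperands(b, "vec")` moves `idx0` in front of it. Starting at `idx0` instead would leave `idx0` where it is, after its user, which is the invalid ordering the commit message describes.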

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -basic-aa -load-store-vectorizer -S -o - %s | FileCheck %s
+
+; Vectorize and emit valid code (Issue #54896).
+
+%S = type { i64, i64 }
+@S = external global %S
+
+define i64 @order() {
+; CHECK-LABEL: @order(
+; CHECK-NEXT:    [[IDX0:%.*]] = getelementptr inbounds [[S:%.*]], ptr @S, i32 0, i32 0
+; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i64>, ptr [[IDX0]], align 8
+; CHECK-NEXT:    [[L01:%.*]] = extractelement <2 x i64> [[TMP1]], i32 0
+; CHECK-NEXT:    [[L12:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1
+; CHECK-NEXT:    [[ADD:%.*]] = add i64 [[L01]], [[L12]]
+; CHECK-NEXT:    ret i64 [[ADD]]
+;
+  %idx1 = getelementptr inbounds %S, ptr @S, i32 0, i32 1
+  %l1 = load i64, i64* %idx1, align 8
+  %idx0 = getelementptr inbounds %S, ptr @S, i32 0, i32 0
+  %l0 = load i64, i64* %idx0, align 8
+  %add = add i64 %l0, %l1
+  ret i64 %add
+}
