[LoadStoreVectorizer] Batch alias analysis results to improve compile time #147555
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms

Author: Drew Kersnar (dakersnar)

Changes

This should be generally good for a lot of LSV cases, but the attached test demonstrates a specific compile-time issue that appears when the CaptureTracking default max-uses limit is raised.

Without using batching alias analysis, this test takes 6 seconds to compile in a release build; with it, less than a second. This is because the mechanism that proves NoAlias in this case is very expensive (CaptureTracking.cpp), and caching the result leads to 2 calls to that mechanism instead of ~300,000 (run with -stats to see the difference).

This test only demonstrates the compile-time issue if capture-tracking-max-uses-to-explore is set to at least 1024, because with the default value of 100 the CaptureTracking analysis is not run, NoAlias is not proven, and the vectorizer gives up early.

Let me know if there is a better way to represent the compile-time unit test. I did not bother running the automatic CHECK generator on this file, as the output of the pass is less important than how long it takes to run.

Patch is 115.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/147555.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp b/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
index 89f63c3b66aad..7b5137b0185ab 100644
--- a/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
@@ -322,7 +322,8 @@ class Vectorizer {
template <bool IsLoadChain>
bool isSafeToMove(
Instruction *ChainElem, Instruction *ChainBegin,
- const DenseMap<Instruction *, APInt /*OffsetFromLeader*/> &ChainOffsets);
+ const DenseMap<Instruction *, APInt /*OffsetFromLeader*/> &ChainOffsets,
+ BatchAAResults &BatchAA);
/// Merges the equivalence classes if they have underlying objects that differ
/// by one level of indirection (i.e., one is a getelementptr and the other is
@@ -543,6 +544,10 @@ std::vector<Chain> Vectorizer::splitChainByMayAliasInstrs(Chain &C) {
for (const auto &E : C)
ChainOffsets.insert({&*E.Inst, E.OffsetFromLeader});
+ // Across a single invocation of this function the IR is not changing, so
+ // using a batched Alias Analysis is safe and can reduce compile time.
+ BatchAAResults BatchAA(AA);
+
// Loads get hoisted up to the first load in the chain. Stores get sunk
// down to the last store in the chain. Our algorithm for loads is:
//
@@ -569,7 +574,7 @@ std::vector<Chain> Vectorizer::splitChainByMayAliasInstrs(Chain &C) {
NewChain.emplace_back(*ChainBegin);
for (auto ChainIt = std::next(ChainBegin); ChainIt != ChainEnd; ++ChainIt) {
if (isSafeToMove<IsLoad>(ChainIt->Inst, NewChain.front().Inst,
- ChainOffsets)) {
+ ChainOffsets, BatchAA)) {
LLVM_DEBUG(dbgs() << "LSV: No intervening may-alias instrs; can merge "
<< *ChainIt->Inst << " into " << *ChainBegin->Inst
<< "\n");
@@ -999,7 +1004,8 @@ bool Vectorizer::vectorizeChain(Chain &C) {
template <bool IsLoadChain>
bool Vectorizer::isSafeToMove(
Instruction *ChainElem, Instruction *ChainBegin,
- const DenseMap<Instruction *, APInt /*OffsetFromLeader*/> &ChainOffsets) {
+ const DenseMap<Instruction *, APInt /*OffsetFromLeader*/> &ChainOffsets,
+ BatchAAResults &BatchAA) {
LLVM_DEBUG(dbgs() << "LSV: isSafeToMove(" << *ChainElem << " -> "
<< *ChainBegin << ")\n");
@@ -1066,7 +1072,8 @@ bool Vectorizer::isSafeToMove(
LLVM_DEBUG({
// Double check that AA also sees this alias. If not, we probably
// have a bug.
- ModRefInfo MR = AA.getModRefInfo(I, MemoryLocation::get(ChainElem));
+ ModRefInfo MR =
+ BatchAA.getModRefInfo(I, MemoryLocation::get(ChainElem));
assert(IsLoadChain ? isModSet(MR) : isModOrRefSet(MR));
dbgs() << "LSV: Found alias in chain: " << *I << "\n";
});
@@ -1077,7 +1084,7 @@ bool Vectorizer::isSafeToMove(
}
LLVM_DEBUG(dbgs() << "LSV: Querying AA for " << *I << "\n");
- ModRefInfo MR = AA.getModRefInfo(I, MemoryLocation::get(ChainElem));
+ ModRefInfo MR = BatchAA.getModRefInfo(I, MemoryLocation::get(ChainElem));
if (IsLoadChain ? isModSet(MR) : isModOrRefSet(MR)) {
LLVM_DEBUG(dbgs() << "LSV: Found alias in chain:\n"
<< " Aliasing instruction:\n"
diff --git a/llvm/test/Transforms/LoadStoreVectorizer/batch-aa-compile-time.ll b/llvm/test/Transforms/LoadStoreVectorizer/batch-aa-compile-time.ll
new file mode 100644
index 0000000000000..39e5cc56a49e9
--- /dev/null
+++ b/llvm/test/Transforms/LoadStoreVectorizer/batch-aa-compile-time.ll
@@ -0,0 +1,2583 @@
+; RUN: opt -S < %s -passes=load-store-vectorizer --capture-tracking-max-uses-to-explore=1024 | FileCheck %s
+
+; Without using batching alias analysis, this test takes 6 seconds to compile. With, less than a second.
+; This is because the mechanism that proves NoAlias in this case is very expensive (CaptureTracking.cpp),
+; and caching the result leads to 2 calls to that mechanism instead of ~300,000 (run with -stats to see the difference)
+
+; This test only demonstrates the compile time issue if capture-tracking-max-uses-to-explore is set to at least 1024,
+; because with the default value of 100, the CaptureTracking analysis is not run, NoAlias is not proven, and the vectorizer gives up early.
+
+@global_mem = external global i8
+
+define void @compile-time-test() {
+; CHECK-LABEL: define void @compile-time-test() {
+entry:
+ ; Create base pointer to a global variable with the inefficient pattern that Alias Analysis cannot easily traverse through.
+ %global_base_loads = getelementptr i8, ptr inttoptr (i32 ptrtoint (ptr @global_mem to i32) to ptr), i64 0
+
+ ; Create another pointer for the stores.
+ %local_base_stores = alloca <512 x i8>
+
+ ; 512 interwoven loads and stores
+ %ptr_0 = getelementptr i8, ptr %global_base_loads, i64 0
+ %load_0 = load i8, ptr %ptr_0, align 1
+ %ptr2_0 = getelementptr i8, ptr %local_base_stores, i64 0
+ store i8 %load_0, ptr %ptr2_0, align 1
+
+ %ptr_1 = getelementptr i8, ptr %global_base_loads, i64 1
+ %load_1 = load i8, ptr %ptr_1, align 1
+ %ptr2_1 = getelementptr i8, ptr %local_base_stores, i64 1
+ store i8 %load_1, ptr %ptr2_1, align 1
+
+ %ptr_2 = getelementptr i8, ptr %global_base_loads, i64 2
+ %load_2 = load i8, ptr %ptr_2, align 1
+ %ptr2_2 = getelementptr i8, ptr %local_base_stores, i64 2
+ store i8 %load_2, ptr %ptr2_2, align 1
+
+ %ptr_3 = getelementptr i8, ptr %global_base_loads, i64 3
+ %load_3 = load i8, ptr %ptr_3, align 1
+ %ptr2_3 = getelementptr i8, ptr %local_base_stores, i64 3
+ store i8 %load_3, ptr %ptr2_3, align 1
+
+ %ptr_4 = getelementptr i8, ptr %global_base_loads, i64 4
+ %load_4 = load i8, ptr %ptr_4, align 1
+ %ptr2_4 = getelementptr i8, ptr %local_base_stores, i64 4
+ store i8 %load_4, ptr %ptr2_4, align 1
+
+ %ptr_5 = getelementptr i8, ptr %global_base_loads, i64 5
+ %load_5 = load i8, ptr %ptr_5, align 1
+ %ptr2_5 = getelementptr i8, ptr %local_base_stores, i64 5
+ store i8 %load_5, ptr %ptr2_5, align 1
+
+ %ptr_6 = getelementptr i8, ptr %global_base_loads, i64 6
+ %load_6 = load i8, ptr %ptr_6, align 1
+ %ptr2_6 = getelementptr i8, ptr %local_base_stores, i64 6
+ store i8 %load_6, ptr %ptr2_6, align 1
+
+ %ptr_7 = getelementptr i8, ptr %global_base_loads, i64 7
+ %load_7 = load i8, ptr %ptr_7, align 1
+ %ptr2_7 = getelementptr i8, ptr %local_base_stores, i64 7
+ store i8 %load_7, ptr %ptr2_7, align 1
+
+ %ptr_8 = getelementptr i8, ptr %global_base_loads, i64 8
+ %load_8 = load i8, ptr %ptr_8, align 1
+ %ptr2_8 = getelementptr i8, ptr %local_base_stores, i64 8
+ store i8 %load_8, ptr %ptr2_8, align 1
+
+ %ptr_9 = getelementptr i8, ptr %global_base_loads, i64 9
+ %load_9 = load i8, ptr %ptr_9, align 1
+ %ptr2_9 = getelementptr i8, ptr %local_base_stores, i64 9
+ store i8 %load_9, ptr %ptr2_9, align 1
+
+ %ptr_10 = getelementptr i8, ptr %global_base_loads, i64 10
+ %load_10 = load i8, ptr %ptr_10, align 1
+ %ptr2_10 = getelementptr i8, ptr %local_base_stores, i64 10
+ store i8 %load_10, ptr %ptr2_10, align 1
+
+ %ptr_11 = getelementptr i8, ptr %global_base_loads, i64 11
+ %load_11 = load i8, ptr %ptr_11, align 1
+ %ptr2_11 = getelementptr i8, ptr %local_base_stores, i64 11
+ store i8 %load_11, ptr %ptr2_11, align 1
+
+ %ptr_12 = getelementptr i8, ptr %global_base_loads, i64 12
+ %load_12 = load i8, ptr %ptr_12, align 1
+ %ptr2_12 = getelementptr i8, ptr %local_base_stores, i64 12
+ store i8 %load_12, ptr %ptr2_12, align 1
+
+ %ptr_13 = getelementptr i8, ptr %global_base_loads, i64 13
+ %load_13 = load i8, ptr %ptr_13, align 1
+ %ptr2_13 = getelementptr i8, ptr %local_base_stores, i64 13
+ store i8 %load_13, ptr %ptr2_13, align 1
+
+ %ptr_14 = getelementptr i8, ptr %global_base_loads, i64 14
+ %load_14 = load i8, ptr %ptr_14, align 1
+ %ptr2_14 = getelementptr i8, ptr %local_base_stores, i64 14
+ store i8 %load_14, ptr %ptr2_14, align 1
+
+ %ptr_15 = getelementptr i8, ptr %global_base_loads, i64 15
+ %load_15 = load i8, ptr %ptr_15, align 1
+ %ptr2_15 = getelementptr i8, ptr %local_base_stores, i64 15
+ store i8 %load_15, ptr %ptr2_15, align 1
+
+ %ptr_16 = getelementptr i8, ptr %global_base_loads, i64 16
+ %load_16 = load i8, ptr %ptr_16, align 1
+ %ptr2_16 = getelementptr i8, ptr %local_base_stores, i64 16
+ store i8 %load_16, ptr %ptr2_16, align 1
+
+ %ptr_17 = getelementptr i8, ptr %global_base_loads, i64 17
+ %load_17 = load i8, ptr %ptr_17, align 1
+ %ptr2_17 = getelementptr i8, ptr %local_base_stores, i64 17
+ store i8 %load_17, ptr %ptr2_17, align 1
+
+ %ptr_18 = getelementptr i8, ptr %global_base_loads, i64 18
+ %load_18 = load i8, ptr %ptr_18, align 1
+ %ptr2_18 = getelementptr i8, ptr %local_base_stores, i64 18
+ store i8 %load_18, ptr %ptr2_18, align 1
+
+ %ptr_19 = getelementptr i8, ptr %global_base_loads, i64 19
+ %load_19 = load i8, ptr %ptr_19, align 1
+ %ptr2_19 = getelementptr i8, ptr %local_base_stores, i64 19
+ store i8 %load_19, ptr %ptr2_19, align 1
+
+ %ptr_20 = getelementptr i8, ptr %global_base_loads, i64 20
+ %load_20 = load i8, ptr %ptr_20, align 1
+ %ptr2_20 = getelementptr i8, ptr %local_base_stores, i64 20
+ store i8 %load_20, ptr %ptr2_20, align 1
+
+ %ptr_21 = getelementptr i8, ptr %global_base_loads, i64 21
+ %load_21 = load i8, ptr %ptr_21, align 1
+ %ptr2_21 = getelementptr i8, ptr %local_base_stores, i64 21
+ store i8 %load_21, ptr %ptr2_21, align 1
+
+ %ptr_22 = getelementptr i8, ptr %global_base_loads, i64 22
+ %load_22 = load i8, ptr %ptr_22, align 1
+ %ptr2_22 = getelementptr i8, ptr %local_base_stores, i64 22
+ store i8 %load_22, ptr %ptr2_22, align 1
+
+ %ptr_23 = getelementptr i8, ptr %global_base_loads, i64 23
+ %load_23 = load i8, ptr %ptr_23, align 1
+ %ptr2_23 = getelementptr i8, ptr %local_base_stores, i64 23
+ store i8 %load_23, ptr %ptr2_23, align 1
+
+ %ptr_24 = getelementptr i8, ptr %global_base_loads, i64 24
+ %load_24 = load i8, ptr %ptr_24, align 1
+ %ptr2_24 = getelementptr i8, ptr %local_base_stores, i64 24
+ store i8 %load_24, ptr %ptr2_24, align 1
+
+ %ptr_25 = getelementptr i8, ptr %global_base_loads, i64 25
+ %load_25 = load i8, ptr %ptr_25, align 1
+ %ptr2_25 = getelementptr i8, ptr %local_base_stores, i64 25
+ store i8 %load_25, ptr %ptr2_25, align 1
+
+ %ptr_26 = getelementptr i8, ptr %global_base_loads, i64 26
+ %load_26 = load i8, ptr %ptr_26, align 1
+ %ptr2_26 = getelementptr i8, ptr %local_base_stores, i64 26
+ store i8 %load_26, ptr %ptr2_26, align 1
+
+ %ptr_27 = getelementptr i8, ptr %global_base_loads, i64 27
+ %load_27 = load i8, ptr %ptr_27, align 1
+ %ptr2_27 = getelementptr i8, ptr %local_base_stores, i64 27
+ store i8 %load_27, ptr %ptr2_27, align 1
+
+ %ptr_28 = getelementptr i8, ptr %global_base_loads, i64 28
+ %load_28 = load i8, ptr %ptr_28, align 1
+ %ptr2_28 = getelementptr i8, ptr %local_base_stores, i64 28
+ store i8 %load_28, ptr %ptr2_28, align 1
+
+ %ptr_29 = getelementptr i8, ptr %global_base_loads, i64 29
+ %load_29 = load i8, ptr %ptr_29, align 1
+ %ptr2_29 = getelementptr i8, ptr %local_base_stores, i64 29
+ store i8 %load_29, ptr %ptr2_29, align 1
+
+ %ptr_30 = getelementptr i8, ptr %global_base_loads, i64 30
+ %load_30 = load i8, ptr %ptr_30, align 1
+ %ptr2_30 = getelementptr i8, ptr %local_base_stores, i64 30
+ store i8 %load_30, ptr %ptr2_30, align 1
+
+ %ptr_31 = getelementptr i8, ptr %global_base_loads, i64 31
+ %load_31 = load i8, ptr %ptr_31, align 1
+ %ptr2_31 = getelementptr i8, ptr %local_base_stores, i64 31
+ store i8 %load_31, ptr %ptr2_31, align 1
+
+ %ptr_32 = getelementptr i8, ptr %global_base_loads, i64 32
+ %load_32 = load i8, ptr %ptr_32, align 1
+ %ptr2_32 = getelementptr i8, ptr %local_base_stores, i64 32
+ store i8 %load_32, ptr %ptr2_32, align 1
+
+ %ptr_33 = getelementptr i8, ptr %global_base_loads, i64 33
+ %load_33 = load i8, ptr %ptr_33, align 1
+ %ptr2_33 = getelementptr i8, ptr %local_base_stores, i64 33
+ store i8 %load_33, ptr %ptr2_33, align 1
+
+ %ptr_34 = getelementptr i8, ptr %global_base_loads, i64 34
+ %load_34 = load i8, ptr %ptr_34, align 1
+ %ptr2_34 = getelementptr i8, ptr %local_base_stores, i64 34
+ store i8 %load_34, ptr %ptr2_34, align 1
+
+ %ptr_35 = getelementptr i8, ptr %global_base_loads, i64 35
+ %load_35 = load i8, ptr %ptr_35, align 1
+ %ptr2_35 = getelementptr i8, ptr %local_base_stores, i64 35
+ store i8 %load_35, ptr %ptr2_35, align 1
+
+ %ptr_36 = getelementptr i8, ptr %global_base_loads, i64 36
+ %load_36 = load i8, ptr %ptr_36, align 1
+ %ptr2_36 = getelementptr i8, ptr %local_base_stores, i64 36
+ store i8 %load_36, ptr %ptr2_36, align 1
+
+ %ptr_37 = getelementptr i8, ptr %global_base_loads, i64 37
+ %load_37 = load i8, ptr %ptr_37, align 1
+ %ptr2_37 = getelementptr i8, ptr %local_base_stores, i64 37
+ store i8 %load_37, ptr %ptr2_37, align 1
+
+ %ptr_38 = getelementptr i8, ptr %global_base_loads, i64 38
+ %load_38 = load i8, ptr %ptr_38, align 1
+ %ptr2_38 = getelementptr i8, ptr %local_base_stores, i64 38
+ store i8 %load_38, ptr %ptr2_38, align 1
+
+ %ptr_39 = getelementptr i8, ptr %global_base_loads, i64 39
+ %load_39 = load i8, ptr %ptr_39, align 1
+ %ptr2_39 = getelementptr i8, ptr %local_base_stores, i64 39
+ store i8 %load_39, ptr %ptr2_39, align 1
+
+ %ptr_40 = getelementptr i8, ptr %global_base_loads, i64 40
+ %load_40 = load i8, ptr %ptr_40, align 1
+ %ptr2_40 = getelementptr i8, ptr %local_base_stores, i64 40
+ store i8 %load_40, ptr %ptr2_40, align 1
+
+ %ptr_41 = getelementptr i8, ptr %global_base_loads, i64 41
+ %load_41 = load i8, ptr %ptr_41, align 1
+ %ptr2_41 = getelementptr i8, ptr %local_base_stores, i64 41
+ store i8 %load_41, ptr %ptr2_41, align 1
+
+ %ptr_42 = getelementptr i8, ptr %global_base_loads, i64 42
+ %load_42 = load i8, ptr %ptr_42, align 1
+ %ptr2_42 = getelementptr i8, ptr %local_base_stores, i64 42
+ store i8 %load_42, ptr %ptr2_42, align 1
+
+ %ptr_43 = getelementptr i8, ptr %global_base_loads, i64 43
+ %load_43 = load i8, ptr %ptr_43, align 1
+ %ptr2_43 = getelementptr i8, ptr %local_base_stores, i64 43
+ store i8 %load_43, ptr %ptr2_43, align 1
+
+ %ptr_44 = getelementptr i8, ptr %global_base_loads, i64 44
+ %load_44 = load i8, ptr %ptr_44, align 1
+ %ptr2_44 = getelementptr i8, ptr %local_base_stores, i64 44
+ store i8 %load_44, ptr %ptr2_44, align 1
+
+ %ptr_45 = getelementptr i8, ptr %global_base_loads, i64 45
+ %load_45 = load i8, ptr %ptr_45, align 1
+ %ptr2_45 = getelementptr i8, ptr %local_base_stores, i64 45
+ store i8 %load_45, ptr %ptr2_45, align 1
+
+ %ptr_46 = getelementptr i8, ptr %global_base_loads, i64 46
+ %load_46 = load i8, ptr %ptr_46, align 1
+ %ptr2_46 = getelementptr i8, ptr %local_base_stores, i64 46
+ store i8 %load_46, ptr %ptr2_46, align 1
+
+ %ptr_47 = getelementptr i8, ptr %global_base_loads, i64 47
+ %load_47 = load i8, ptr %ptr_47, align 1
+ %ptr2_47 = getelementptr i8, ptr %local_base_stores, i64 47
+ store i8 %load_47, ptr %ptr2_47, align 1
+
+ %ptr_48 = getelementptr i8, ptr %global_base_loads, i64 48
+ %load_48 = load i8, ptr %ptr_48, align 1
+ %ptr2_48 = getelementptr i8, ptr %local_base_stores, i64 48
+ store i8 %load_48, ptr %ptr2_48, align 1
+
+ %ptr_49 = getelementptr i8, ptr %global_base_loads, i64 49
+ %load_49 = load i8, ptr %ptr_49, align 1
+ %ptr2_49 = getelementptr i8, ptr %local_base_stores, i64 49
+ store i8 %load_49, ptr %ptr2_49, align 1
+
+ %ptr_50 = getelementptr i8, ptr %global_base_loads, i64 50
+ %load_50 = load i8, ptr %ptr_50, align 1
+ %ptr2_50 = getelementptr i8, ptr %local_base_stores, i64 50
+ store i8 %load_50, ptr %ptr2_50, align 1
+
+ %ptr_51 = getelementptr i8, ptr %global_base_loads, i64 51
+ %load_51 = load i8, ptr %ptr_51, align 1
+ %ptr2_51 = getelementptr i8, ptr %local_base_stores, i64 51
+ store i8 %load_51, ptr %ptr2_51, align 1
+
+ %ptr_52 = getelementptr i8, ptr %global_base_loads, i64 52
+ %load_52 = load i8, ptr %ptr_52, align 1
+ %ptr2_52 = getelementptr i8, ptr %local_base_stores, i64 52
+ store i8 %load_52, ptr %ptr2_52, align 1
+
+ %ptr_53 = getelementptr i8, ptr %global_base_loads, i64 53
+ %load_53 = load i8, ptr %ptr_53, align 1
+ %ptr2_53 = getelementptr i8, ptr %local_base_stores, i64 53
+ store i8 %load_53, ptr %ptr2_53, align 1
+
+ %ptr_54 = getelementptr i8, ptr %global_base_loads, i64 54
+ %load_54 = load i8, ptr %ptr_54, align 1
+ %ptr2_54 = getelementptr i8, ptr %local_base_stores, i64 54
+ store i8 %load_54, ptr %ptr2_54, align 1
+
+ %ptr_55 = getelementptr i8, ptr %global_base_loads, i64 55
+ %load_55 = load i8, ptr %ptr_55, align 1
+ %ptr2_55 = getelementptr i8, ptr %local_base_stores, i64 55
+ store i8 %load_55, ptr %ptr2_55, align 1
+
+ %ptr_56 = getelementptr i8, ptr %global_base_loads, i64 56
+ %load_56 = load i8, ptr %ptr_56, align 1
+ %ptr2_56 = getelementptr i8, ptr %local_base_stores, i64 56
+ store i8 %load_56, ptr %ptr2_56, align 1
+
+ %ptr_57 = getelementptr i8, ptr %global_base_loads, i64 57
+ %load_57 = load i8, ptr %ptr_57, align 1
+ %ptr2_57 = getelementptr i8, ptr %local_base_stores, i64 57
+ store i8 %load_57, ptr %ptr2_57, align 1
+
+ %ptr_58 = getelementptr i8, ptr %global_base_loads, i64 58
+ %load_58 = load i8, ptr %ptr_58, align 1
+ %ptr2_58 = getelementptr i8, ptr %local_base_stores, i64 58
+ store i8 %load_58, ptr %ptr2_58, align 1
+
+ %ptr_59 = getelementptr i8, ptr %global_base_loads, i64 59
+ %load_59 = load i8, ptr %ptr_59, align 1
+ %ptr2_59 = getelementptr i8, ptr %local_base_stores, i64 59
+ store i8 %load_59, ptr %ptr2_59, align 1
+
+ %ptr_60 = getelementptr i8, ptr %global_base_loads, i64 60
+ %load_60 = load i8, ptr %ptr_60, align 1
+ %ptr2_60 = getelementptr i8, ptr %local_base_stores, i64 60
+ store i8 %load_60, ptr %ptr2_60, align 1
+
+ %ptr_61 = getelementptr i8, ptr %global_base_loads, i64 61
+ %load_61 = load i8, ptr %ptr_61, align 1
+ %ptr2_61 = getelementptr i8, ptr %local_base_stores, i64 61
+ store i8 %load_61, ptr %ptr2_61, align 1
+
+ %ptr_62 = getelementptr i8, ptr %global_base_loads, i64 62
+ %load_62 = load i8, ptr %ptr_62, align 1
+ %ptr2_62 = getelementptr i8, ptr %local_base_stores, i64 62
+ store i8 %load_62, ptr %ptr2_62, align 1
+
+ %ptr_63 = getelementptr i8, ptr %global_base_loads, i64 63
+ %load_63 = load i8, ptr %ptr_63, align 1
+ %ptr2_63 = getelementptr i8, ptr %local_base_stores, i64 63
+ store i8 %load_63, ptr %ptr2_63, align 1
+
+ %ptr_64 = getelementptr i8, ptr %global_base_loads, i64 64
+ %load_64 = load i8, ptr %ptr_64, align 1
+ %ptr2_64 = getelementptr i8, ptr %local_base_stores, i64 64
+ store i8 %load_64, ptr %ptr2_64, align 1
+
+ %ptr_65 = getelementptr i8, ptr %global_base_loads, i64 65
+ %load_65 = load i8, ptr %ptr_65, align 1
+ %ptr2_65 = getelementptr i8, ptr %local_base_stores, i64 65
+ store i8 %load_65, ptr %ptr2_65, align 1
+
+ %ptr_66 = getelementptr i8, ptr %global_base_loads, i64 66
+ %load_66 = load i8, ptr %ptr_66, align 1
+ %ptr2_66 = getelementptr i8, ptr %local_base_stores, i64 66
+ store i8 %load_66, ptr %ptr2_66, align 1
+
+ %ptr_67 = getelementptr i8, ptr %global_base_loads, i64 67
+ %load_67 = load i8, ptr %ptr_67, align 1
+ %ptr2_67 = getelementptr i8, ptr %local_base_stores, i64 67
+ store i8 %load_67, ptr %ptr2_67, align 1
+
+ %ptr_68 = getelementptr i8, ptr %global_base_loads, i64 68
+ %load_68 = load i8, ptr %ptr_68, align 1
+ %ptr2_68 = getelementptr i8, ptr %local_base_stores, i64 68
+ store i8 %load_68, ptr %ptr2_68, align 1
+
+ %ptr_69 = getelementptr i8, ptr %global_base_loads, i64 69
+ %load_69 = load i8, ptr %ptr_69, align 1
+ %ptr2_69 = getelementptr i8, ptr %local_base_stores, i64 69
+ store i8 %load_69, ptr %ptr2_69, align 1
+
+ %ptr_70 = getelementptr i8, ptr %global_base_loads, i64 70
+ %load_70 = load i8, ptr %ptr_70, align 1
+ %ptr2_70 = getelementptr i8, ptr %local_base_stores, i64 70
+ store i8 %load_70, ptr %ptr2_70, align 1
+
+ %ptr_71 = getelementptr i8, ptr %global_base_loads, i64 71
+ %load_71 = load i8, ptr %ptr_71, align 1
+ %ptr2_71 = getelementptr i8, ptr %local_base_stores, i64 71
+ store i8 %load_71, ptr %ptr2_71, align 1
+
+ %ptr_72 = getelementptr i8, ptr %global_base_loads, i64 72
+ %load_72 = load i8, ptr %ptr_72, align 1
+ %ptr2_72 = getelementptr i8, ptr %local_base_stores, i64 72
+ store i8 %load_72, ptr %ptr2_72, align 1
+
+ %ptr_73 ...
[truncated]
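For reference, the repetitive body of the test above (512 interleaved load/store groups) can be regenerated with a short script along these lines. This is purely illustrative and not part of the PR:

```python
# Emit the interleaved load/store groups used by batch-aa-compile-time.ll.
def gen_block(i: int) -> str:
    """One group: load from the global-derived base, store to the alloca."""
    return "\n".join([
        f"  %ptr_{i} = getelementptr i8, ptr %global_base_loads, i64 {i}",
        f"  %load_{i} = load i8, ptr %ptr_{i}, align 1",
        f"  %ptr2_{i} = getelementptr i8, ptr %local_base_stores, i64 {i}",
        f"  store i8 %load_{i}, ptr %ptr2_{i}, align 1",
    ])

body = "\n\n".join(gen_block(i) for i in range(512))
print(body.count("load i8"))  # → 512
```

Piping `body` between the test's prologue and its terminator reproduces the function body.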
lgtm
@@ -543,6 +544,10 @@ std::vector<Chain> Vectorizer::splitChainByMayAliasInstrs(Chain &C) {
   for (const auto &E : C)
     ChainOffsets.insert({&*E.Inst, E.OffsetFromLeader});

+  // Across a single invocation of this function the IR is not changing, so
+  // using a batched Alias Analysis is safe and can reduce compile time.
+  BatchAAResults BatchAA(AA);
This is a good starting point. A possible follow-up would be to hoist this up to runOnEquivalenceClass(). That does involve IR modifications, but it should still be safe due to the deferral of instruction erasure via the ToErase set.
That's a fair point. I suggest we revisit that idea in a future change, as I have some follow-up changes to the LSV, and I think it would be best to consider the correctness of expanding the lifetime of this cache only once we have that complete picture.

Unless you can give me more specifics about what exactly counts as invalidating the cache and what would not. The comment above the BatchAAResults class specifies that "it is intended to be used only when there are no IR changes in between queries." I can see how that restriction could be relaxed under certain conditions, but I don't have a complete understanding of those conditions.

For instance, I have an optimization I am going to propose that fills holes in non-contiguous chains with newly created GEPs and loads/stores. Would the creation of those instructions invalidate the cache somehow?
> That's a fair point. I suggest we revisit that idea in a future change, as I have some follow-up changes to the LSV, and I think it would be best to consider the correctness of expanding the lifetime of this cache only once we have that complete picture.

I agree that it's best to do this in a future change. This one is trivially correct; the extension is less obvious.

> Unless you can give me more specifics about what exactly counts as invalidating the cache and what would not. The comment above the BatchAAResults class specifies that "it is intended to be used only when there are no IR changes in between queries." I can see how that restriction could be relaxed under certain conditions, but I don't have a complete understanding of those conditions.
>
> For instance, I have an optimization I am going to propose that fills holes in non-contiguous chains with newly created GEPs and loads/stores. Would the creation of those instructions invalidate the cache somehow?

The requirement for BatchAA is basically "create xor erase". It's okay to create new instructions, and it's okay to erase instructions, but it's not okay to do both, due to address reuse.
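The address-reuse hazard behind the "create xor erase" rule can be sketched with a toy model: a cache keyed by instruction addresses, plus an allocator that reuses freed slots. All names here are illustrative, not LLVM's actual data structures:

```python
# Toy model of why a pointer-keyed AA cache breaks when instructions are
# both erased and created: the allocator may hand a new instruction the
# address of an erased one, and the stale cache entry then matches it.
free_list = []
next_slot = 0
cache = {}

def alloc():
    global next_slot
    if free_list:
        return free_list.pop()  # address reuse, as malloc/new may do
    next_slot += 1
    return next_slot

def erase(addr):
    free_list.append(addr)
    # The cache is deliberately NOT invalidated here -- that is the bug
    # this toy models.

a = alloc()
cache[a] = "NoAlias (computed for instruction A)"
erase(a)      # erase instruction A
b = alloc()   # create instruction B: it receives A's old address
print(b == a, cache.get(b))  # stale hit: B inherits A's cached result
```

Creating without erasing (or erasing without creating) never triggers the reuse, which is why either operation alone is safe within one BatchAA lifetime.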
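The magnitude of the reported win (2 expensive checks instead of ~300,000) is what query memoization gives when the same underlying alias question is asked over and over. As a toy illustration with hypothetical names, not BatchAAResults' actual internals:

```python
from functools import lru_cache

expensive_calls = 0

def expensive_no_alias(base_a: str, base_b: str) -> bool:
    # Stand-in for the costly CaptureTracking-based walk.
    global expensive_calls
    expensive_calls += 1
    return base_a != base_b

@lru_cache(maxsize=None)
def batched_no_alias(base_a: str, base_b: str) -> bool:
    # The "batched" layer: memoize results while the IR is unchanged.
    return expensive_no_alias(base_a, base_b)

# 512 loads from one base checked against 512 stores to another base:
# every query reduces to the same pair of underlying objects, so the
# expensive walk runs once instead of 512 * 512 times.
for _ in range(512):
    for _ in range(512):
        batched_no_alias("global_mem", "alloca")

print(expensive_calls)  # → 1
```

Without the `lru_cache` layer, `expensive_calls` would be 512 * 512, which mirrors the ~300,000 CaptureTracking invocations the test exhibits before this patch.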