
[DirectX] Move the scalarizer pass to before dxil-flatten-arrays #146800

Open
wants to merge 2 commits into base: users/Icohedron/pr-146173

Conversation

Icohedron (Contributor) commented Jul 2, 2025

Fixes #145924 and #140416
Depends on #146173 being merged first.

This PR moves the scalarizer pass to run immediately before the dxil-flatten-arrays pass, so that dxil-flatten-arrays can turn scalar GEPs (including i8 GEPs) into flattened array GEPs where applicable.

A number of LLVM DirectX CodeGen tests have been edited to remove scalar GEPs and to correct previously uncaught, incorrectly transformed GEPs.

After this PR, validation errors of the form "Access to out-of-bounds memory is disallowed" or "TGSM pointers must originate from an unambiguous TGSM global variable" no longer appear when compiling DML shaders.
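
For illustration, the updated llc-vector-load-scalarize.ll test in this patch shows the kind of rewrite this ordering enables: a scalar i8 GEP with a byte offset of 16 into the groupshared @arrayofVecData global (scalarized to [8 x i32]) is folded into flattened array GEPs at element index 4 (16 bytes / 4 bytes per i32), instead of being left as a raw i8 GEP that the validator cannot trace back to its TGSM global.

  ; input IR: a scalar i8 GEP into the groupshared global, loading a vector
  %1 = getelementptr inbounds nuw i8, ptr addrspace(3) @arrayofVecData, i32 16
  %2 = load <4 x i32>, ptr addrspace(3) %1, align 4

  ; after data scalarization, the scalarizer, and dxil-flatten-arrays:
  ; per-element loads through flattened [8 x i32] array GEPs (byte 16 -> index 4)
  %.i0 = load i32, ptr addrspace(3) getelementptr inbounds nuw ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 4), align 4
  %.i1 = load i32, ptr addrspace(3) getelementptr ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 5), align 4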

llvmbot (Member) commented Jul 2, 2025

@llvm/pr-subscribers-backend-directx

Author: Deric C. (Icohedron)

Changes

Fixes #145924.
Depends on #146173 being merged first.

This PR moves the scalarizer pass to run immediately before the dxil-flatten-arrays pass, so that dxil-flatten-arrays can turn scalar GEPs (including i8 GEPs) into flattened array GEPs where applicable.

A number of LLVM DirectX CodeGen tests have been edited to remove scalar GEPs and previously uncaught out-of-bounds memory accesses.

After this PR, validation errors of the form "Access to out-of-bounds memory is disallowed" no longer appear when compiling DML shaders.


Patch is 23.13 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/146800.diff

5 Files Affected:

  • (modified) llvm/lib/Target/DirectX/DXILDataScalarization.cpp (+6-1)
  • (modified) llvm/lib/Target/DirectX/DirectXTargetMachine.cpp (+1-1)
  • (modified) llvm/test/CodeGen/DirectX/llc-pipeline.ll (+3-1)
  • (modified) llvm/test/CodeGen/DirectX/llc-vector-load-scalarize.ll (+128-29)
  • (modified) llvm/test/CodeGen/DirectX/scalar-store.ll (+22-8)
diff --git a/llvm/lib/Target/DirectX/DXILDataScalarization.cpp b/llvm/lib/Target/DirectX/DXILDataScalarization.cpp
index c97c604fdbf77..cfe270d432144 100644
--- a/llvm/lib/Target/DirectX/DXILDataScalarization.cpp
+++ b/llvm/lib/Target/DirectX/DXILDataScalarization.cpp
@@ -302,7 +302,7 @@ bool DataScalarizerVisitor::visitExtractElementInst(ExtractElementInst &EEI) {
 
 bool DataScalarizerVisitor::visitGetElementPtrInst(GetElementPtrInst &GEPI) {
   Value *PtrOperand = GEPI.getPointerOperand();
-  Type *OrigGEPType = GEPI.getPointerOperandType();
+  Type *OrigGEPType = GEPI.getSourceElementType();
   Type *NewGEPType = OrigGEPType;
   bool NeedsTransform = false;
 
@@ -319,6 +319,11 @@ bool DataScalarizerVisitor::visitGetElementPtrInst(GetElementPtrInst &GEPI) {
     }
   }
 
+  // Scalar GEPs should remain scalar GEPs. The dxil-flatten-arrays pass will
+  // convert these scalar GEPs into flattened array GEPs.
+  if (!isa<ArrayType>(OrigGEPType))
+    NewGEPType = OrigGEPType;
+
   // Note: We bail if this isn't a gep touched via alloca or global
   // transformations
   if (!NeedsTransform)
diff --git a/llvm/lib/Target/DirectX/DirectXTargetMachine.cpp b/llvm/lib/Target/DirectX/DirectXTargetMachine.cpp
index 40fe6c6e639e4..84751d2db2266 100644
--- a/llvm/lib/Target/DirectX/DirectXTargetMachine.cpp
+++ b/llvm/lib/Target/DirectX/DirectXTargetMachine.cpp
@@ -107,10 +107,10 @@ class DirectXPassConfig : public TargetPassConfig {
     addPass(createDXILIntrinsicExpansionLegacyPass());
     addPass(createDXILCBufferAccessLegacyPass());
     addPass(createDXILDataScalarizationLegacyPass());
-    addPass(createDXILFlattenArraysLegacyPass());
     ScalarizerPassOptions DxilScalarOptions;
     DxilScalarOptions.ScalarizeLoadStore = true;
     addPass(createScalarizerPass(DxilScalarOptions));
+    addPass(createDXILFlattenArraysLegacyPass());
     addPass(createDXILForwardHandleAccessesLegacyPass());
     addPass(createDXILLegalizeLegacyPass());
     addPass(createDXILResourceImplicitBindingLegacyPass());
diff --git a/llvm/test/CodeGen/DirectX/llc-pipeline.ll b/llvm/test/CodeGen/DirectX/llc-pipeline.ll
index 36fed88fc52d6..151603a7161c5 100644
--- a/llvm/test/CodeGen/DirectX/llc-pipeline.ll
+++ b/llvm/test/CodeGen/DirectX/llc-pipeline.ll
@@ -19,10 +19,12 @@
 ; CHECK-NEXT:   DXIL Intrinsic Expansion
 ; CHECK-NEXT:   DXIL CBuffer Access
 ; CHECK-NEXT:   DXIL Data Scalarization
-; CHECK-NEXT:   DXIL Array Flattener
 ; CHECK-NEXT:   FunctionPass Manager
 ; CHECK-NEXT:     Dominator Tree Construction
 ; CHECK-NEXT:     Scalarize vector operations
+; CHECK-NEXT:   DXIL Array Flattener
+; CHECK-NEXT:   FunctionPass Manager
+; CHECK-NEXT:     Dominator Tree Construction
 ; CHECK-NEXT:     DXIL Forward Handle Accesses
 ; CHECK-NEXT:     DXIL Legalizer
 ; CHECK-NEXT:   DXIL Resource Binding Analysis
diff --git a/llvm/test/CodeGen/DirectX/llc-vector-load-scalarize.ll b/llvm/test/CodeGen/DirectX/llc-vector-load-scalarize.ll
index 78550adbe424a..1e2ff60c56684 100644
--- a/llvm/test/CodeGen/DirectX/llc-vector-load-scalarize.ll
+++ b/llvm/test/CodeGen/DirectX/llc-vector-load-scalarize.ll
@@ -8,30 +8,19 @@
 @staticArrayOfVecData = internal global [3 x <4 x i32>] [<4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> <i32 5, i32 6, i32 7, i32 8>, <4 x i32> <i32 9, i32 10, i32 11, i32 12>], align 4
 @"groupshared2dArrayofVectors" = local_unnamed_addr addrspace(3) global [3 x [3 x <4 x i32>]] zeroinitializer, align 16
 
-; CHECK: @arrayofVecData.scalarized.1dim = local_unnamed_addr addrspace(3) global [8 x i32] zeroinitializer, align 16
-; CHECK: @vecData.scalarized = external addrspace(3) global [4 x i32], align 4
-; CHECK: @staticArrayOfVecData.scalarized.1dim = internal global [12 x i32] [i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12], align 4
-; CHECK: @groupshared2dArrayofVectors.scalarized.1dim = local_unnamed_addr addrspace(3) global [36 x i32] zeroinitializer, align 16
 
-; CHECK-NOT: @arrayofVecData
-; CHECK-NOT: @arrayofVecData.scalarized
-; CHECK-NOT: @vecData
-; CHECK-NOT: @staticArrayOfVecData
-; CHECK-NOT: @staticArrayOfVecData.scalarized
-; CHECK-NOT: @groupshared2dArrayofVectors
-; CHECK-NOT: @groupshared2dArrayofVectors.scalarized
 
 define <4 x i32> @load_array_vec_test() #0 {
 ; CHECK-LABEL: define <4 x i32> @load_array_vec_test(
 ; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr addrspace(3) @arrayofVecData.scalarized.1dim, align 4
-; CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr addrspace(3) getelementptr (i32, ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 1), align 4
-; CHECK-NEXT:    [[TMP6:%.*]] = load i32, ptr addrspace(3) getelementptr (i32, ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 2), align 4
-; CHECK-NEXT:    [[TMP8:%.*]] = load i32, ptr addrspace(3) getelementptr (i32, ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 3), align 4
+; CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr addrspace(3) getelementptr ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 1), align 4
+; CHECK-NEXT:    [[TMP6:%.*]] = load i32, ptr addrspace(3) getelementptr ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 2), align 4
+; CHECK-NEXT:    [[TMP8:%.*]] = load i32, ptr addrspace(3) getelementptr ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 3), align 4
 ; CHECK-NEXT:    [[TMP10:%.*]] = load i32, ptr addrspace(3) getelementptr inbounds ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 4), align 4
-; CHECK-NEXT:    [[TMP12:%.*]] = load i32, ptr addrspace(3) getelementptr (i32, ptr addrspace(3) getelementptr inbounds ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 4), i32 1), align 4
-; CHECK-NEXT:    [[TMP14:%.*]] = load i32, ptr addrspace(3) getelementptr (i32, ptr addrspace(3) getelementptr inbounds ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 4), i32 2), align 4
-; CHECK-NEXT:    [[TMP16:%.*]] = load i32, ptr addrspace(3) getelementptr (i32, ptr addrspace(3) getelementptr inbounds ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 4), i32 3), align 4
+; CHECK-NEXT:    [[TMP12:%.*]] = load i32, ptr addrspace(3) getelementptr ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 5), align 4
+; CHECK-NEXT:    [[TMP14:%.*]] = load i32, ptr addrspace(3) getelementptr ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 6), align 4
+; CHECK-NEXT:    [[TMP16:%.*]] = load i32, ptr addrspace(3) getelementptr ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 7), align 4
 ; CHECK-NEXT:    [[DOTI05:%.*]] = add i32 [[TMP2]], [[TMP10]]
 ; CHECK-NEXT:    [[DOTI16:%.*]] = add i32 [[TMP4]], [[TMP12]]
 ; CHECK-NEXT:    [[DOTI27:%.*]] = add i32 [[TMP6]], [[TMP14]]
@@ -53,9 +42,9 @@ define <4 x i32> @load_vec_test() #0 {
 ; CHECK-LABEL: define <4 x i32> @load_vec_test(
 ; CHECK-SAME: ) #[[ATTR0]] {
 ; CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr addrspace(3) @vecData.scalarized, align 4
-; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr addrspace(3) getelementptr (i32, ptr addrspace(3) @vecData.scalarized, i32 1), align 4
-; CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr addrspace(3) getelementptr (i32, ptr addrspace(3) @vecData.scalarized, i32 2), align 4
-; CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr addrspace(3) getelementptr (i32, ptr addrspace(3) @vecData.scalarized, i32 3), align 4
+; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr addrspace(3) getelementptr ([4 x i32], ptr addrspace(3) @vecData.scalarized, i32 0, i32 1), align 4
+; CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr addrspace(3) getelementptr ([4 x i32], ptr addrspace(3) @vecData.scalarized, i32 0, i32 2), align 4
+; CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr addrspace(3) getelementptr ([4 x i32], ptr addrspace(3) @vecData.scalarized, i32 0, i32 3), align 4
 ; CHECK-NEXT:    [[DOTUPTO0:%.*]] = insertelement <4 x i32> poison, i32 [[TMP1]], i32 0
 ; CHECK-NEXT:    [[DOTUPTO1:%.*]] = insertelement <4 x i32> [[DOTUPTO0]], i32 [[TMP2]], i32 1
 ; CHECK-NEXT:    [[DOTUPTO2:%.*]] = insertelement <4 x i32> [[DOTUPTO1]], i32 [[TMP3]], i32 2
@@ -66,6 +55,42 @@ define <4 x i32> @load_vec_test() #0 {
   ret <4 x i32> %1
 }
 
+define <4 x i32> @load_vec_from_scalar_gep_test() #0 {
+; CHECK-LABEL: define <4 x i32> @load_vec_from_scalar_gep_test(
+; CHECK-SAME: ) #[[ATTR0]] {
+; CHECK-NEXT:    [[DOTI04:%.*]] = load i32, ptr addrspace(3) getelementptr inbounds nuw ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 4), align 4
+; CHECK-NEXT:    [[DOTI116:%.*]] = load i32, ptr addrspace(3) getelementptr ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 5), align 4
+; CHECK-NEXT:    [[DOTI228:%.*]] = load i32, ptr addrspace(3) getelementptr ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 6), align 4
+; CHECK-NEXT:    [[DOTI3310:%.*]] = load i32, ptr addrspace(3) getelementptr ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 7), align 4
+; CHECK-NEXT:    [[DOTUPTO011:%.*]] = insertelement <4 x i32> poison, i32 [[DOTI04]], i32 0
+; CHECK-NEXT:    [[DOTUPTO112:%.*]] = insertelement <4 x i32> [[DOTUPTO011]], i32 [[DOTI116]], i32 1
+; CHECK-NEXT:    [[DOTUPTO213:%.*]] = insertelement <4 x i32> [[DOTUPTO112]], i32 [[DOTI228]], i32 2
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <4 x i32> [[DOTUPTO213]], i32 [[DOTI3310]], i32 3
+; CHECK-NEXT:    ret <4 x i32> [[TMP1]]
+;
+  %1 = getelementptr inbounds nuw i32, ptr addrspace(3) @"arrayofVecData", i32 4
+  %2 = load <4 x i32>, ptr addrspace(3) %1, align 4
+  ret <4 x i32> %2
+}
+
+define <4 x i32> @load_vec_from_i8_gep_test() #0 {
+; CHECK-LABEL: define <4 x i32> @load_vec_from_i8_gep_test(
+; CHECK-SAME: ) #[[ATTR0]] {
+; CHECK-NEXT:    [[DOTI04:%.*]] = load i32, ptr addrspace(3) getelementptr inbounds nuw ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 4), align 4
+; CHECK-NEXT:    [[DOTI116:%.*]] = load i32, ptr addrspace(3) getelementptr ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 5), align 4
+; CHECK-NEXT:    [[DOTI228:%.*]] = load i32, ptr addrspace(3) getelementptr ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 6), align 4
+; CHECK-NEXT:    [[DOTI3310:%.*]] = load i32, ptr addrspace(3) getelementptr ([8 x i32], ptr addrspace(3) @arrayofVecData.scalarized.1dim, i32 0, i32 7), align 4
+; CHECK-NEXT:    [[DOTUPTO011:%.*]] = insertelement <4 x i32> poison, i32 [[DOTI04]], i32 0
+; CHECK-NEXT:    [[DOTUPTO112:%.*]] = insertelement <4 x i32> [[DOTUPTO011]], i32 [[DOTI116]], i32 1
+; CHECK-NEXT:    [[DOTUPTO213:%.*]] = insertelement <4 x i32> [[DOTUPTO112]], i32 [[DOTI228]], i32 2
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <4 x i32> [[DOTUPTO213]], i32 [[DOTI3310]], i32 3
+; CHECK-NEXT:    ret <4 x i32> [[TMP1]]
+;
+  %1 = getelementptr inbounds nuw i8, ptr addrspace(3) @"arrayofVecData", i32 16
+  %2 = load <4 x i32>, ptr addrspace(3) %1, align 4
+  ret <4 x i32> %2
+}
+
 define <4 x i32> @load_static_array_of_vec_test(i32 %index) #0 {
 ; CHECK-LABEL: define <4 x i32> @load_static_array_of_vec_test(
 ; CHECK-SAME: i32 [[INDEX:%.*]]) #[[ATTR0]] {
@@ -73,11 +98,17 @@ define <4 x i32> @load_static_array_of_vec_test(i32 %index) #0 {
 ; CHECK-NEXT:    [[TMP2:%.*]] = add i32 0, [[TMP3]]
 ; CHECK-NEXT:    [[DOTFLAT:%.*]] = getelementptr inbounds [12 x i32], ptr @staticArrayOfVecData.scalarized.1dim, i32 0, i32 [[TMP2]]
 ; CHECK-NEXT:    [[DOTI0:%.*]] = load i32, ptr [[DOTFLAT]], align 4
-; CHECK-NEXT:    [[DOTFLAT_I1:%.*]] = getelementptr i32, ptr [[DOTFLAT]], i32 1
+; CHECK-NEXT:    [[TMP4:%.*]] = mul i32 [[INDEX]], 4
+; CHECK-NEXT:    [[TMP5:%.*]] = add i32 1, [[TMP4]]
+; CHECK-NEXT:    [[DOTFLAT_I1:%.*]] = getelementptr [12 x i32], ptr @staticArrayOfVecData.scalarized.1dim, i32 0, i32 [[TMP5]]
 ; CHECK-NEXT:    [[DOTI1:%.*]] = load i32, ptr [[DOTFLAT_I1]], align 4
-; CHECK-NEXT:    [[DOTFLAT_I2:%.*]] = getelementptr i32, ptr [[DOTFLAT]], i32 2
+; CHECK-NEXT:    [[TMP6:%.*]] = mul i32 [[INDEX]], 4
+; CHECK-NEXT:    [[TMP7:%.*]] = add i32 2, [[TMP6]]
+; CHECK-NEXT:    [[DOTFLAT_I2:%.*]] = getelementptr [12 x i32], ptr @staticArrayOfVecData.scalarized.1dim, i32 0, i32 [[TMP7]]
 ; CHECK-NEXT:    [[DOTI2:%.*]] = load i32, ptr [[DOTFLAT_I2]], align 4
-; CHECK-NEXT:    [[DOTFLAT_I3:%.*]] = getelementptr i32, ptr [[DOTFLAT]], i32 3
+; CHECK-NEXT:    [[TMP8:%.*]] = mul i32 [[INDEX]], 4
+; CHECK-NEXT:    [[TMP9:%.*]] = add i32 3, [[TMP8]]
+; CHECK-NEXT:    [[DOTFLAT_I3:%.*]] = getelementptr [12 x i32], ptr @staticArrayOfVecData.scalarized.1dim, i32 0, i32 [[TMP9]]
 ; CHECK-NEXT:    [[DOTI3:%.*]] = load i32, ptr [[DOTFLAT_I3]], align 4
 ; CHECK-NEXT:    [[DOTUPTO01:%.*]] = insertelement <4 x i32> poison, i32 [[DOTI0]], i32 0
 ; CHECK-NEXT:    [[DOTUPTO12:%.*]] = insertelement <4 x i32> [[DOTUPTO01]], i32 [[DOTI1]], i32 1
@@ -90,17 +121,85 @@ define <4 x i32> @load_static_array_of_vec_test(i32 %index) #0 {
   ret <4 x i32> %4
 }
 
+define <4 x i32> @load_static_array_of_vec_from_scalar_gep_test(i32 %index) #0 {
+; CHECK-LABEL: define <4 x i32> @load_static_array_of_vec_from_scalar_gep_test(
+; CHECK-SAME: i32 [[INDEX:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = mul i32 [[INDEX]], 4
+; CHECK-NEXT:    [[TMP2:%.*]] = mul i32 [[TMP1]], 1
+; CHECK-NEXT:    [[TMP3:%.*]] = add i32 0, [[TMP2]]
+; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds [12 x i32], ptr @staticArrayOfVecData.scalarized.1dim, i32 0, i32 [[TMP3]]
+; CHECK-NEXT:    [[DOTI0:%.*]] = load i32, ptr [[TMP4]], align 4
+; CHECK-NEXT:    [[TMP5:%.*]] = mul i32 [[TMP1]], 1
+; CHECK-NEXT:    [[TMP6:%.*]] = add i32 1, [[TMP5]]
+; CHECK-NEXT:    [[DOTI14:%.*]] = getelementptr [12 x i32], ptr @staticArrayOfVecData.scalarized.1dim, i32 0, i32 [[TMP6]]
+; CHECK-NEXT:    [[DOTI11:%.*]] = load i32, ptr [[DOTI14]], align 4
+; CHECK-NEXT:    [[TMP7:%.*]] = mul i32 [[TMP1]], 1
+; CHECK-NEXT:    [[TMP8:%.*]] = add i32 2, [[TMP7]]
+; CHECK-NEXT:    [[DOTI25:%.*]] = getelementptr [12 x i32], ptr @staticArrayOfVecData.scalarized.1dim, i32 0, i32 [[TMP8]]
+; CHECK-NEXT:    [[DOTI22:%.*]] = load i32, ptr [[DOTI25]], align 4
+; CHECK-NEXT:    [[TMP9:%.*]] = mul i32 [[TMP1]], 1
+; CHECK-NEXT:    [[TMP10:%.*]] = add i32 3, [[TMP9]]
+; CHECK-NEXT:    [[DOTI36:%.*]] = getelementptr [12 x i32], ptr @staticArrayOfVecData.scalarized.1dim, i32 0, i32 [[TMP10]]
+; CHECK-NEXT:    [[DOTI33:%.*]] = load i32, ptr [[DOTI36]], align 4
+; CHECK-NEXT:    [[DOTUPTO07:%.*]] = insertelement <4 x i32> poison, i32 [[DOTI0]], i32 0
+; CHECK-NEXT:    [[DOTUPTO18:%.*]] = insertelement <4 x i32> [[DOTUPTO07]], i32 [[DOTI11]], i32 1
+; CHECK-NEXT:    [[DOTUPTO29:%.*]] = insertelement <4 x i32> [[DOTUPTO18]], i32 [[DOTI22]], i32 2
+; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x i32> [[DOTUPTO29]], i32 [[DOTI33]], i32 3
+; CHECK-NEXT:    ret <4 x i32> [[TMP11]]
+;
+  %2 = mul i32 %index, 4
+  %3 = getelementptr inbounds i32, ptr @staticArrayOfVecData, i32 %2
+  %4 = load <4 x i32>, <4 x i32>* %3, align 4
+  ret <4 x i32> %4
+}
+
+define <4 x i32> @load_static_array_of_vec_from_i8_gep_test(i32 %index) #0 {
+; CHECK-LABEL: define <4 x i32> @load_static_array_of_vec_from_i8_gep_test(
+; CHECK-SAME: i32 [[INDEX:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = mul i32 [[INDEX]], 12
+; CHECK-NEXT:    [[TMP2:%.*]] = mul i32 [[TMP1]], 1
+; CHECK-NEXT:    [[TMP12:%.*]] = lshr i32 [[TMP2]], 2
+; CHECK-NEXT:    [[TMP3:%.*]] = add i32 0, [[TMP12]]
+; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds [12 x i32], ptr @staticArrayOfVecData.scalarized.1dim, i32 0, i32 [[TMP3]]
+; CHECK-NEXT:    [[DOTI0:%.*]] = load i32, ptr [[TMP4]], align 4
+; CHECK-NEXT:    [[TMP5:%.*]] = mul i32 [[TMP1]], 1
+; CHECK-NEXT:    [[TMP14:%.*]] = lshr i32 [[TMP5]], 2
+; CHECK-NEXT:    [[TMP6:%.*]] = add i32 1, [[TMP14]]
+; CHECK-NEXT:    [[DOTI14:%.*]] = getelementptr [12 x i32], ptr @staticArrayOfVecData.scalarized.1dim, i32 0, i32 [[TMP6]]
+; CHECK-NEXT:    [[DOTI11:%.*]] = load i32, ptr [[DOTI14]], align 4
+; CHECK-NEXT:    [[TMP7:%.*]] = mul i32 [[TMP1]], 1
+; CHECK-NEXT:    [[TMP15:%.*]] = lshr i32 [[TMP7]], 2
+; CHECK-NEXT:    [[TMP8:%.*]] = add i32 2, [[TMP15]]
+; CHECK-NEXT:    [[DOTI25:%.*]] = getelementptr [12 x i32], ptr @staticArrayOfVecData.scalarized.1dim, i32 0, i32 [[TMP8]]
+; CHECK-NEXT:    [[DOTI22:%.*]] = load i32, ptr [[DOTI25]], align 4
+; CHECK-NEXT:    [[TMP9:%.*]] = mul i32 [[TMP1]], 1
+; CHECK-NEXT:    [[TMP13:%.*]] = lshr i32 [[TMP9]], 2
+; CHECK-NEXT:    [[TMP10:%.*]] = add i32 3, [[TMP13]]
+; CHECK-NEXT:    [[DOTI36:%.*]] = getelementptr [12 x i32], ptr @staticArrayOfVecData.scalarized.1dim, i32 0, i32 [[TMP10]]
+; CHECK-NEXT:    [[DOTI33:%.*]] = load i32, ptr [[DOTI36]], align 4
+; CHECK-NEXT:    [[DOTUPTO07:%.*]] = insertelement <4 x i32> poison, i32 [[DOTI0]], i32 0
+; CHECK-NEXT:    [[DOTUPTO18:%.*]] = insertelement <4 x i32> [[DOTUPTO07]], i32 [[DOTI11]], i32 1
+; CHECK-NEXT:    [[DOTUPTO29:%.*]] = insertelement <4 x i32> [[DOTUPTO18]], i32 [[DOTI22]], i32 2
+; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x i32> [[DOTUPTO29]], i32 [[DOTI33]], i32 3
+; CHECK-NEXT:    ret <4 x i32> [[TMP11]]
+;
+  %2 = mul i32 %index, 12
+  %3 = getelementptr inbounds i8, ptr @staticArrayOfVecData, i32 %2
+  %4 = load <4 x i32>, <4 x i32>* %3, align 4
+  ret <4 x i32> %4
+}
+
 define <4 x i32> @multid_load_test() #0 {
 ; CHECK-LABEL: define <4 x i32> @multid_load_test(
 ; CHECK-SAME: ) #[[ATTR0]] {
 ; CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr addrspace(3) @groupshared2dArrayofVectors.scalarized.1dim, align 4
-; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr addrspace(3) getelementptr (i32, ptr addrspace(3) @groupshared2dArrayofVectors.scalarized.1dim, i32 1), align 4
-; CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr addrspace(3) getelementptr (i32, ptr addrspace(3) @groupshared2dArrayofVectors.scalarized.1dim, i32 2), align 4
-; CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr addrspace(3) getelementptr (i32, ptr addrspace(3) @groupshared2dArrayofVectors.scalarized.1dim, i32 3), align 4
+; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr addrspace(3) getelementptr ([36 x i32], ptr addrspace(3) @groupshared2dArrayofVectors.scalarized.1dim, i32 0, i32 1), align 4
+; CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr addrspace(3) getelementptr ([36 x i32], ptr addrspace(3) @groupshared2dArrayofVectors.scalarized.1dim, i32 0, i32 2), align 4
+; CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr addrspace(3) getelementptr ([36 x i32], ptr addrspace(3) @groupshared2dArrayofVectors.scalarized.1dim, i32 0, i32 3), align 4
 ; CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr addrspace(3) getelementptr inbounds ([36 x i32], ptr addrspace(3) @groupshared2dArrayofVectors.scalarized.1dim, i32 0, i32 16), align 4
-; CHECK-NEXT:    [[DOTI13:%.*]] = load i32, ptr addrspace(3) getelementptr (i32, ptr addrspace(3) getelementptr inbounds ([36 x i32], ptr addrspace(3) @groupshared2dArrayofVectors.scalarized.1dim, i32 0, i32 16), i32 1), align 4
-; CHECK-NEXT:    [[DOTI25:%.*]] = load i32, ptr addrspace(3) getelementptr (i32, ptr addrspace(3) getelementptr inbounds ([36 x i32], ptr addrspace(3) @groupshared2dArrayofVectors.scalarized.1dim, i32 0, i32 16), i32 2), align 4
-; CHECK-NEXT:    [[DOTI37:%.*]] = load i32, ptr addrspace(3) getelementptr (i32, ptr addrspace(3) getelementptr inbounds ([36 x i32], ptr addrspace(3) @groupshared2dArrayofVectors.scalarized.1dim, i32 0, i32 16), i32 3), align 4
+; CHECK-NEXT:    [[DOTI13:%.*]] = load i32, ptr addrspace(3) getelementptr ([36 x i32], ptr addrspace(3) @groupshared2dArrayofVectors.scalarized.1dim, i32 0, i32 17), align 4
+; CHECK-NEXT:    [[DOTI25:%.*]] = load i32, ptr addrspace(3) getelementptr ([36 x i32], ptr addrspace(3) @groupshared2dArrayofVectors.scalarized.1dim, i32 0, i32 18), align 4
+; CHECK-NEXT:    [[DOTI37:%.*]] = load i32, ptr addrspace(3) getelementptr ([36 x i32], ptr addrspace(3) @groupshared2dArrayofVectors.scalarized.1dim, i32 0, i32 19), align 4
 ; CHECK-NEXT:    [[DOTI08:%.*]] = add i32 [[TMP1]], [[TMP5]]
 ; CHECK-NEXT:    [[DOTI19:%.*]] = add i32 [[TMP2]], [[DOTI13]]
 ; CHECK-NEXT:    [[DOTI210:%.*]] = add i32 [[TMP3]], [[DOTI25]]
diff --git a/llvm/test/CodeGen/DirectX/scalar-store.ll b/llvm/test/CodeGen/DirectX/scalar-store.ll
index 7e9fe0e330661..a124c665ad15e 100644
--- a/llvm/test/CodeGen/DirectX/scalar-store.ll
+++ b/llvm/test/CodeGen/Direct...
[truncated]

spall (Contributor) left a comment

Nit on the PR description: you didn't move the scalarizer pass, you moved the DXILFlattenArrays pass to be immediately after the scalarizer pass.
