Commit 3033f20

[IR] Add llvm.vector.[de]interleave{4,6,8} (llvm#139893)
This adds [de]interleave intrinsics for factors of 4, 6 and 8, so that every interleaved memory operation supported by the in-tree targets can be represented by a single intrinsic.

For context, [de]interleaves of fixed-length vectors are represented by a series of shufflevectors. The intrinsics are needed for scalable vectors, and we don't currently scalably vectorize all possible factors of interleave groups supported by RISC-V/AArch64.

The underlying reason for this is that higher factors are currently represented by interleaving multiple interleaves themselves, which made sense at the time in the discussion in llvm#89018.

But after trying to integrate these for higher factors on RISC-V I think we should revisit this design choice:

- Matching these in InterleavedAccessPass is non-trivial: we currently only support factors that are a power of 2, and detecting this requires a good chunk of code
- The shufflevector masks used for [de]interleaves of fixed-length vectors are much easier to pattern match as they are strided patterns, but for the intrinsics it's much more complicated to match as the structure is a tree
- Unlike shufflevectors, there's no optimisation that happens on [de]interleave2 intrinsics
- For non-power-of-2 factors, e.g. 6, there are multiple possible ways a [de]interleave could be represented, see the discussion in llvm#139373
- We already have intrinsics for 2, 3, 5 and 7, so by avoiding 4, 6 and 8 we're not really saving much

By representing these higher factors as interleaved interleaves, we can in theory support arbitrarily high interleave factors. However, I'm not sure this is actually needed in practice: SVE only has instructions for factors 2, 3 and 4, whilst RVV only supports up to factor 8.

This patch would make it much easier to support scalable interleaved accesses in the loop vectorizer for RISC-V for factors 3, 5, 6 and 7, as the loop vectorizer and InterleavedAccessPass wouldn't need to construct and match trees of interleaves.
For interleave factors above 8, for which there are no hardware memory operations to match in the InterleavedAccessPass, we can still keep the wide load + recursive interleaving in the loop vectorizer.
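
The point about non-power-of-2 factors can be made concrete with a small model. The sketch below is illustrative C++ on fixed-length `std::vector<int>` values, not LLVM code; the helper names (`interleave`, `interleave6Flat`, `interleave6As2x3`, `interleave6As3x2`) are hypothetical. It assumes the usual interleave lane order, where result lane `I * Factor + J` is lane `I` of input `J`. Under that assumption a factor-6 interleave can be built either as an interleave2 of two interleave3s or as an interleave3 of three interleave2s, so a matcher working on trees would have to recognise more than one shape.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Reference model of an N-way interleave on fixed-length vectors:
// Result[I * Factor + J] = Inputs[J][I]. (Hypothetical helper.)
Vec interleave(const std::vector<Vec> &Inputs) {
  assert(!Inputs.empty() && "need at least one input vector");
  const std::size_t Factor = Inputs.size();
  const std::size_t Len = Inputs.front().size();
  Vec Result(Factor * Len);
  for (std::size_t J = 0; J < Factor; ++J) {
    assert(Inputs[J].size() == Len && "inputs must have equal length");
    for (std::size_t I = 0; I < Len; ++I)
      Result[I * Factor + J] = Inputs[J][I];
  }
  return Result;
}

// Factor 6 expressed as one flat operation.
Vec interleave6Flat(const Vec &A, const Vec &B, const Vec &C, const Vec &D,
                    const Vec &E, const Vec &F) {
  return interleave({A, B, C, D, E, F});
}

// Factor 6 expressed as an interleave2 of two interleave3s.
Vec interleave6As2x3(const Vec &A, const Vec &B, const Vec &C, const Vec &D,
                     const Vec &E, const Vec &F) {
  return interleave({interleave({A, C, E}), interleave({B, D, F})});
}

// Factor 6 expressed as an interleave3 of three interleave2s.
Vec interleave6As3x2(const Vec &A, const Vec &B, const Vec &C, const Vec &D,
                     const Vec &E, const Vec &F) {
  return interleave({interleave({A, D}), interleave({B, E}),
                     interleave({C, F})});
}
```

All three helpers produce the same lane order, which is the redundancy a single factor-6 intrinsic avoids.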
1 parent 841c8d4 commit 3033f20

9 files changed: +12696 -4397 lines changed

llvm/docs/LangRef.rst

Lines changed: 5 additions & 5 deletions

@@ -20209,7 +20209,7 @@ Arguments:
 
 The argument to this intrinsic must be a vector.
 
-'``llvm.vector.deinterleave2/3/5/7``' Intrinsic
+'``llvm.vector.deinterleave2/3/4/5/6/7/8``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Syntax:
@@ -20227,8 +20227,8 @@ This is an overloaded intrinsic.
 Overview:
 """""""""
 
-The '``llvm.vector.deinterleave2/3/5/7``' intrinsics deinterleave adjacent lanes
-into 2, 3, 5, and 7 separate vectors, respectively, and return them as the
+The '``llvm.vector.deinterleave2/3/4/5/6/7/8``' intrinsics deinterleave adjacent lanes
+into 2 through to 8 separate vectors, respectively, and return them as the
 result.
 
 This intrinsic works for both fixed and scalable vectors. While this intrinsic
@@ -20250,7 +20250,7 @@ Arguments:
 The argument is a vector whose type corresponds to the logical concatenation of
 the aggregated result types.
 
-'``llvm.vector.interleave2/3/5/7``' Intrinsic
+'``llvm.vector.interleave2/3/4/5/6/7/8``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Syntax:
@@ -20268,7 +20268,7 @@ This is an overloaded intrinsic.
 Overview:
 """""""""
 
-The '``llvm.vector.interleave2/3/5/7``' intrinsic constructs a vector
+The '``llvm.vector.interleave2/3/4/5/6/7/8``' intrinsic constructs a vector
 by interleaving all the input vectors.
 
 This intrinsic works for both fixed and scalable vectors. While this intrinsic
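
As a rough illustration of the LangRef wording above ("deinterleave adjacent lanes into ... separate vectors"), the following C++ sketch models the operation on fixed-length vectors. It is not LLVM code: the helper name `deinterleave` is hypothetical, and the model assumes that lane `I` of the wide input lands in result `I % Factor` at position `I / Factor`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Model of an N-way deinterleave: adjacent lanes of Wide are dealt out
// round-robin into Factor result vectors. (Hypothetical helper.)
std::vector<Vec> deinterleave(const Vec &Wide, std::size_t Factor) {
  assert(Factor != 0 && Wide.size() % Factor == 0 &&
         "input length must be a multiple of the factor");
  const std::size_t Len = Wide.size() / Factor;
  std::vector<Vec> Results(Factor, Vec(Len));
  for (std::size_t I = 0; I < Wide.size(); ++I)
    Results[I % Factor][I / Factor] = Wide[I];
  return Results;
}
```

For example, deinterleaving the eight lanes `0..7` with factor 4 yields four two-lane vectors, each holding lanes that were four positions apart in the input. The input type is the logical concatenation of the result types, as the Arguments text states.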

llvm/include/llvm/IR/Intrinsics.h

Lines changed: 17 additions & 11 deletions

@@ -153,8 +153,11 @@ namespace Intrinsic {
     TruncArgument,
     HalfVecArgument,
     OneThirdVecArgument,
+    OneFourthVecArgument,
     OneFifthVecArgument,
+    OneSixthVecArgument,
     OneSeventhVecArgument,
+    OneEighthVecArgument,
     SameVecWidthArgument,
     VecOfAnyPtrsToElt,
     VecElementArgument,
@@ -166,9 +169,12 @@ namespace Intrinsic {
     AArch64Svcount,
   } Kind;
 
-  // These three have to be contiguous.
-  static_assert(OneFifthVecArgument == OneThirdVecArgument + 1 &&
-                OneSeventhVecArgument == OneFifthVecArgument + 1);
+  // These six have to be contiguous.
+  static_assert(OneFourthVecArgument == OneThirdVecArgument + 1 &&
+                OneFifthVecArgument == OneFourthVecArgument + 1 &&
+                OneSixthVecArgument == OneFifthVecArgument + 1 &&
+                OneSeventhVecArgument == OneSixthVecArgument + 1 &&
+                OneEighthVecArgument == OneSeventhVecArgument + 1);
   union {
     unsigned Integer_Width;
     unsigned Float_Width;
@@ -188,19 +194,19 @@ namespace Intrinsic {
   unsigned getArgumentNumber() const {
     assert(Kind == Argument || Kind == ExtendArgument ||
            Kind == TruncArgument || Kind == HalfVecArgument ||
-           Kind == OneThirdVecArgument || Kind == OneFifthVecArgument ||
-           Kind == OneSeventhVecArgument || Kind == SameVecWidthArgument ||
-           Kind == VecElementArgument || Kind == Subdivide2Argument ||
-           Kind == Subdivide4Argument || Kind == VecOfBitcastsToInt);
+           (Kind >= OneThirdVecArgument && Kind <= OneEighthVecArgument) ||
+           Kind == SameVecWidthArgument || Kind == VecElementArgument ||
+           Kind == Subdivide2Argument || Kind == Subdivide4Argument ||
+           Kind == VecOfBitcastsToInt);
     return Argument_Info >> 3;
   }
   ArgKind getArgumentKind() const {
     assert(Kind == Argument || Kind == ExtendArgument ||
            Kind == TruncArgument || Kind == HalfVecArgument ||
-           Kind == OneThirdVecArgument || Kind == OneFifthVecArgument ||
-           Kind == OneSeventhVecArgument || Kind == SameVecWidthArgument ||
-           Kind == VecElementArgument || Kind == Subdivide2Argument ||
-           Kind == Subdivide4Argument || Kind == VecOfBitcastsToInt);
+           (Kind >= OneThirdVecArgument && Kind <= OneEighthVecArgument) ||
+           Kind == SameVecWidthArgument || Kind == VecElementArgument ||
+           Kind == Subdivide2Argument || Kind == Subdivide4Argument ||
+           Kind == VecOfBitcastsToInt);
     return (ArgKind)(Argument_Info & 7);
   }
 
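
The asserts in this file now test a range of enumerators instead of listing each one, which is only sound because the `static_assert` pins the six kinds to consecutive values. The sketch below is a standalone approximation of that idea, not the real `IITDescriptor`: the enum `ArgKindSketch` is a cut-down stand-in and `isOneNthVecArgument` is a hypothetical helper name.

```cpp
#include <cassert>

// Cut-down stand-in for the descriptor kinds touched by this change.
enum ArgKindSketch {
  HalfVecArgument,
  OneThirdVecArgument,
  OneFourthVecArgument,
  OneFifthVecArgument,
  OneSixthVecArgument,
  OneSeventhVecArgument,
  OneEighthVecArgument,
  SameVecWidthArgument,
};

// The range check below relies on these six being consecutive, so a
// reordering of the enum fails to compile instead of silently misbehaving.
static_assert(OneFourthVecArgument == OneThirdVecArgument + 1 &&
                  OneFifthVecArgument == OneFourthVecArgument + 1 &&
                  OneSixthVecArgument == OneFifthVecArgument + 1 &&
                  OneSeventhVecArgument == OneSixthVecArgument + 1 &&
                  OneEighthVecArgument == OneSeventhVecArgument + 1,
              "one-Nth vector kinds must be contiguous");

// Hypothetical helper: a single range comparison replaces six equality tests.
constexpr bool isOneNthVecArgument(ArgKindSketch K) {
  return K >= OneThirdVecArgument && K <= OneEighthVecArgument;
}
```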

llvm/include/llvm/IR/Intrinsics.td

Lines changed: 66 additions & 0 deletions

@@ -340,6 +340,9 @@ def IIT_ONE_FIFTH_VEC_ARG : IIT_Base<63>;
 def IIT_ONE_SEVENTH_VEC_ARG : IIT_Base<64>;
 def IIT_V2048: IIT_Vec<2048, 65>;
 def IIT_V4096: IIT_Vec<4096, 66>;
+def IIT_ONE_FOURTH_VEC_ARG : IIT_Base<67>;
+def IIT_ONE_SIXTH_VEC_ARG : IIT_Base<68>;
+def IIT_ONE_EIGHTH_VEC_ARG : IIT_Base<69>;
 }
 
 defvar IIT_all_FixedTypes = !filter(iit, IIT_all,
@@ -483,12 +486,21 @@ class LLVMHalfElementsVectorType<int num>
 class LLVMOneThirdElementsVectorType<int num>
   : LLVMMatchType<num, IIT_ONE_THIRD_VEC_ARG>;
 
+class LLVMOneFourthElementsVectorType<int num>
+  : LLVMMatchType<num, IIT_ONE_FOURTH_VEC_ARG>;
+
 class LLVMOneFifthElementsVectorType<int num>
   : LLVMMatchType<num, IIT_ONE_FIFTH_VEC_ARG>;
 
+class LLVMOneSixthElementsVectorType<int num>
+  : LLVMMatchType<num, IIT_ONE_SIXTH_VEC_ARG>;
+
 class LLVMOneSeventhElementsVectorType<int num>
   : LLVMMatchType<num, IIT_ONE_SEVENTH_VEC_ARG>;
 
+class LLVMOneEighthElementsVectorType<int num>
+  : LLVMMatchType<num, IIT_ONE_EIGHTH_VEC_ARG>;
+
 // Match the type of another intrinsic parameter that is expected to be a
 // vector type (i.e. <N x iM>) but with each element subdivided to
 // form a vector with more elements that are smaller than the original.
@@ -2782,6 +2794,20 @@ def int_vector_deinterleave3 : DefaultAttrsIntrinsic<[LLVMOneThirdElementsVector
     [llvm_anyvector_ty],
     [IntrNoMem]>;
 
+def int_vector_interleave4 : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+    [LLVMOneFourthElementsVectorType<0>,
+     LLVMOneFourthElementsVectorType<0>,
+     LLVMOneFourthElementsVectorType<0>,
+     LLVMOneFourthElementsVectorType<0>],
+    [IntrNoMem]>;
+
+def int_vector_deinterleave4 : DefaultAttrsIntrinsic<[LLVMOneFourthElementsVectorType<0>,
+     LLVMOneFourthElementsVectorType<0>,
+     LLVMOneFourthElementsVectorType<0>,
+     LLVMOneFourthElementsVectorType<0>],
+    [llvm_anyvector_ty],
+    [IntrNoMem]>;
+
 def int_vector_interleave5 : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
     [LLVMOneFifthElementsVectorType<0>,
      LLVMOneFifthElementsVectorType<0>,
@@ -2798,6 +2824,24 @@ def int_vector_deinterleave5 : DefaultAttrsIntrinsic<[LLVMOneFifthElementsVector
     [llvm_anyvector_ty],
     [IntrNoMem]>;
 
+def int_vector_interleave6 : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+    [LLVMOneSixthElementsVectorType<0>,
+     LLVMOneSixthElementsVectorType<0>,
+     LLVMOneSixthElementsVectorType<0>,
+     LLVMOneSixthElementsVectorType<0>,
+     LLVMOneSixthElementsVectorType<0>,
+     LLVMOneSixthElementsVectorType<0>],
+    [IntrNoMem]>;
+
+def int_vector_deinterleave6 : DefaultAttrsIntrinsic<[LLVMOneSixthElementsVectorType<0>,
+     LLVMOneSixthElementsVectorType<0>,
+     LLVMOneSixthElementsVectorType<0>,
+     LLVMOneSixthElementsVectorType<0>,
+     LLVMOneSixthElementsVectorType<0>,
+     LLVMOneSixthElementsVectorType<0>],
+    [llvm_anyvector_ty],
+    [IntrNoMem]>;
+
 def int_vector_interleave7 : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
     [LLVMOneSeventhElementsVectorType<0>,
      LLVMOneSeventhElementsVectorType<0>,
@@ -2818,6 +2862,28 @@ def int_vector_deinterleave7 : DefaultAttrsIntrinsic<[LLVMOneSeventhElementsVect
     [llvm_anyvector_ty],
     [IntrNoMem]>;
 
+def int_vector_interleave8 : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+    [LLVMOneEighthElementsVectorType<0>,
+     LLVMOneEighthElementsVectorType<0>,
+     LLVMOneEighthElementsVectorType<0>,
+     LLVMOneEighthElementsVectorType<0>,
+     LLVMOneEighthElementsVectorType<0>,
+     LLVMOneEighthElementsVectorType<0>,
+     LLVMOneEighthElementsVectorType<0>,
+     LLVMOneEighthElementsVectorType<0>],
+    [IntrNoMem]>;
+
+def int_vector_deinterleave8 : DefaultAttrsIntrinsic<[LLVMOneEighthElementsVectorType<0>,
+     LLVMOneEighthElementsVectorType<0>,
+     LLVMOneEighthElementsVectorType<0>,
+     LLVMOneEighthElementsVectorType<0>,
+     LLVMOneEighthElementsVectorType<0>,
+     LLVMOneEighthElementsVectorType<0>,
+     LLVMOneEighthElementsVectorType<0>,
+     LLVMOneEighthElementsVectorType<0>],
+    [llvm_anyvector_ty],
+    [IntrNoMem]>;
+
 //===-------------- Intrinsics to perform partial reduction ---------------===//
 
 def int_experimental_vector_partial_reduce_add : DefaultAttrsIntrinsic<[LLVMMatchType<0>],

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Lines changed: 18 additions & 0 deletions

@@ -8198,24 +8198,42 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
   case Intrinsic::vector_interleave3:
     visitVectorInterleave(I, 3);
     return;
+  case Intrinsic::vector_interleave4:
+    visitVectorInterleave(I, 4);
+    return;
   case Intrinsic::vector_interleave5:
     visitVectorInterleave(I, 5);
     return;
+  case Intrinsic::vector_interleave6:
+    visitVectorInterleave(I, 6);
+    return;
   case Intrinsic::vector_interleave7:
     visitVectorInterleave(I, 7);
     return;
+  case Intrinsic::vector_interleave8:
+    visitVectorInterleave(I, 8);
+    return;
   case Intrinsic::vector_deinterleave2:
     visitVectorDeinterleave(I, 2);
     return;
   case Intrinsic::vector_deinterleave3:
     visitVectorDeinterleave(I, 3);
     return;
+  case Intrinsic::vector_deinterleave4:
+    visitVectorDeinterleave(I, 4);
+    return;
   case Intrinsic::vector_deinterleave5:
     visitVectorDeinterleave(I, 5);
     return;
+  case Intrinsic::vector_deinterleave6:
+    visitVectorDeinterleave(I, 6);
+    return;
   case Intrinsic::vector_deinterleave7:
     visitVectorDeinterleave(I, 7);
     return;
+  case Intrinsic::vector_deinterleave8:
+    visitVectorDeinterleave(I, 8);
+    return;
   case Intrinsic::experimental_vector_compress:
     setValue(&I, DAG.getNode(ISD::VECTOR_COMPRESS, sdl,
                              getValue(I.getArgOperand(0)).getValueType(),

llvm/lib/IR/Intrinsics.cpp

Lines changed: 26 additions & 2 deletions

@@ -378,18 +378,36 @@ DecodeIITType(unsigned &NextElt, ArrayRef<unsigned char> Infos,
         IITDescriptor::get(IITDescriptor::OneThirdVecArgument, ArgInfo));
     return;
   }
+  case IIT_ONE_FOURTH_VEC_ARG: {
+    unsigned ArgInfo = (NextElt == Infos.size() ? 0 : Infos[NextElt++]);
+    OutputTable.push_back(
+        IITDescriptor::get(IITDescriptor::OneFourthVecArgument, ArgInfo));
+    return;
+  }
   case IIT_ONE_FIFTH_VEC_ARG: {
     unsigned ArgInfo = (NextElt == Infos.size() ? 0 : Infos[NextElt++]);
     OutputTable.push_back(
         IITDescriptor::get(IITDescriptor::OneFifthVecArgument, ArgInfo));
     return;
   }
+  case IIT_ONE_SIXTH_VEC_ARG: {
+    unsigned ArgInfo = (NextElt == Infos.size() ? 0 : Infos[NextElt++]);
+    OutputTable.push_back(
+        IITDescriptor::get(IITDescriptor::OneSixthVecArgument, ArgInfo));
+    return;
+  }
   case IIT_ONE_SEVENTH_VEC_ARG: {
     unsigned ArgInfo = (NextElt == Infos.size() ? 0 : Infos[NextElt++]);
     OutputTable.push_back(
         IITDescriptor::get(IITDescriptor::OneSeventhVecArgument, ArgInfo));
     return;
   }
+  case IIT_ONE_EIGHTH_VEC_ARG: {
+    unsigned ArgInfo = (NextElt == Infos.size() ? 0 : Infos[NextElt++]);
+    OutputTable.push_back(
+        IITDescriptor::get(IITDescriptor::OneEighthVecArgument, ArgInfo));
+    return;
+  }
   case IIT_SAME_VEC_WIDTH_ARG: {
     unsigned ArgInfo = (NextElt == Infos.size() ? 0 : Infos[NextElt++]);
     OutputTable.push_back(
@@ -584,11 +602,14 @@ static Type *DecodeFixedType(ArrayRef<Intrinsic::IITDescriptor> &Infos,
     return VectorType::getHalfElementsVectorType(
         cast<VectorType>(Tys[D.getArgumentNumber()]));
   case IITDescriptor::OneThirdVecArgument:
+  case IITDescriptor::OneFourthVecArgument:
   case IITDescriptor::OneFifthVecArgument:
+  case IITDescriptor::OneSixthVecArgument:
   case IITDescriptor::OneSeventhVecArgument:
+  case IITDescriptor::OneEighthVecArgument:
     return VectorType::getOneNthElementsVectorType(
         cast<VectorType>(Tys[D.getArgumentNumber()]),
-        3 + (D.Kind - IITDescriptor::OneThirdVecArgument) * 2);
+        3 + (D.Kind - IITDescriptor::OneThirdVecArgument));
   case IITDescriptor::SameVecWidthArgument: {
     Type *EltTy = DecodeFixedType(Infos, Tys, Context);
     Type *Ty = Tys[D.getArgumentNumber()];
@@ -974,15 +995,18 @@ matchIntrinsicType(Type *Ty, ArrayRef<Intrinsic::IITDescriptor> &Infos,
            VectorType::getHalfElementsVectorType(
                cast<VectorType>(ArgTys[D.getArgumentNumber()])) != Ty;
   case IITDescriptor::OneThirdVecArgument:
+  case IITDescriptor::OneFourthVecArgument:
   case IITDescriptor::OneFifthVecArgument:
+  case IITDescriptor::OneSixthVecArgument:
   case IITDescriptor::OneSeventhVecArgument:
+  case IITDescriptor::OneEighthVecArgument:
     // If this is a forward reference, defer the check for later.
     if (D.getArgumentNumber() >= ArgTys.size())
       return IsDeferredCheck || DeferCheck(Ty);
     return !isa<VectorType>(ArgTys[D.getArgumentNumber()]) ||
            VectorType::getOneNthElementsVectorType(
                cast<VectorType>(ArgTys[D.getArgumentNumber()]),
-               3 + (D.Kind - IITDescriptor::OneThirdVecArgument) * 2) != Ty;
+               3 + (D.Kind - IITDescriptor::OneThirdVecArgument)) != Ty;
   case IITDescriptor::SameVecWidthArgument: {
     if (D.getArgumentNumber() >= ArgTys.size()) {
       // Defer check and subsequent check for the vector element type.
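
The dropped `* 2` in the two hunks above follows from the new enum layout: with only the odd factors present, consecutive kinds were two factors apart, whereas with 3 through 8 all present they are one apart. The sketch below checks that arithmetic on toy enums. The names `OldKind`, `NewKind`, `oldDivisor`, `newDivisor` and `staleDivisor` are hypothetical and only mirror the expressions in the diff.

```cpp
#include <cassert>

// Toy copies of the relevant enumerators before and after this change.
enum OldKind { OldOneThird, OldOneFifth, OldOneSeventh };
enum NewKind {
  NewOneThird,
  NewOneFourth,
  NewOneFifth,
  NewOneSixth,
  NewOneSeventh,
  NewOneEighth
};

// Before: kinds covered factors 3, 5, 7, so each step was worth 2.
constexpr unsigned oldDivisor(OldKind K) { return 3 + (K - OldOneThird) * 2; }

// After: kinds cover factors 3..8, so each step is worth 1.
constexpr unsigned newDivisor(NewKind K) { return 3 + (K - NewOneThird); }

// Keeping the old formula with the new layout would give wrong factors.
constexpr unsigned staleDivisor(NewKind K) {
  return 3 + (K - NewOneThird) * 2;
}

static_assert(oldDivisor(OldOneSeventh) == 7, "old layout: third kind is 7");
static_assert(newDivisor(NewOneFourth) == 4, "new layout: second kind is 4");
static_assert(newDivisor(NewOneEighth) == 8, "new layout: last kind is 8");
static_assert(staleDivisor(NewOneFourth) == 5, "stale formula is off by one");
```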
