Commit d515ecc

CarolineConcatto authored and sdesmalen-arm committed
[IR] Add new intrinsics interleave and deinterleave vectors
This patch adds 2 new intrinsics:

  ; Interleave two vectors into a wider vector
  <vscale x 4 x i64> @llvm.vector.interleave2.nxv2i64(<vscale x 2 x i64> %even, <vscale x 2 x i64> %odd)

  ; Deinterleave the odd and even lanes from a wider vector
  {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv2i64(<vscale x 4 x i64> %vec)

The main motivator for adding these intrinsics is to support vectorization of
complex types using scalable vectors.

The intrinsics are kept simple by only supporting a stride of 2, which makes
them easy to lower and type-legalize. A stride of 2 is sufficient to handle
complex types which only have a real/imaginary component.

The format of the intrinsics matches how `shufflevector` is used in
LoopVectorize. For example:

  using cf = std::complex<float>;

  void foo(cf * dst, int N) {
    for (int i=0; i<N; ++i)
      dst[i] += cf(1.f, 2.f);
  }

For this loop, LoopVectorize:
  (1) Loads a wide vector (e.g. <8 x float>)
  (2) Extracts odd lanes using shufflevector (leading to <4 x float>)
  (3) Extracts even lanes using shufflevector (leading to <4 x float>)
  (4) Performs the addition
  (5) Interleaves the two <4 x float> vectors into a single <8 x float> using
      shufflevector
  (6) Stores the wide vector.

In this example, we can 1-1 replace shufflevector in (2) and (3) with the
deinterleave intrinsic, and replace the shufflevector in (5) with the
interleave intrinsic.

The SelectionDAG nodes might be extended to support higher strides (3, 4, etc)
as well in the future.

Similar to what was done for vector.splice and vector.reverse, the intrinsic
is lowered to a shufflevector when the type is fixed width, so to benefit from
existing code that was written to recognize/optimize shufflevector patterns.

Note that this approach does not prevent us from adding new intrinsics for
other strides, or adding a more generic shuffle intrinsic in the future. It
just solves the immediate problem of being able to vectorize loops with
complex math.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D141924
1 parent 0cbb8ec commit d515ecc
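
The loop transformation described in the commit message can be pictured with a small scalar model. The sketch below is not LLVM code: `deinterleave2`, `interleave2` and `addComplexConstant` are hypothetical helper names, and `std::vector<float>` stands in for a wide vector of interleaved (real, imaginary) lanes. It only illustrates the data movement of steps (2) to (5), under the assumption that real parts sit in the even lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of deinterleave2: split a wide vector into its
// even-indexed lanes (first result) and odd-indexed lanes (second result).
std::pair<std::vector<float>, std::vector<float>>
deinterleave2(const std::vector<float> &wide) {
  assert(wide.size() % 2 == 0 && "wide vector needs an even lane count");
  std::vector<float> even, odd;
  for (std::size_t i = 0; i < wide.size(); i += 2) {
    even.push_back(wide[i]);
    odd.push_back(wide[i + 1]);
  }
  return {even, odd};
}

// Hypothetical model of interleave2: merge two equally sized vectors so that
// lanes alternate between the first and the second operand.
std::vector<float> interleave2(const std::vector<float> &even,
                               const std::vector<float> &odd) {
  assert(even.size() == odd.size() && "operands must have the same length");
  std::vector<float> wide;
  for (std::size_t i = 0; i < even.size(); ++i) {
    wide.push_back(even[i]);
    wide.push_back(odd[i]);
  }
  return wide;
}

// One "vector iteration" of `dst[i] += cf(re, im)`: deinterleave the loaded
// lanes, add the constant to each component, then interleave for the store.
std::vector<float> addComplexConstant(const std::vector<float> &wide,
                                      float re, float im) {
  std::pair<std::vector<float>, std::vector<float>> parts = deinterleave2(wide);
  for (float &r : parts.first)
    r += re;
  for (float &x : parts.second)
    x += im;
  return interleave2(parts.first, parts.second);
}
```

With an input of `{0, 1, 2, 3}` the model's `deinterleave2` yields `{0, 2}` and `{1, 3}`, which matches the example given in the LangRef text added by this commit.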

12 files changed: +838 -0 lines changed
llvm/docs/LangRef.rst

Lines changed: 69 additions & 0 deletions
@@ -17701,6 +17701,75 @@ Arguments:

 The argument to this intrinsic must be a vector.

+'``llvm.experimental.vector.deinterleave2``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+      declare {<2 x double>, <2 x double>} @llvm.experimental.vector.deinterleave2.v4f64(<4 x double> %vec1)
+      declare {<vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> %vec1)
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.vector.deinterleave2``' intrinsic constructs two
+vectors by deinterleaving the even and odd lanes of the input vector.
+
+This intrinsic works for both fixed and scalable vectors. While this intrinsic
+supports all vector types the recommended way to express this operation for
+fixed-width vectors is still to use a shufflevector, as that may allow for more
+optimization opportunities.
+
+For example:
+
+.. code-block:: text
+
+  {<2 x i64>, <2 x i64>} llvm.experimental.vector.deinterleave2.v4i64(<4 x i64> <i64 0, i64 1, i64 2, i64 3>); ==> {<2 x i64> <i64 0, i64 2>, <2 x i64> <i64 1, i64 3>}
+
+Arguments:
+""""""""""
+
+The argument is a vector whose type corresponds to the logical concatenation of
+the two result types.
+
+'``llvm.experimental.vector.interleave2``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+      declare <4 x double> @llvm.experimental.vector.interleave2.v4f64(<2 x double> %vec1, <2 x double> %vec2)
+      declare <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2)
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.vector.interleave2``' intrinsic constructs a vector
+by interleaving two input vectors.
+
+This intrinsic works for both fixed and scalable vectors. While this intrinsic
+supports all vector types the recommended way to express this operation for
+fixed-width vectors is still to use a shufflevector, as that may allow for more
+optimization opportunities.
+
+For example:
+
+.. code-block:: text
+
+  <4 x i64> llvm.experimental.vector.interleave2.v4i64(<2 x i64> <i64 0, i64 2>, <2 x i64> <i64 1, i64 3>); ==> <4 x i64> <i64 0, i64 1, i64 2, i64 3>
+
+Arguments:
+""""""""""
+Both arguments must be vectors of the same type whereby their logical
+concatenation matches the result type.
+
 '``llvm.experimental.vector.splice``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

llvm/include/llvm/CodeGen/ISDOpcodes.h

Lines changed: 13 additions & 0 deletions
@@ -571,6 +571,19 @@ enum NodeType {
   /// vector, but not the other way around.
   EXTRACT_SUBVECTOR,

+  /// VECTOR_DEINTERLEAVE(VEC1, VEC2) - Returns two vectors with all input and
+  /// output vectors having the same type. The first output contains the even
+  /// indices from CONCAT_VECTORS(VEC1, VEC2), with the second output
+  /// containing the odd indices. The relative order of elements within an
+  /// output match that of the concatenated input.
+  VECTOR_DEINTERLEAVE,
+
+  /// VECTOR_INTERLEAVE(VEC1, VEC2) - Returns two vectors with all input and
+  /// output vectors having the same type. The first output contains the
+  /// result of interleaving the low half of CONCAT_VECTORS(VEC1, VEC2), with
+  /// the second output containing the result of interleaving the high half.
+  VECTOR_INTERLEAVE,
+
   /// VECTOR_REVERSE(VECTOR) - Returns a vector, of the same type as VECTOR,
   /// whose elements are shuffled using the following algorithm:
   ///   RESULT[i] = VECTOR[VECTOR.ElementCount - 1 - i]
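
As a reading aid for the two node descriptions in this hunk, here is a scalar C++ model of their two-input, two-output shape. It is only a sketch: `Vec`, `concatVectors`, `vectorDeinterleave` and `vectorInterleave` are illustrative names, not LLVM APIs. The interleave model assumes that the full interleaving of VEC1 and VEC2 is formed first and then split into a low and a high half, which is one reading of the comment and is consistent with the ZIP1/ZIP2 lowering used for AArch64 in this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<int>;

// Model of CONCAT_VECTORS: the lanes of a followed by the lanes of b.
static Vec concatVectors(const Vec &a, const Vec &b) {
  Vec result(a);
  result.insert(result.end(), b.begin(), b.end());
  return result;
}

// Model of VECTOR_DEINTERLEAVE(VEC1, VEC2): the first result holds the
// even-indexed lanes of the concatenation, the second the odd-indexed lanes.
std::pair<Vec, Vec> vectorDeinterleave(const Vec &v1, const Vec &v2) {
  assert(v1.size() == v2.size() && "inputs must have the same type");
  Vec all = concatVectors(v1, v2);
  Vec even, odd;
  for (std::size_t i = 0; i < all.size(); ++i)
    (i % 2 == 0 ? even : odd).push_back(all[i]);
  return {even, odd};
}

// Model of VECTOR_INTERLEAVE(VEC1, VEC2) under the assumption stated above:
// build v1[0], v2[0], v1[1], v2[1], ... and return its low and high halves.
std::pair<Vec, Vec> vectorInterleave(const Vec &v1, const Vec &v2) {
  assert(v1.size() == v2.size() && "inputs must have the same type");
  Vec all;
  for (std::size_t i = 0; i < v1.size(); ++i) {
    all.push_back(v1[i]);
    all.push_back(v2[i]);
  }
  const std::size_t n = v1.size();
  Vec lo(all.begin(), all.begin() + n);
  Vec hi(all.begin() + n, all.end());
  return {lo, hi};
}
```

In this model the two operations are inverses: feeding the two results of `vectorInterleave` into `vectorDeinterleave` gives back the original pair of inputs.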

llvm/include/llvm/IR/Intrinsics.td

Lines changed: 11 additions & 0 deletions
@@ -2121,6 +2121,17 @@ def int_vector_extract : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
                                                [llvm_anyvector_ty, llvm_i64_ty],
                                                [IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<1>>]>;

+
+def int_experimental_vector_interleave2 : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+                                                                [LLVMHalfElementsVectorType<0>,
+                                                                 LLVMHalfElementsVectorType<0>],
+                                                                [IntrNoMem]>;
+
+def int_experimental_vector_deinterleave2 : DefaultAttrsIntrinsic<[LLVMHalfElementsVectorType<0>,
+                                                                   LLVMHalfElementsVectorType<0>],
+                                                                  [llvm_anyvector_ty],
+                                                                  [IntrNoMem]>;
+
 //===----------------- Pointer Authentication Intrinsics ------------------===//
 //

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Lines changed: 65 additions & 0 deletions
@@ -27,6 +27,7 @@
 #include "llvm/Analysis/MemoryLocation.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Analysis/VectorUtils.h"
 #include "llvm/CodeGen/Analysis.h"
 #include "llvm/CodeGen/AssignmentTrackingAnalysis.h"
 #include "llvm/CodeGen/CodeGenCommonISel.h"
@@ -7321,6 +7322,12 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
   case Intrinsic::callbr_landingpad:
     visitCallBrLandingPad(I);
     return;
+  case Intrinsic::experimental_vector_interleave2:
+    visitVectorInterleave(I);
+    return;
+  case Intrinsic::experimental_vector_deinterleave2:
+    visitVectorDeinterleave(I);
+    return;
   }
 }

@@ -11551,6 +11558,64 @@ void SelectionDAGBuilder::visitVectorReverse(const CallInst &I) {
   setValue(&I, DAG.getVectorShuffle(VT, DL, V, DAG.getUNDEF(VT), Mask));
 }

+void SelectionDAGBuilder::visitVectorDeinterleave(const CallInst &I) {
+  auto DL = getCurSDLoc();
+  SDValue InVec = getValue(I.getOperand(0));
+  EVT OutVT =
+      InVec.getValueType().getHalfNumVectorElementsVT(*DAG.getContext());
+
+  unsigned OutNumElts = OutVT.getVectorMinNumElements();
+
+  // ISD Node needs the input vectors split into two equal parts
+  SDValue Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, OutVT, InVec,
+                           DAG.getVectorIdxConstant(0, DL));
+  SDValue Hi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, OutVT, InVec,
+                           DAG.getVectorIdxConstant(OutNumElts, DL));
+
+  // Use VECTOR_SHUFFLE for fixed-length vectors to benefit from existing
+  // legalisation and combines.
+  if (OutVT.isFixedLengthVector()) {
+    SDValue Even = DAG.getVectorShuffle(OutVT, DL, Lo, Hi,
+                                        createStrideMask(0, 2, OutNumElts));
+    SDValue Odd = DAG.getVectorShuffle(OutVT, DL, Lo, Hi,
+                                       createStrideMask(1, 2, OutNumElts));
+    SDValue Res = DAG.getMergeValues({Even, Odd}, getCurSDLoc());
+    setValue(&I, Res);
+    return;
+  }
+
+  SDValue Res = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL,
+                            DAG.getVTList(OutVT, OutVT), Lo, Hi);
+  setValue(&I, Res);
+  return;
+}
+
+void SelectionDAGBuilder::visitVectorInterleave(const CallInst &I) {
+  auto DL = getCurSDLoc();
+  EVT InVT = getValue(I.getOperand(0)).getValueType();
+  SDValue InVec0 = getValue(I.getOperand(0));
+  SDValue InVec1 = getValue(I.getOperand(1));
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  EVT OutVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
+
+  // Use VECTOR_SHUFFLE for fixed-length vectors to benefit from existing
+  // legalisation and combines.
+  if (OutVT.isFixedLengthVector()) {
+    unsigned NumElts = InVT.getVectorMinNumElements();
+    SDValue V = DAG.getNode(ISD::CONCAT_VECTORS, DL, OutVT, InVec0, InVec1);
+    setValue(&I, DAG.getVectorShuffle(OutVT, DL, V, DAG.getUNDEF(OutVT),
+                                      createInterleaveMask(NumElts, 2)));
+    return;
+  }
+
+  SDValue Res = DAG.getNode(ISD::VECTOR_INTERLEAVE, DL,
+                            DAG.getVTList(InVT, InVT), InVec0, InVec1);
+  Res = DAG.getNode(ISD::CONCAT_VECTORS, DL, OutVT, Res.getValue(0),
+                    Res.getValue(1));
+  setValue(&I, Res);
+  return;
+}
+
 void SelectionDAGBuilder::visitFreeze(const FreezeInst &I) {
   SmallVector<EVT, 4> ValueVTs;
   ComputeValueVTs(DAG.getTargetLoweringInfo(), DAG.getDataLayout(), I.getType(),
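
For the fixed-length path in this file, the shuffle masks carry the whole meaning of the operation. The sketch below models them outside of LLVM: `strideMask` and `interleaveMask` are stand-ins written for this illustration that are assumed to produce the same index sequences as `createStrideMask(Start, Stride, VF)` and `createInterleaveMask(VF, NumVecs)` from `llvm/Analysis/VectorUtils.h`, and `shuffle` is a hypothetical two-operand shuffle that indexes into the concatenation of its operands.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for createStrideMask: indices start, start + stride, ... (vf of
// them). With stride 2, start 0 selects even lanes and start 1 odd lanes.
std::vector<int> strideMask(unsigned start, unsigned stride, unsigned vf) {
  std::vector<int> mask;
  for (unsigned i = 0; i < vf; ++i)
    mask.push_back(static_cast<int>(start + i * stride));
  return mask;
}

// Stand-in for createInterleaveMask: for each lane i, pick lane i from each
// of the numVecs concatenated vectors in turn (0, vf, 1, vf + 1, ...).
std::vector<int> interleaveMask(unsigned vf, unsigned numVecs) {
  std::vector<int> mask;
  for (unsigned i = 0; i < vf; ++i)
    for (unsigned j = 0; j < numVecs; ++j)
      mask.push_back(static_cast<int>(j * vf + i));
  return mask;
}

// Hypothetical two-operand shuffle: mask entries index into lhs followed by
// rhs. An empty rhs plays the role of an undef second operand here.
std::vector<int> shuffle(const std::vector<int> &lhs,
                         const std::vector<int> &rhs,
                         const std::vector<int> &mask) {
  std::vector<int> out;
  for (int idx : mask) {
    const std::size_t i = static_cast<std::size_t>(idx);
    assert(i < lhs.size() + rhs.size() && "mask index out of range");
    out.push_back(i < lhs.size() ? lhs[i] : rhs[i - lhs.size()]);
  }
  return out;
}
```

For a four-lane input split into `Lo` and `Hi`, `strideMask(0, 2, 2)` is `{0, 2}` and `strideMask(1, 2, 2)` is `{1, 3}`, while `interleaveMask(2, 2)` is `{0, 2, 1, 3}`, so applying the interleave mask to the concatenated even and odd halves restores the original lane order.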

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

Lines changed: 2 additions & 0 deletions
@@ -649,6 +649,8 @@ class SelectionDAGBuilder {
   void visitVectorReduce(const CallInst &I, unsigned Intrinsic);
   void visitVectorReverse(const CallInst &I);
   void visitVectorSplice(const CallInst &I);
+  void visitVectorInterleave(const CallInst &I);
+  void visitVectorDeinterleave(const CallInst &I);
   void visitStepVector(const CallInst &I);

   void visitUserOp1(const Instruction &I) {

llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

Lines changed: 2 additions & 0 deletions
@@ -297,6 +297,8 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
   case ISD::CONCAT_VECTORS: return "concat_vectors";
   case ISD::INSERT_SUBVECTOR: return "insert_subvector";
   case ISD::EXTRACT_SUBVECTOR: return "extract_subvector";
+  case ISD::VECTOR_DEINTERLEAVE: return "vector_deinterleave";
+  case ISD::VECTOR_INTERLEAVE: return "vector_interleave";
   case ISD::SCALAR_TO_VECTOR: return "scalar_to_vector";
   case ISD::VECTOR_SHUFFLE: return "vector_shuffle";
   case ISD::VECTOR_SPLICE: return "vector_splice";

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Lines changed: 38 additions & 0 deletions
@@ -1228,6 +1228,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
          {MVT::nxv16i1, MVT::nxv8i1, MVT::nxv4i1, MVT::nxv2i1, MVT::nxv1i1}) {
       setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
       setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);
+      setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
+      setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
     }
   }

@@ -1273,6 +1275,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
       setOperationAction(ISD::VECREDUCE_UMAX, VT, Custom);
       setOperationAction(ISD::VECREDUCE_SMIN, VT, Custom);
       setOperationAction(ISD::VECREDUCE_SMAX, VT, Custom);
+      setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
+      setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);

       setOperationAction(ISD::UMUL_LOHI, VT, Expand);
       setOperationAction(ISD::SMUL_LOHI, VT, Expand);
@@ -1414,6 +1418,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
       setOperationAction(ISD::VECREDUCE_FMIN, VT, Custom);
       setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Custom);
       setOperationAction(ISD::VECTOR_SPLICE, VT, Custom);
+      setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
+      setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);

       setOperationAction(ISD::SELECT_CC, VT, Expand);
       setOperationAction(ISD::FREM, VT, Expand);
@@ -6106,6 +6112,10 @@ SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
     return LowerCTTZ(Op, DAG);
   case ISD::VECTOR_SPLICE:
     return LowerVECTOR_SPLICE(Op, DAG);
+  case ISD::VECTOR_DEINTERLEAVE:
+    return LowerVECTOR_DEINTERLEAVE(Op, DAG);
+  case ISD::VECTOR_INTERLEAVE:
+    return LowerVECTOR_INTERLEAVE(Op, DAG);
   case ISD::STRICT_LROUND:
   case ISD::STRICT_LLROUND:
   case ISD::STRICT_LRINT:
@@ -24051,6 +24061,34 @@ AArch64TargetLowering::LowerFixedLengthIntToFPToSVE(SDValue Op,
   }
 }

+SDValue
+AArch64TargetLowering::LowerVECTOR_DEINTERLEAVE(SDValue Op,
+                                                SelectionDAG &DAG) const {
+  SDLoc DL(Op);
+  EVT OpVT = Op.getValueType();
+  assert(OpVT.isScalableVector() &&
+         "Expected scalable vector in LowerVECTOR_DEINTERLEAVE.");
+  SDValue Even = DAG.getNode(AArch64ISD::UZP1, DL, OpVT, Op.getOperand(0),
+                             Op.getOperand(1));
+  SDValue Odd = DAG.getNode(AArch64ISD::UZP2, DL, OpVT, Op.getOperand(0),
+                            Op.getOperand(1));
+  return DAG.getMergeValues({Even, Odd}, DL);
+}
+
+SDValue AArch64TargetLowering::LowerVECTOR_INTERLEAVE(SDValue Op,
+                                                      SelectionDAG &DAG) const {
+  SDLoc DL(Op);
+  EVT OpVT = Op.getValueType();
+  assert(OpVT.isScalableVector() &&
+         "Expected scalable vector in LowerVECTOR_INTERLEAVE.");
+
+  SDValue Lo = DAG.getNode(AArch64ISD::ZIP1, DL, OpVT, Op.getOperand(0),
+                           Op.getOperand(1));
+  SDValue Hi = DAG.getNode(AArch64ISD::ZIP2, DL, OpVT, Op.getOperand(0),
+                           Op.getOperand(1));
+  return DAG.getMergeValues({Lo, Hi}, DL);
+}
+
 SDValue
 AArch64TargetLowering::LowerFixedLengthFPToIntToSVE(SDValue Op,
                                                     SelectionDAG &DAG) const {

llvm/lib/Target/AArch64/AArch64ISelLowering.h

Lines changed: 2 additions & 0 deletions
@@ -1055,6 +1055,8 @@ class AArch64TargetLowering : public TargetLowering {
   SDValue LowerVECTOR_SPLICE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerINSERT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
+  SDValue LowerVECTOR_DEINTERLEAVE(SDValue Op, SelectionDAG &DAG) const;
+  SDValue LowerVECTOR_INTERLEAVE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerDIV(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerMUL(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerVectorSRA_SRL_SHL(SDValue Op, SelectionDAG &DAG) const;
