Commit 3094f96

Lorenzo Albano authored and Simon Moll committed
[VP] Strided loads/stores
This patch introduces two new experimental IR intrinsics and SDAG nodes to represent vector strided loads and stores.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D114884
1 parent 3986590 commit 3094f96

11 files changed, with 681 additions and 18 deletions


llvm/docs/LangRef.rst

Lines changed: 120 additions & 0 deletions
@@ -19838,6 +19838,126 @@ Examples:
     call void @llvm.masked.store.v8i8.p0v8i8(<8 x i8> %val, <8 x i8>* %ptr, i32 4, <8 x i1> %mask)
 
 
+.. _int_experimental_vp_strided_load:
+
+'``llvm.experimental.vp.strided.load``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+      declare <4 x float> @llvm.experimental.vp.strided.load.v4f32.i64(float* %ptr, i64 %stride, <4 x i1> %mask, i32 %evl)
+      declare <vscale x 2 x i16> @llvm.experimental.vp.strided.load.nxv2i16.i64(i16* %ptr, i64 %stride, <vscale x 2 x i1> %mask, i32 %evl)
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.vp.strided.load``' intrinsic loads, into a vector, scalar values from
+memory locations evenly spaced apart by '``stride``' number of bytes, starting from '``ptr``'.
+
+Arguments:
+""""""""""
+
+The first operand is the base pointer for the load. The second operand is the stride
+value expressed in bytes. The third operand is a vector of boolean values
+with the same number of elements as the return type. The fourth is the explicit
+vector length of the operation. The underlying type of the base pointer matches the
+scalar element type of the return value.
+
+The :ref:`align <attr_align>` parameter attribute can be provided for the first
+operand.
+
+Semantics:
+""""""""""
+
+The '``llvm.experimental.vp.strided.load``' intrinsic loads, into a vector, multiple scalar
+values from memory in the same way as the :ref:`llvm.vp.gather <int_vp_gather>` intrinsic,
+where the vector of pointers is in the form:
+
+   ``%ptrs = <%ptr, %ptr + %stride, %ptr + 2 * %stride, ... >``,
+
+with '``ptr``' previously cast to a pointer to '``i8``', '``stride``' always interpreted as a signed
+integer and all arithmetic occurring in the pointer type.
+
+Examples:
+"""""""""
+
+.. code-block:: text
+
+   %r = call <8 x i64> @llvm.experimental.vp.strided.load.v8i64.i64(i64* %ptr, i64 %stride, <8 x i1> %mask, i32 %evl)
+   ;; The operation can also be expressed like this:
+
+   %addr = bitcast i64* %ptr to i8*
+   ;; Create a vector of pointers %addrs in the form:
+   ;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
+   %ptrs = bitcast <8 x i8*> %addrs to <8 x i64*>
+   %also.r = call <8 x i64> @llvm.vp.gather.v8i64.v8p0i64(<8 x i64*> %ptrs, <8 x i1> %mask, i32 %evl)
+
+
+.. _int_experimental_vp_strided_store:
+
+'``llvm.experimental.vp.strided.store``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+      declare void @llvm.experimental.vp.strided.store.v4f32.i64(<4 x float> %val, float* %ptr, i64 %stride, <4 x i1> %mask, i32 %evl)
+      declare void @llvm.experimental.vp.strided.store.nxv2i16.i64(<vscale x 2 x i16> %val, i16* %ptr, i64 %stride, <vscale x 2 x i1> %mask, i32 %evl)
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.vp.strided.store``' intrinsic stores the elements of
+'``val``' into memory locations evenly spaced apart by '``stride``' number of
+bytes, starting from '``ptr``'.
+
+Arguments:
+""""""""""
+
+The first operand is the vector value to be written to memory. The second
+operand is the base pointer for the store. Its underlying type matches the
+scalar element type of the value operand. The third operand is the stride value
+expressed in bytes. The fourth operand is a vector of boolean values with the
+same number of elements as the value operand. The fifth is the explicit vector
+length of the operation.
+
+The :ref:`align <attr_align>` parameter attribute can be provided for the
+second operand.
+
+Semantics:
+""""""""""
+
+The '``llvm.experimental.vp.strided.store``' intrinsic stores the elements of
+'``val``' in the same way as the :ref:`llvm.vp.scatter <int_vp_scatter>` intrinsic,
+where the vector of pointers is in the form:
+
+   ``%ptrs = <%ptr, %ptr + %stride, %ptr + 2 * %stride, ... >``,
+
+with '``ptr``' previously cast to a pointer to '``i8``', '``stride``' always interpreted as a signed
+integer and all arithmetic occurring in the pointer type.
+
+Examples:
+"""""""""
+
+.. code-block:: text
+
+   call void @llvm.experimental.vp.strided.store.v8i64.i64(<8 x i64> %val, i64* %ptr, i64 %stride, <8 x i1> %mask, i32 %evl)
+   ;; The operation can also be expressed like this:
+
+   %addr = bitcast i64* %ptr to i8*
+   ;; Create a vector of pointers %addrs in the form:
+   ;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
+   %ptrs = bitcast <8 x i8*> %addrs to <8 x i64*>
+   call void @llvm.vp.scatter.v8i64.v8p0i64(<8 x i64> %val, <8 x i64*> %ptrs, <8 x i1> %mask, i32 %evl)
+
+
 .. _int_vp_gather:
 
'``llvm.vp.gather``' Intrinsic
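
Reader's note, not part of the patch: the Semantics text in the LangRef diff defines both intrinsics through byte-offset address arithmetic combined with the mask and the explicit vector length. A minimal C++ sketch of that lane-by-lane behavior may help when reading the SelectionDAG changes. The names `stridedLoadModel` and `stridedStoreModel` are hypothetical and do not exist in LLVM. Representing disabled lanes as empty `std::optional` values is also an assumption of this sketch, chosen to mark lanes whose result should not be relied on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical reference model (not LLVM code). Lane i is active when
// i < evl and mask[i] is set; an active lane reads sizeof(T) bytes from
// base + i * stride, with stride a signed byte count.
template <typename T>
std::vector<std::optional<T>>
stridedLoadModel(const unsigned char *base, std::int64_t stride,
                 const std::vector<bool> &mask, std::uint32_t evl) {
  assert(evl <= mask.size() && "EVL must not exceed the lane count");
  std::vector<std::optional<T>> result(mask.size()); // inactive lanes stay empty
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (i >= evl || !mask[i])
      continue;
    T value;
    std::memcpy(&value, base + static_cast<std::int64_t>(i) * stride,
                sizeof(T));
    result[i] = value;
  }
  return result;
}

// Store counterpart: an active lane writes val[i] to base + i * stride;
// inactive lanes leave memory untouched.
template <typename T>
void stridedStoreModel(const std::vector<T> &val, unsigned char *base,
                       std::int64_t stride, const std::vector<bool> &mask,
                       std::uint32_t evl) {
  assert(val.size() == mask.size() && evl <= mask.size());
  for (std::size_t i = 0; i < val.size(); ++i) {
    if (i >= evl || !mask[i])
      continue;
    std::memcpy(base + static_cast<std::int64_t>(i) * stride, &val[i],
                sizeof(T));
  }
}
```

With a stride of 8 bytes over an array of 4-byte integers, the load model visits every other element. A negative stride walks backwards from the base pointer, matching the "signed integer" wording in the Semantics text.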

llvm/include/llvm/CodeGen/SelectionDAG.h

Lines changed: 71 additions & 0 deletions
@@ -1378,6 +1378,77 @@ class SelectionDAG {
   SDValue getIndexedStoreVP(SDValue OrigStore, const SDLoc &dl, SDValue Base,
                             SDValue Offset, ISD::MemIndexedMode AM);
 
+  SDValue getStridedLoadVP(ISD::MemIndexedMode AM, ISD::LoadExtType ExtType,
+                           EVT VT, const SDLoc &DL, SDValue Chain, SDValue Ptr,
+                           SDValue Offset, SDValue Stride, SDValue Mask,
+                           SDValue EVL, MachinePointerInfo PtrInfo, EVT MemVT,
+                           Align Alignment, MachineMemOperand::Flags MMOFlags,
+                           const AAMDNodes &AAInfo,
+                           const MDNode *Ranges = nullptr,
+                           bool IsExpanding = false);
+  inline SDValue getStridedLoadVP(
+      ISD::MemIndexedMode AM, ISD::LoadExtType ExtType, EVT VT, const SDLoc &DL,
+      SDValue Chain, SDValue Ptr, SDValue Offset, SDValue Stride, SDValue Mask,
+      SDValue EVL, MachinePointerInfo PtrInfo, EVT MemVT,
+      MaybeAlign Alignment = MaybeAlign(),
+      MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
+      const AAMDNodes &AAInfo = AAMDNodes(), const MDNode *Ranges = nullptr,
+      bool IsExpanding = false) {
+    // Ensures that codegen never sees a None Alignment.
+    return getStridedLoadVP(AM, ExtType, VT, DL, Chain, Ptr, Offset, Stride,
+                            Mask, EVL, PtrInfo, MemVT,
+                            Alignment.getValueOr(getEVTAlign(MemVT)), MMOFlags,
+                            AAInfo, Ranges, IsExpanding);
+  }
+  SDValue getStridedLoadVP(ISD::MemIndexedMode AM, ISD::LoadExtType ExtType,
+                           EVT VT, const SDLoc &DL, SDValue Chain, SDValue Ptr,
+                           SDValue Offset, SDValue Stride, SDValue Mask,
+                           SDValue EVL, EVT MemVT, MachineMemOperand *MMO,
+                           bool IsExpanding = false);
+  SDValue getStridedLoadVP(EVT VT, const SDLoc &DL, SDValue Chain, SDValue Ptr,
+                           SDValue Stride, SDValue Mask, SDValue EVL,
+                           MachinePointerInfo PtrInfo, MaybeAlign Alignment,
+                           MachineMemOperand::Flags MMOFlags,
+                           const AAMDNodes &AAInfo,
+                           const MDNode *Ranges = nullptr,
+                           bool IsExpanding = false);
+  SDValue getStridedLoadVP(EVT VT, const SDLoc &DL, SDValue Chain, SDValue Ptr,
+                           SDValue Stride, SDValue Mask, SDValue EVL,
+                           MachineMemOperand *MMO, bool IsExpanding = false);
+  SDValue
+  getExtStridedLoadVP(ISD::LoadExtType ExtType, const SDLoc &DL, EVT VT,
+                      SDValue Chain, SDValue Ptr, SDValue Stride, SDValue Mask,
+                      SDValue EVL, MachinePointerInfo PtrInfo, EVT MemVT,
+                      MaybeAlign Alignment, MachineMemOperand::Flags MMOFlags,
+                      const AAMDNodes &AAInfo, bool IsExpanding = false);
+  SDValue getExtStridedLoadVP(ISD::LoadExtType ExtType, const SDLoc &DL, EVT VT,
+                              SDValue Chain, SDValue Ptr, SDValue Stride,
+                              SDValue Mask, SDValue EVL, EVT MemVT,
+                              MachineMemOperand *MMO, bool IsExpanding = false);
+  SDValue getIndexedStridedLoadVP(SDValue OrigLoad, const SDLoc &DL,
+                                  SDValue Base, SDValue Offset,
+                                  ISD::MemIndexedMode AM);
+  SDValue getStridedStoreVP(SDValue Chain, const SDLoc &DL, SDValue Val,
+                            SDValue Ptr, SDValue Offset, SDValue Stride,
+                            SDValue Mask, SDValue EVL, EVT MemVT,
+                            MachineMemOperand *MMO, ISD::MemIndexedMode AM,
+                            bool IsTruncating = false,
+                            bool IsCompressing = false);
+  SDValue getTruncStridedStoreVP(SDValue Chain, const SDLoc &DL, SDValue Val,
+                                 SDValue Ptr, SDValue Stride, SDValue Mask,
+                                 SDValue EVL, MachinePointerInfo PtrInfo,
+                                 EVT SVT, Align Alignment,
+                                 MachineMemOperand::Flags MMOFlags,
+                                 const AAMDNodes &AAInfo,
+                                 bool IsCompressing = false);
+  SDValue getTruncStridedStoreVP(SDValue Chain, const SDLoc &DL, SDValue Val,
+                                 SDValue Ptr, SDValue Stride, SDValue Mask,
+                                 SDValue EVL, EVT SVT, MachineMemOperand *MMO,
+                                 bool IsCompressing = false);
+  SDValue getIndexedStridedStoreVP(SDValue OrigStore, const SDLoc &DL,
+                                   SDValue Base, SDValue Offset,
+                                   ISD::MemIndexedMode AM);
+
   SDValue getGatherVP(SDVTList VTs, EVT VT, const SDLoc &dl,
                       ArrayRef<SDValue> Ops, MachineMemOperand *MMO,
                       ISD::MemIndexType IndexType);
