@@ -1451,6 +1451,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
  It is preferred over llvm.amdgcn.mov.dpp.`<type>` for future use.
  `llvm.amdgcn.update.dpp.<type> <old> <src> <dpp_ctrl> <row_mask> <bank_mask> <bound_ctrl>`
  Should be equivalent to:
+
  - `v_mov_b32 <dest> <old>`
  - `v_mov_b32 <dest> <src> <dpp_ctrl> <row_mask> <bank_mask> <bound_ctrl>`

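The two-move equivalence above can be illustrated with a small Python model of the per-lane behaviour. This is a simplified sketch, not the backend's implementation, and it rests on several assumptions: every lane is active in EXEC, only the quad_perm (0x000-0x0FF), row_shl:n (0x101-0x10F) and row_shr:n (0x111-0x11F) `dpp_ctrl` encodings are decoded, each `bank_mask` bit covers 4 consecutive lanes of every 16-lane row, and a set `bound_ctrl` is modeled as writing 0 when the source lane falls outside the row. The helper names `dpp_source_lane` and `update_dpp` are hypothetical.

```python
ROW_SIZE = 16   # assumption: DPP rows are 16 lanes wide
BANK_SIZE = 4   # assumption: each row has four banks of 4 consecutive lanes


def dpp_source_lane(lane, dpp_ctrl):
    """Return the lane that `lane` reads from, or None if it is out of range.

    Only quad_perm, row_shl:n and row_shr:n are modeled in this sketch.
    """
    row_start = lane - lane % ROW_SIZE
    if 0x000 <= dpp_ctrl <= 0x0FF:
        # quad_perm: a 2-bit selector per lane position within a quad of 4 lanes.
        quad_start = lane - lane % 4
        return quad_start + ((dpp_ctrl >> (2 * (lane % 4))) & 0x3)
    if 0x101 <= dpp_ctrl <= 0x10F:
        # row_shl:n reads from lane + n; sources past the end of the row are invalid.
        src = lane + (dpp_ctrl & 0xF)
        return src if src < row_start + ROW_SIZE else None
    if 0x111 <= dpp_ctrl <= 0x11F:
        # row_shr:n reads from lane - n; sources before the start of the row are invalid.
        src = lane - (dpp_ctrl & 0xF)
        return src if src >= row_start else None
    raise ValueError(f"dpp_ctrl {dpp_ctrl:#x} is not modeled in this sketch")


def update_dpp(old, src, dpp_ctrl, row_mask, bank_mask, bound_ctrl):
    """Model: `v_mov_b32 dest, old`, then a DPP `v_mov_b32 dest, src`."""
    if len(old) != len(src) or len(old) % ROW_SIZE:
        raise ValueError("old and src must cover the same whole number of rows")
    dest = list(old)  # first move: every lane starts out holding `old`
    for lane in range(len(dest)):
        row = lane // ROW_SIZE
        bank = (lane % ROW_SIZE) // BANK_SIZE
        if not ((row_mask >> row) & 1) or not ((bank_mask >> bank) & 1):
            continue  # write disabled by the row/bank mask: lane keeps `old`
        src_lane = dpp_source_lane(lane, dpp_ctrl)
        if src_lane is not None:
            dest[lane] = src[src_lane]
        elif bound_ctrl:
            dest[lane] = 0  # out-of-range source modeled as reading zero
        # otherwise the lane is not written and keeps `old`
    return dest


if __name__ == "__main__":
    old = [100] * 16
    src = list(range(16))
    print(update_dpp(old, src, 0x111, 0xF, 0xF, False)[:4])  # row_shr:1 -> [100, 0, 1, 2]
    print(update_dpp(old, src, 0x111, 0xF, 0xF, True)[:4])   # row_shr:1 -> [0, 0, 1, 2]
```

In this model, the first move is what lets `<old>` show through in lanes that the DPP move does not write, which is the practical difference from a plain DPP move.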
@@ -6032,7 +6033,7 @@ GFX6-GFX8
  available in dispatch packet. For M0, it is also possible to use maximum
  possible value of LDS for given target (0x7FFF for GFX6 and 0xFFFF for
  GFX7-GFX8).
- GFX9-GFX11
+ GFX9 and later
  The M0 register is not used for range checking LDS accesses and so does not
  need to be initialized in the prolog.

@@ -16639,25 +16640,25 @@ scratch address space.

  On entry to a function:

- 1. SGPR0-3 contain a V# with the following properties (see
+ #. SGPR0-3 contain a V# with the following properties (see
  :ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`):

  * Base address pointing to the beginning of the wavefront scratch backing
  memory.
  * Swizzled with dword element size and stride of wavefront size elements.

- 2. The FLAT_SCRATCH register pair is setup. See
+ #. The FLAT_SCRATCH register pair is set up. See
  :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
- 3. GFX6-GFX8: M0 register set to the size of LDS in bytes. See
+ #. GFX6-GFX8: M0 register set to the size of LDS in bytes. See
  :ref:`amdgpu-amdhsa-kernel-prolog-m0`.
- 4. The EXEC register is set to the lanes active on entry to the function.
- 5. MODE register: *TBD*
- 6. VGPR0-31 and SGPR4-29 are used to pass function input arguments as described
+ #. The EXEC register is set to the lanes active on entry to the function.
+ #. MODE register: *TBD*
+ #. VGPR0-31 and SGPR4-29 are used to pass function input arguments as described
  below.
- 7. SGPR30-31 return address (RA). The code address that the function must
+ #. SGPR30-31 return address (RA). The code address that the function must
  return to when it completes. The value is undefined if the function is *no
  return*.
- 8. SGPR32 is used for the stack pointer (SP). It is an unswizzled scratch
+ #. SGPR32 is used for the stack pointer (SP). It is an unswizzled scratch
  offset relative to the beginning of the wavefront scratch backing memory.

  The unswizzled SP can be used with buffer instructions as an unswizzled SGPR
@@ -16694,19 +16695,19 @@ On entry to a function:
  arguments after the last local allocation and adjust SGPR32 to the address
  after the last local allocation.

- 9. All other registers are unspecified.
- 10. Any necessary ``s_waitcnt`` has been performed to ensure memory is available
- to the function.
- 11. Use pass-by-reference (byref) in stead of pass-by-value (byval) for struct
- arguments in C ABI. Callee is responsible for allocating stack memory and
- copying the value of the struct if modified. Note that the backend still
- supports byval for struct arguments.
+ #. All other registers are unspecified.
+ #. Any necessary ``s_waitcnt`` has been performed to ensure memory is available
+ to the function.
+ #. Pass-by-reference (byref) is used instead of pass-by-value (byval) for
+ struct arguments in the C ABI. The callee is responsible for allocating stack
+ memory and copying the value of the struct if modified. Note that the backend
+ still supports byval for struct arguments.

  On exit from a function:

- 1. VGPR0-31 and SGPR4-29 are used to pass function result arguments as
+ #. VGPR0-31 and SGPR4-29 are used to pass function result arguments as
  described below. Any registers used are considered clobbered registers.
- 2. The following registers are preserved and have the same value as on entry:
+ #. The following registers are preserved and have the same value as on entry:

  * FLAT_SCRATCH
  * EXEC
@@ -16741,10 +16742,10 @@ On exit from a function:
  preserved if it can be determined that the called function does not change
  their value.

- 2. The PC is set to the RA provided on entry.
- 3. MODE register: *TBD*.
- 4. All other registers are clobbered.
- 5. Any necessary ``s_waitcnt`` has been performed to ensure memory accessed by
+ #. The PC is set to the RA provided on entry.
+ #. MODE register: *TBD*.
+ #. All other registers are clobbered.
+ #. Any necessary ``s_waitcnt`` has been performed to ensure memory accessed by
  function is available to the caller.

  .. TODO::
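The entry conditions describe the scratch V# in SGPR0-3 as swizzled with dword element size and a stride of wavefront size elements, while SGPR32 holds an unswizzled offset. The Python sketch below models that layout under stated assumptions: a wavefront size of 64 by default (32 can be passed), and an unswizzled offset aligned to `wavefront size * 4` bytes. The function names `swizzled_address` and `unswizzled_to_per_lane` are hypothetical, and the conversion from an unswizzled offset to a per-lane offset (divide by the wavefront size) is derived from this layout model rather than quoted from the document.

```python
DWORD_BYTES = 4  # element size of the swizzle: one dword


def swizzled_address(base, lane, per_lane_offset, wave_size=64):
    """Byte address of `per_lane_offset` within `lane`'s private memory.

    Models a swizzle with dword element size and a stride of `wave_size`
    elements: dword k of every lane is stored contiguously across lanes,
    so consecutive dwords of one lane are wave_size * 4 bytes apart.
    """
    if not 0 <= lane < wave_size:
        raise ValueError("lane out of range")
    dword_index, byte_in_dword = divmod(per_lane_offset, DWORD_BYTES)
    return base + (dword_index * wave_size + lane) * DWORD_BYTES + byte_in_dword


def unswizzled_to_per_lane(unswizzled_offset, wave_size=64):
    """Convert a wave-level (unswizzled) byte offset to a per-lane offset.

    Assumption of this sketch: the offset is aligned to one full row of
    swizzled dwords (wave_size * 4 bytes).
    """
    if unswizzled_offset % (wave_size * DWORD_BYTES):
        raise ValueError("offset not aligned to wave_size * 4 bytes")
    return unswizzled_offset // wave_size


if __name__ == "__main__":
    print(swizzled_address(0, 1, 0))   # 4: lane 1's first dword follows lane 0's
    print(swizzled_address(0, 0, 4))   # 256: lane 0's second dword, one row later
    sp = 512                           # hypothetical unswizzled SP value
    print(unswizzled_to_per_lane(sp))  # 8
```

Under this model, lane 0's byte at the converted per-lane offset lands exactly at the unswizzled offset, which is why a single unswizzled SP can serve every lane when combined with the swizzled V#.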