Commit d3824c2

Intel(R) SHMEM Library (ISHMEM) 1.3.0
Signed-off-by: sys_shmem <sys_shmem@intel.com>
1 parent ac7ffe0 commit d3824c2

116 files changed

+355
-138
lines changed

.clang-format

Lines changed: 1 addition & 0 deletions
@@ -22,3 +22,4 @@ IndentWidth: 4
 KeepEmptyLinesAtTheStartOfBlocks: false
 SortIncludes: false
 SpaceAfterCStyleCast: true
+NamespaceIndentation: All

.clang-format-ignore

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# Skip cmake files
+**/CMakeLists.txt
+**/cmake/*
+**/CMakeFiles/*
+
+# Skip non-source files/directories
+*.md
+LICENSE
+tags
+third-party-programs.txt
+.github/*
+.git/*
+docs/*
+maint/*
+scripts/*
+pkgconfig/*
+
+# Skip third-party files
+src/malloc.cpp
+src/uthash.h
+pmi-simple/*

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
 .*.sw*
 build
 tags
+ishmem/config.h

CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ set(PROJECT_NAME "ishmem")
 set(PROJECT_FULL_NAME "Intel® SHMEM")
 
 set(ISHMEM_MAJOR_VERSION "1")
-set(ISHMEM_MINOR_VERSION "2")
+set(ISHMEM_MINOR_VERSION "3")
 set(ISHMEM_PATCH_VERSION "0")
 set(PROJECT_VERSION "${ISHMEM_MAJOR_VERSION}.${ISHMEM_MINOR_VERSION}.${ISHMEM_PATCH_VERSION}")

RELEASE_NOTES.md

Lines changed: 16 additions & 0 deletions
@@ -1,6 +1,22 @@
 # Release Notes <!-- omit in toc -->
 This document contains a list of new features and known limitations of Intel® SHMEM releases.
 
+## Release 1.3.0
+
+### New Features and Enhancements
+- Support for C++ templated APIs for non-Debug build types.
+- Bug fixes improving functionality.
+
+### Known Limitations
+- Only [Sandia OpenSHMEM](https://github.com/Sandia-OpenSHMEM/SOS) and [Intel® MPI Library](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html) are currently supported as the host back-end.
+- Not all APIs from OpenSHMEM standard are supported. Please refer to [Supported/Unsupported Features](https://oneapi-src.github.io/ishmem/supported_features.html) to get a complete view.
+- Intel® SHMEM requires a one-to-one mapping of PEs to SYCL devices. This implies that Intel® SHMEM executions must launch with a number of processes on each compute node that is no more than the number of available SYCL devices on each one of those nodes. By default, the Intel® SHMEM runtime considers each individual device tile to make up a single SYCL device and assigns a tile per PE.
+- All collective operations within a kernel must complete before invoking subsequent kernel-initiated collective operation.
+- To run Intel® SHMEM with SOS enabling the Slingshot provider in OFI, environment variable `FI_CXI_OPTIMIZED_MRS=0` must be used. It is also recommended to use `FI_CXI_DEFAULT_CQ_SIZE=131072`.
+- To run Intel® SHMEM with SOS enabling the verbs provider, environment variable `MLX5_SCATTER_TO_CQE=0` must be used.
+- To run Intel® SHMEM with Intel® MPI Library, environment variable `I_MPI_OFFLOAD=1` must be used. Additionally, `I_MPI_OFFLOAD_RDMA=1` may be necessary for GPU RDMA depending on the OFI provider. Please refer to the [reference guide](https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-14/gpu-buffers-support.html) for further details.
+- Inter-node communication in Intel® SHMEM requires [dma-buf](https://www.kernel.org/doc/html/latest/driver-api/dma-buf.html) support in the Linux kernel. Inter-node functionality in Intel® SHMEM Release 1.3.0 is tested with SUSE Linux Enterprise Server 15 SP4.
+
 ## Release 1.2.0
 
 ### New Features and Enhancements

docs/source/compiling_and_running_programs.rst

Lines changed: 26 additions & 0 deletions
@@ -44,3 +44,29 @@ Intel® SHMEM build configuration::
 
 See section :ref:`Library Constants<library_constants>` for more information
 about these variables.
+
+Selecting SPIR-V Compilation Targets
+------------------------------------
+
+On some systems, you may encounter an error in which the correct SPIR-V targets
+are not successfully selected when linking with Intel® SHMEM. This may result in
+problems when using device-initiated communication including compilation
+warnings: ::
+
+icpx: warning: linked binaries do not contain expected 'spir64-unknown-unknown' target; found targets: 'spir64_gen-unknown-unknown' [-Wsycl-target]
+
+as well as runtime errors: ::
+
+terminate called after throwing an instance of 'sycl::_V1::compile_program_error'
+what(): The program was built for 1 devices
+Build program log for 'Intel(R) Data Center GPU Max 1550':
+Module <0x29941d0>: Unresolved Symbol <_Z13ishmem_putmemPvPKvmi>
+Module <0x29941d0>: Unresolved Symbol <_Z13ishmem_putmemPvPKvmi>
+Module <0x29941d0>: Unresolved Symbol <_Z13ishmem_putmemPvPKvmi>
+Module <0x29941d0>: Unresolved Symbol <_Z13ishmem_putmemPvPKvmi> -11 (PI_ERROR_BUILD_PROGRAM_FAILURE)
+
+This error can be resolved by indicating the desired target at compile time. To
+compile with the appropriate target for a Intel® Data Center GPU Max 1550 (PVC)
+GPU, add the following flags when linking: ::
+
+-fsycl-targets=spir64_gen --start-no-unused-arguments -Xs "-device pvc" --end-no-unused-arguments --start-no-unused-arguments -Xsycl-target-backend "-q" --end-no-unused-arguments

docs/source/conf.py

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
 project = u'Intel® SHMEM'
 copyright = u'2024 Intel Corporation licensed under Creative Commons BY 4.0'
 author = u'Intel Corporation'
-release = u'1.2.0'
+release = u'1.3.0'
 version = release
 
 # -- General configuration ---------------------------------------------------

docs/source/library_setup_exit_query.rst

Lines changed: 7 additions & 4 deletions
@@ -50,24 +50,27 @@ Callable from the **host**.
 Indicates which runtime is used to initialize Intel® SHMEM: either
 OpenSHMEM, MPI, or PMI.
 
-.. important:: As of Intel® SHMEM |version| only ISHMEM_RUNTIME_OPENSHMEM is supported.
+.. important:: As of Intel® SHMEM |version| only ISHMEM_RUNTIME_OPENSHMEM and ISHMEM_RUNTIME_MPI are supported.
 
 .. _ishmemx_attr_t:
 .. cpp:struct:: ishmemx_attr_t
 
 .. c:var:: ishmemx_runtime_type_t runtime
 .. c:var:: bool initialize_runtime = true
 .. c:var:: bool gpu = true
+.. c:var:: void *mpi_comm
 
 **Description:**
 A struct declaration describing attributes for initialization.
 A valid **runtime** enumeration value must be set by the user and must
 correspond to a runtime that is enabled within the build of the ``ishmem``
 library.
-By default, the parallel runtime is assumed to be initialized by the user
-(**initialize_runtime** default is ``false``).
+By default, the parallel runtime is initialized by Intel® SHMEM
+(**initialize_runtime** default is ``true``).
 The **gpu** boolean indicates whether to use GPU memory for the symmetric
-heap (default is ``false``).
+heap (default is ``true``). **mpi_comm** is a pointer to the corresponding
+MPI communicator for representing ``ISHMEM_TEAM_WORLD`` when used with
+``ISHMEM_RUNTIME_MPI`` (default is ``MPI_COMM_WORLD``).
 
 .. _ishmemx_init_attr:
 .. cpp:function:: void ishmemx_init_attr(ishmemx_attr_t * attr)
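
The defaults documented in this hunk can be illustrated with a small stand-in struct. This is only a sketch: everything in the `sketch` namespace is hypothetical and merely mirrors the documented fields, it is not the real `ishmemx.h` declaration. In particular, treating `nullptr` as "fall back to `MPI_COMM_WORLD`" is an assumption made here so that the example compiles without MPI headers.

```cpp
namespace sketch {

// Stand-in for ishmemx_runtime_type_t (hypothetical; the enumerators follow
// the three runtimes named in the documentation text).
enum runtime_type { RUNTIME_OPENSHMEM, RUNTIME_MPI, RUNTIME_PMI };

// Stand-in for ishmemx_attr_t using the defaults documented in this hunk.
struct attr_t {
    runtime_type runtime;            // no documented default: the user must set it
    bool initialize_runtime = true;  // the library initializes the parallel runtime
    bool gpu = true;                 // the symmetric heap is placed in GPU memory
    void *mpi_comm = nullptr;        // assumption: nullptr stands for MPI_COMM_WORLD
};

// Hypothetical helper: attributes for an application that has already
// initialized MPI itself and wants to hand over its own communicator.
inline attr_t attr_for_existing_mpi(void *comm)
{
    attr_t attr{};                    // start from the documented defaults
    attr.runtime = RUNTIME_MPI;       // select the MPI runtime explicitly
    attr.initialize_runtime = false;  // the application owns runtime startup
    attr.mpi_comm = comm;             // pointer to the caller's communicator
    return attr;
}

}  // namespace sketch
```

With the real library, the equivalent step would be to populate an `ishmemx_attr_t` and pass its address to `ishmemx_init_attr`, as documented above.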

docs/source/writing_programs.rst

Lines changed: 1 addition & 2 deletions
@@ -4,9 +4,8 @@
 Writing Intel® SHMEM Programs
 ========================================
 
-Intel® SHMEM Programs require including the following header files::
+Intel® SHMEM Programs require including the following header file::
 
-#include <CL/sycl.hpp>
 #include <ishmem.h>
 
 Here is how to :ref:`initialize<library_setup_exit_query_routines>` the

examples/1_helloworld.cpp

Lines changed: 0 additions & 1 deletion
@@ -2,7 +2,6 @@
 * SPDX-License-Identifier: BSD-3-Clause
 */
 
-#include <CL/sycl.hpp>
 #include <cstdlib>
 #include <iostream>
 #include <ishmem.h>

examples/2_get.cpp

Lines changed: 0 additions & 1 deletion
@@ -2,7 +2,6 @@
 * SPDX-License-Identifier: BSD-3-Clause
 */
 
-#include <CL/sycl.hpp>
 #include <cstdlib>
 #include <iostream>
 #include <ishmem.h>

examples/3_library_apis.cpp

Lines changed: 1 addition & 1 deletion
@@ -2,9 +2,9 @@
 * SPDX-License-Identifier: BSD-3-Clause
 */
 
-#include <CL/sycl.hpp>
 #include <ishmem.h>
 #include <ishmemx.h>
+#include <cmath>
 
 constexpr int array_size = 10; /* num_threads = array_size / chunk_size */
 constexpr int chunk_size = 2; /* data partition/chunk size per thread */

examples/4_pi.cpp

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 * SPDX-License-Identifier: BSD-3-Clause
 */
 
-#include <CL/sycl.hpp>
+#include <cmath>
 #include <cstdlib>
 #include <iomanip>
 #include <iostream>

examples/5_pi_reduce.cpp

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 * SPDX-License-Identifier: BSD-3-Clause
 */
 
-#include <CL/sycl.hpp>
+#include <cmath>
 #include <cstdio>
 #include <cstdlib>
 #include <iomanip>

src/amo.cpp

Lines changed: 69 additions & 2 deletions
@@ -91,49 +91,91 @@ void ishmem_atomic_xor(T *dest, T val, int pe)
 
 /* clang-format off */
 #define ISHMEMI_API_IMPL_ATOMIC_FETCH(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 TYPE ishmem_##TYPENAME##_atomic_fetch(TYPE *src, int pe) { return ishmem_atomic_fetch<TYPE>(src, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_COMPARE_SWAP(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 TYPE ishmem_##TYPENAME##_atomic_compare_swap(TYPE *dest, TYPE cond, TYPE val, int pe) { return ishmem_atomic_compare_swap<TYPE>(dest, cond, val, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_SWAP(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 TYPE ishmem_##TYPENAME##_atomic_swap(TYPE *dest, TYPE val, int pe) { return ishmem_atomic_swap<TYPE>(dest, val, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_FETCH_INC(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 TYPE ishmem_##TYPENAME##_atomic_fetch_inc(TYPE *dest, int pe) { return ishmem_atomic_fetch_inc<TYPE>(dest, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_FETCH_ADD(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 TYPE ishmem_##TYPENAME##_atomic_fetch_add(TYPE *dest, TYPE val, int pe) { return ishmem_atomic_fetch_add<TYPE>(dest, val, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_FETCH_AND(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 TYPE ishmem_##TYPENAME##_atomic_fetch_and(TYPE *dest, TYPE val, int pe) { return ishmem_atomic_fetch_and<TYPE>(dest, val, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_FETCH_OR(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 TYPE ishmem_##TYPENAME##_atomic_fetch_or(TYPE *dest, TYPE val, int pe) { return ishmem_atomic_fetch_or<TYPE>(dest, val, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_FETCH_XOR(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 TYPE ishmem_##TYPENAME##_atomic_fetch_xor(TYPE *dest, TYPE val, int pe) { return ishmem_atomic_fetch_xor<TYPE>(dest, val, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_SET(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 void ishmem_##TYPENAME##_atomic_set(TYPE *dest, TYPE val, int pe) { ishmem_atomic_set<TYPE>(dest, val, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_INC(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 void ishmem_##TYPENAME##_atomic_inc(TYPE *dest, int pe) { ishmem_atomic_inc<TYPE>(dest, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_ADD(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 void ishmem_##TYPENAME##_atomic_add(TYPE *dest, TYPE val, int pe) { ishmem_atomic_add<TYPE>(dest, val, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_AND(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 void ishmem_##TYPENAME##_atomic_and(TYPE *dest, TYPE val, int pe) { ishmem_atomic_and<TYPE>(dest, val, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_OR(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 void ishmem_##TYPENAME##_atomic_or(TYPE *dest, TYPE val, int pe) { ishmem_atomic_or<TYPE>(dest, val, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_XOR(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 void ishmem_##TYPENAME##_atomic_xor(TYPE *dest, TYPE val, int pe) { ishmem_atomic_xor<TYPE>(dest, val, pe); }
-/* clang-format on */
 
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template TYPE ishmem_atomic_fetch(TYPE *, int)
 ISHMEMI_API_GENERATE_AMO_EXT_TYPES(ISHMEMI_API_IMPL_ATOMIC_FETCH)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template TYPE ishmem_atomic_compare_swap(TYPE *, TYPE, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_STD_TYPES(ISHMEMI_API_IMPL_ATOMIC_COMPARE_SWAP)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template TYPE ishmem_atomic_swap(TYPE *, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_EXT_TYPES(ISHMEMI_API_IMPL_ATOMIC_SWAP)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template TYPE ishmem_atomic_fetch_inc(TYPE *, int)
 ISHMEMI_API_GENERATE_AMO_STD_TYPES(ISHMEMI_API_IMPL_ATOMIC_FETCH_INC)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template TYPE ishmem_atomic_fetch_add(TYPE *, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_STD_TYPES(ISHMEMI_API_IMPL_ATOMIC_FETCH_ADD)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template TYPE ishmem_atomic_fetch_and(TYPE *, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_BIT_TYPES(ISHMEMI_API_IMPL_ATOMIC_FETCH_AND)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template TYPE ishmem_atomic_fetch_or(TYPE *, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_BIT_TYPES(ISHMEMI_API_IMPL_ATOMIC_FETCH_OR)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template TYPE ishmem_atomic_fetch_xor(TYPE *, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_BIT_TYPES(ISHMEMI_API_IMPL_ATOMIC_FETCH_XOR)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template void ishmem_atomic_set(TYPE *, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_EXT_TYPES(ISHMEMI_API_IMPL_ATOMIC_SET)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template void ishmem_atomic_inc(TYPE *, int)
 ISHMEMI_API_GENERATE_AMO_STD_TYPES(ISHMEMI_API_IMPL_ATOMIC_INC)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template void ishmem_atomic_add(TYPE *, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_STD_TYPES(ISHMEMI_API_IMPL_ATOMIC_ADD)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template void ishmem_atomic_and(TYPE *, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_BIT_TYPES(ISHMEMI_API_IMPL_ATOMIC_AND)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template void ishmem_atomic_or(TYPE *, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_BIT_TYPES(ISHMEMI_API_IMPL_ATOMIC_OR)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template void ishmem_atomic_xor(TYPE *, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_BIT_TYPES(ISHMEMI_API_IMPL_ATOMIC_XOR)
+#undef ISHMEM_INSTANTIATE_TYPE
+/* clang-format on */
 
 /* Non-Blocking AMOs */
 template <typename T>
@@ -186,28 +228,53 @@ void ishmem_atomic_fetch_xor_nbi(T *fetch, T *dest, T val, int pe)
 
 /* clang-format off */
 #define ISHMEMI_API_IMPL_ATOMIC_FETCH_NBI(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 void ishmem_##TYPENAME##_atomic_fetch_nbi(TYPE *fetch, TYPE *src, int pe) { ishmem_atomic_fetch_nbi<TYPE>(fetch, src, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_COMPARE_SWAP_NBI(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 void ishmem_##TYPENAME##_atomic_compare_swap_nbi(TYPE *fetch, TYPE *dest, TYPE cond, TYPE val, int pe) { ishmem_atomic_compare_swap_nbi<TYPE>(fetch, dest, cond, val, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_SWAP_NBI(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 void ishmem_##TYPENAME##_atomic_swap_nbi(TYPE *fetch, TYPE *dest, TYPE val, int pe) { ishmem_atomic_swap_nbi<TYPE>(fetch, dest, val, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_FETCH_INC_NBI(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 void ishmem_##TYPENAME##_atomic_fetch_inc_nbi(TYPE *fetch, TYPE *dest, int pe) { ishmem_atomic_fetch_inc_nbi<TYPE>(fetch, dest, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_FETCH_ADD_NBI(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 void ishmem_##TYPENAME##_atomic_fetch_add_nbi(TYPE *fetch, TYPE *dest, TYPE val, int pe) { ishmem_atomic_fetch_add_nbi<TYPE>(fetch, dest, val, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_FETCH_AND_NBI(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 void ishmem_##TYPENAME##_atomic_fetch_and_nbi(TYPE *fetch, TYPE *dest, TYPE val, int pe) { ishmem_atomic_fetch_and_nbi<TYPE>(fetch, dest, val, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_FETCH_OR_NBI(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 void ishmem_##TYPENAME##_atomic_fetch_or_nbi(TYPE *fetch, TYPE *dest, TYPE val, int pe) { ishmem_atomic_fetch_or_nbi<TYPE>(fetch, dest, val, pe); }
 #define ISHMEMI_API_IMPL_ATOMIC_FETCH_XOR_NBI(TYPENAME, TYPE) \
+ISHMEM_INSTANTIATE_TYPE_##TYPENAME(TYPE); \
 void ishmem_##TYPENAME##_atomic_fetch_xor_nbi(TYPE *fetch, TYPE *dest, TYPE val, int pe) { ishmem_atomic_fetch_xor_nbi<TYPE>(fetch, dest, val, pe); }
-/* clang-format on */
 
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template void ishmem_atomic_fetch_nbi(TYPE *, TYPE *, int)
 ISHMEMI_API_GENERATE_AMO_EXT_TYPES(ISHMEMI_API_IMPL_ATOMIC_FETCH_NBI)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template void ishmem_atomic_compare_swap_nbi(TYPE *, TYPE *, TYPE, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_STD_TYPES(ISHMEMI_API_IMPL_ATOMIC_COMPARE_SWAP_NBI)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template void ishmem_atomic_swap_nbi(TYPE *, TYPE *, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_EXT_TYPES(ISHMEMI_API_IMPL_ATOMIC_SWAP_NBI)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template void ishmem_atomic_fetch_inc_nbi(TYPE *, TYPE *, int)
 ISHMEMI_API_GENERATE_AMO_STD_TYPES(ISHMEMI_API_IMPL_ATOMIC_FETCH_INC_NBI)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template void ishmem_atomic_fetch_add_nbi(TYPE *, TYPE *, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_STD_TYPES(ISHMEMI_API_IMPL_ATOMIC_FETCH_ADD_NBI)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template void ishmem_atomic_fetch_and_nbi(TYPE *, TYPE *, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_BIT_TYPES(ISHMEMI_API_IMPL_ATOMIC_FETCH_AND_NBI)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template void ishmem_atomic_fetch_or_nbi(TYPE *, TYPE *, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_BIT_TYPES(ISHMEMI_API_IMPL_ATOMIC_FETCH_OR_NBI)
+#undef ISHMEM_INSTANTIATE_TYPE
+#define ISHMEM_INSTANTIATE_TYPE(TYPE) template void ishmem_atomic_fetch_xor_nbi(TYPE *, TYPE *, TYPE, int)
 ISHMEMI_API_GENERATE_AMO_BIT_TYPES(ISHMEMI_API_IMPL_ATOMIC_FETCH_XOR_NBI)
+#undef ISHMEM_INSTANTIATE_TYPE
+/* clang-format on */
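
The pattern in these hunks pairs each generated C-style wrapper with an explicit instantiation of the underlying function template, presumably so that the templated symbol is emitted even when an optimized build inlines it into the wrapper. The following is a minimal, self-contained sketch of that technique. All `sketch_`/`SKETCH_` names are hypothetical, and the per-typename hooks simply forward to the current instantiation macro, which is an assumption since the visible diff does not show where `ISHMEM_INSTANTIATE_TYPE_<TYPENAME>` is defined.

```cpp
// Generic implementation, standing in for a templated API such as
// ishmem_atomic_fetch_add (this version is a plain local read-modify-write).
template <typename T>
T sketch_fetch_add(T *dest, T val)
{
    T old = *dest;  // remember the previous value
    *dest += val;   // apply the update
    return old;     // "fetch" semantics: return the value before the add
}

// Type list: applies X(TYPENAME, TYPE) once per supported type.
#define SKETCH_GENERATE_TYPES(X) \
    X(int, int)                  \
    X(long, long)

// Per-typename hooks (assumed): forward to whatever SKETCH_INSTANTIATE_TYPE
// is defined as at the point where the type list is expanded.
#define SKETCH_INSTANTIATE_TYPE_int(TYPE) SKETCH_INSTANTIATE_TYPE(TYPE)
#define SKETCH_INSTANTIATE_TYPE_long(TYPE) SKETCH_INSTANTIATE_TYPE(TYPE)

// For each type: explicitly instantiate the template, then define the wrapper.
#define SKETCH_API_IMPL_FETCH_ADD(TYPENAME, TYPE) \
    SKETCH_INSTANTIATE_TYPE_##TYPENAME(TYPE);     \
    TYPE sketch_##TYPENAME##_fetch_add(TYPE *dest, TYPE val) { return sketch_fetch_add<TYPE>(dest, val); }

// Define the instantiation macro for this one function, expand the type
// list, then undefine it so the next function can redefine it.
#define SKETCH_INSTANTIATE_TYPE(TYPE) template TYPE sketch_fetch_add(TYPE *, TYPE)
SKETCH_GENERATE_TYPES(SKETCH_API_IMPL_FETCH_ADD)
#undef SKETCH_INSTANTIATE_TYPE
```

After preprocessing, the translation unit contains `template int sketch_fetch_add(int *, int);` and `template long sketch_fetch_add(long *, long);` alongside the wrappers `sketch_int_fetch_add` and `sketch_long_fetch_add`. Redefining the instantiation macro before each type-list expansion lets the same hooks serve functions with different signatures.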
