@@ -80,15 +80,78 @@ running Open MPI's ``configure`` script.

.. _label-install-packagers-dso-or-not:

- Components ("plugins"): DSO or no?
- ----------------------------------
+ Components ("plugins"): static or DSO?
+ --------------------------------------

Open MPI contains a large number of components (sometimes called
"plugins") to effect different types of functionality in MPI. For
example, some components effect Open MPI's networking functionality:
they may link against specialized libraries to provide
highly-optimized network access.

+ Open MPI can build its components as Dynamic Shared Objects (DSOs) or
+ statically include them in its core libraries (regardless of whether
+ those libraries are built as shared or static libraries).
+
+ .. note:: As of Open MPI |ompi_ver|, ``configure``'s global default is
+           to build all components as static (i.e., part of the Open
+           MPI core libraries, not as DSOs). Prior to Open MPI v5.0.0,
+           the global default behavior was to build most components as
+           DSOs.
+
+ Why build components as DSOs?
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ There are advantages to building components as DSOs:
+
+ * Open MPI's core libraries |mdash| and therefore MPI applications
+   |mdash| will have very few dependencies. For example, if you build
+   Open MPI with support for a specific network stack, the libraries in
+   that network stack will be dependencies of the DSOs, not Open MPI's
+   core libraries (or MPI applications).
+
+ * Removing Open MPI functionality that you do not want is as simple as
+   removing a DSO from ``$libdir/openmpi`` (see the example below).
+
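+ For example, a packager could prevent Open MPI from ever loading a
+ given component by deleting its DSO file. This is only a sketch: the
+ component shown (the usNIC BTL) and the installation path are
+ illustrative, and the exact set of ``mca_*.so`` files depends on how
+ Open MPI was configured.
+
+ .. code:: sh
+
+    # Hypothetical example: remove a single networking component (the
+    # usNIC BTL) from an existing installation so it is never loaded.
+    shell$ rm $libdir/openmpi/mca_btl_usnic.so
+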
+ Why build components as part of Open MPI's core libraries?
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ The biggest advantage of building the components as part of Open
+ MPI's core libraries comes when running at (very) large scale with
+ Open MPI installed on a network filesystem (vs. being installed on a
+ local filesystem).
+
+ For example, consider launching a single MPI process on each of 1,000
+ nodes. In this scenario, the following is accessed from the network
+ filesystem:
+
+ #. The MPI application
+ #. The core Open MPI libraries and their dependencies (e.g.,
+    ``libmpi``)
+
+    * Depending on your configuration, this is probably on the order of
+      10-20 library files.
+
+ #. All DSO component files and their dependencies
+
+    * Depending on your configuration, this can be 200+ component
+      files.
+
+ If all components are physically located in the libraries, then the
+ third step loads zero DSO component files. When using a networked
+ filesystem while launching at scale, this can translate to large
+ performance savings.
+
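+ If you want a rough sense of how many component files a given build
+ would otherwise load, you can count the DSOs in the installation
+ tree. This is only an illustrative check; it assumes the default
+ ``$libdir/openmpi`` layout and that components were built as DSOs.
+
+ .. code:: sh
+
+    # Count the component DSOs that would be opened at start-up
+    # (assumes components were built as DSOs in $libdir/openmpi)
+    shell$ ls $libdir/openmpi/mca_*.so | wc -l
+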
+ .. note:: If not using a networked filesystem, or if not launching at
+           scale, loading a large number of DSO files may not consume a
+           noticeable amount of time during MPI process launch. Put
+           simply: loading DSOs as individual files generally only
+           matters when using a networked filesystem while launching at
+           scale.
+
+ Direct controls for building components as DSOs or not
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
Open MPI |ompi_ver| has two ``configure``-time defaults regarding the
treatment of components that may be of interest to packagers:

@@ -135,19 +198,121 @@ using ``--enable-mca-dso`` to selectively build some components as
DSOs and leave the others included in their respective Open MPI
libraries.

+ :ref:`See the section on building accelerator support
+ <label-install-packagers-building-accelerator-support-as-dsos>` for a
+ practical example where this can be useful.
+
+ .. _label-install-packagers-gnu-libtool-dependency-flattening:
+
+ GNU Libtool dependency flattening
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ When compiling Open MPI's components statically as part of Open MPI's
+ core libraries, `GNU Libtool <https://www.gnu.org/software/libtool/>`_
+ |mdash| which is used as part of Open MPI's build system |mdash| will
+ attempt to "flatten" dependencies.
+
+ For example, the :ref:`ompi_info(1) <man1-ompi_info>` command links
+ against the Open MPI core library ``libopen-pal``. This library will
+ have dependencies on various HPC-class network stack libraries. For
+ simplicity, the discussion below assumes that Open MPI was built with
+ support for `Libfabric <https://libfabric.org/>`_ and `UCX
+ <https://openucx.org/>`_, and therefore ``libopen-pal`` has direct
+ dependencies on ``libfabric`` and ``libucx``.
+
+ In this scenario, GNU Libtool will automatically attempt to "flatten"
+ these dependencies by linking :ref:`ompi_info(1) <man1-ompi_info>`
+ directly to ``libfabric`` and ``libucx`` (vs. letting ``libopen-pal``
+ pull the dependencies in at run time).
+
+ * In some environments (e.g., Ubuntu 22.04), the compiler and/or
+   linker will automatically utilize the linker CLI flag
+   ``-Wl,--as-needed``, which will effectively cause these dependencies
+   to *not* be flattened: :ref:`ompi_info(1) <man1-ompi_info>` will
+   *not* have direct dependencies on either ``libfabric`` or
+   ``libucx``.
+
+ * In other environments (e.g., Fedora 38), the compiler and linker
+   will *not* utilize the ``-Wl,--as-needed`` linker CLI flag. As
+   such, :ref:`ompi_info(1) <man1-ompi_info>` will show direct
+   dependencies on ``libfabric`` and ``libucx``.
+
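+ One way to see which behavior a given build exhibits is to inspect
+ the :ref:`ompi_info(1) <man1-ompi_info>` executable with ``ldd``.
+ This is only an illustrative check; the exact library names and the
+ ``$bindir`` path below depend on how and where Open MPI was built and
+ installed.
+
+ .. code:: sh
+
+    # If the dependencies were "flattened", the network stack libraries
+    # will show up as direct dependencies of ompi_info itself.
+    shell$ ldd $bindir/ompi_info | grep -E 'libfabric|libucx'
+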
+ **Just to be clear:** these flattened dependencies *are not a
+ problem*. Open MPI will function correctly with or without the
+ flattened dependencies. There is no performance impact associated
+ with having |mdash| or not having |mdash| the flattened dependencies.
+ We mention this situation here in the documentation simply because it
+ surprised some Open MPI downstream package managers to see that
+ :ref:`ompi_info(1) <man1-ompi_info>` in Open MPI |ompi_ver| had more
+ shared library dependencies than it did in prior Open MPI releases.
+
+ If packagers want :ref:`ompi_info(1) <man1-ompi_info>` to not have
+ these flattened dependencies, use either of the following mechanisms:
+
+ #. Use ``--enable-mca-dso`` to force all components to be built as
+    DSOs (this was actually the default behavior before Open MPI
+    v5.0.0).
+
+ #. Add ``LDFLAGS=-Wl,--as-needed`` to the ``configure`` command line
+    when building Open MPI (see the example below).
+
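+ For example (illustrative ``configure`` invocations; replace ``...``
+ with whatever other arguments your build needs):
+
+ .. code:: sh
+
+    # Mechanism 1: build every component as a DSO
+    shell$ ./configure --enable-mca-dso ...
+
+    # Mechanism 2: ask the linker to omit unneeded direct dependencies
+    shell$ ./configure LDFLAGS=-Wl,--as-needed ...
+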
+ .. note:: The Open MPI community specifically chose not to
+           automatically utilize this linker flag for the following
+           reasons:
+
+           #. Having the flattened dependencies does not cause any
+              correctness or performance problems.
+           #. There are multiple mechanisms (see above) for users or
+              packagers to change this behavior, if desired.
+           #. Certain environments have chosen to have |mdash| or
+              not have |mdash| this flattened dependency behavior.
+              It is not Open MPI's place to override these choices.
+           #. In general, Open MPI's ``configure`` script only
+              utilizes compiler and linker flags if they are
+              *needed*. All other flags should be the user's /
+              packager's choice.
+
+ .. _label-install-packagers-building-accelerator-support-as-dsos:
+
+ Building accelerator support as DSOs
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ If you are building a package that includes support for one or more
+ accelerators, it may be desirable to build accelerator-related
+ components as DSOs (see the :ref:`static or DSO?
+ <label-install-packagers-dso-or-not>` section for details).
+
+ .. admonition:: Rationale
+    :class: tip
+
+    Accelerator hardware is expensive, and may only be present on some
+    compute nodes in an HPC cluster. Specifically: there may not be
+    any accelerator hardware on "head" or compile nodes in an HPC
+    cluster. As such, invoking Open MPI commands on a "head" node with
+    an MPI that was built with static accelerator support but no
+    accelerator hardware may fail to launch because of run-time linker
+    issues (because the accelerator hardware support libraries are
+    likely not present).
+
+    Building Open MPI's accelerator-related components as DSOs allows
+    Open MPI to *try* opening the accelerator components, but proceed
+    if those DSOs fail to open due to the lack of support libraries.
+
+ Using the ``--enable-mca-dso`` command line parameter to Open MPI's
+ ``configure`` command allows packagers to build all
+ accelerator-related components as DSOs. For example:
+
.. code:: sh

-    # Build all the "accelerator" components as DSOs (all other
+    # Build all the accelerator-related components as DSOs (all other
    # components will default to being built in their respective
    # libraries)
-    shell$ ./configure --enable-mca-dso=accelerator ...
-
- This allows packaging ``$libdir`` as part of the "main" Open MPI
- binary package, but then packaging
- ``$libdir/openmpi/mca_accelerator_*.so`` as sub-packages. These
- sub-packages may inherit dependencies on the CUDA and/or ROCM
- packages, for example. User can always install the "main" Open MPI
- binary package, and can install the additional "accelerator" Open MPI
- binary sub-package if they actually have accelerator hardware
- installed (which will cause the installation of additional
- dependencies).
+    shell$ ./configure --enable-mca-dso=btl-smcuda,rcache-rgpusm,rcache-gpusm,accelerator
+
+ Per the example above, this allows packaging ``$libdir`` as part of
+ the "main" Open MPI binary package, but then packaging
+ ``$libdir/openmpi/mca_accelerator_*.so`` and the other named
+ components as sub-packages. These sub-packages may inherit
+ dependencies on the CUDA and/or ROCM packages, for example. The
+ "main" package can be installed on all nodes, and the
+ accelerator-specific sub-package can be installed only on the nodes
+ with accelerator hardware and support libraries.
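+
+ As a sketch, a packaging recipe might split the installed tree along
+ these lines (the staging directories and the exact component file
+ names shown here are hypothetical and depend on the build):
+
+ .. code:: sh
+
+    # Move the accelerator-related DSOs out of the main package's
+    # staging tree and into the accelerator sub-package's staging tree
+    shell$ mv $main_staging/$libdir/openmpi/mca_accelerator_*.so \
+              $main_staging/$libdir/openmpi/mca_btl_smcuda.so \
+              $main_staging/$libdir/openmpi/mca_rcache_rgpusm.so \
+              $main_staging/$libdir/openmpi/mca_rcache_gpusm.so \
+              $accel_staging/$libdir/openmpi/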