RFC: Provide equivalence of MPICH_ASYNC_PROGRESS #13088
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -39,6 +39,7 @@ same command). | |
prerequisites | ||
pmix-and-prrte | ||
scheduling | ||
progress_thread | ||
|
||
localhost | ||
ssh | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
.. _async-progress-thread-label: | ||
|
||
Asynchronous progress thread | ||
============================ | ||
|
||
OpenMPI provides an experimental support of software-based asynchronous | ||
Review comment: "Open MPI" (with a space, without the quotes, obviously). Same correction is needed for the other places in this PR that the docs mention "OpenMPI". Sorry, I know this is a nit, but this is the official Open MPI docs and we need to get the name right.
Reply: No problem. I understand. |
||
progress thread. This progress thread is in charge of running internal | ||
progression engine in the background to advance non-blocking overlapping | ||
communication. | ||
|
||
Enabling progress thread at configuration time | ||
---------------------------------------------- | ||
|
||
The feature can be enabled or disabled at configuration time by passing | |
``--enable-progress-threads`` or ``--disable-progress-threads`` to | ||
``configure``. The default state is enabled. | ||
|
||
Enabling progress thread at runtime | ||
----------------------------------- | ||
|
||
When OpenMPI was built with ``--enable-progress-threads``, the progress | ||
thread is still deactivated at runtime by default. | ||
|
||
The progress thread can be activated by setting one of the following | ||
environment variables to any non-zero value in the launching command: | ||
|
||
.. code-block:: sh | ||
|
||
shell$ OMPI_ASYNC_PROGRESS=1 mpirun ... | ||
shell$ OPAL_ASYNC_PROGRESS=7 mpirun ... | ||
|
||
When both variables are set, ``OMPI_ASYNC_PROGRESS`` will be preferred to | ||
``OPAL_ASYNC_PROGRESS``. | ||
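The precedence rule above can be sketched as a small shell function. This is only an illustration of the documented behaviour, not Open MPI's actual parsing code: the function name `async_progress_requested` is hypothetical, and it assumes that a variable that is set but equal to zero counts as "off" and that `OMPI_ASYNC_PROGRESS` wins whenever it is set at all.

```shell
# Sketch only: mimics the documented precedence, not Open MPI's real code.
# Prints 1 if the progress thread would be requested, 0 otherwise.
async_progress_requested() {
    # OMPI_ASYNC_PROGRESS is preferred whenever it is set (assumption:
    # even when set to 0); otherwise fall back to OPAL_ASYNC_PROGRESS.
    if [ -n "${OMPI_ASYNC_PROGRESS+set}" ]; then
        value=$OMPI_ASYNC_PROGRESS
    else
        value=${OPAL_ASYNC_PROGRESS:-0}
    fi
    # Assumption: any value containing a character other than 0 is "non-zero".
    case $value in
        *[!0]*) echo 1 ;;
        *)      echo 0 ;;
    esac
}
```

Under these assumptions, setting only `OPAL_ASYNC_PROGRESS=7` requests the thread, while setting `OMPI_ASYNC_PROGRESS=0` alongside it does not.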
|
||
.. warning:: Activating progress thread at runtime may bring more harm | ||
than improvement to performance. User should try it with caution. | ||
Comment on lines +35 to +36:
I'd soften this. Maybe:
.. warning:: Progress threads are a somewhat complicated issue. Activating them at
run time may improve overlap of communication and computation in
your application (particularly those with non-blocking communication)
which will improve overall performance. But there may be unintended
consequences which may degrade overall application performance.
Users are advised to experiment and see what works best for their
applications. |
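One way to act on the advice to experiment is to time the same launch command with and without the progress thread requested. The helper below is a rough sketch: the function name `compare_async_progress` is hypothetical, only the `OMPI_ASYNC_PROGRESS` and `OPAL_ASYNC_PROGRESS` variable names come from this PR, and whole-second timing from `date +%s` is assumed to be precise enough for a first look.

```shell
# Sketch only: run one command twice and report wall-clock seconds.
compare_async_progress() {
    # Baseline: make sure neither variable is set for this run.
    start=$(date +%s)
    ( unset OMPI_ASYNC_PROGRESS OPAL_ASYNC_PROGRESS; "$@" ) || return 1
    end=$(date +%s)
    echo "progress thread off: $((end - start)) s"

    # Same command with the progress thread requested.
    start=$(date +%s)
    ( export OMPI_ASYNC_PROGRESS=1; "$@" ) || return 1
    end=$(date +%s)
    echo "progress thread on: $((end - start)) s"
}
```

A call might look like `compare_async_progress mpirun -n 4 ./my_app`, where `./my_app` stands in for the application under test. A single pair of runs is noisy, so repeating the comparison several times gives a more trustworthy picture.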
||
|
||
Rationale | ||
--------- | ||
|
||
A possible beneficial usecase of software progress thread is *intra-node | ||
shared-mem non-blocking* communication, running on some high core-count CPUs, | ||
Review comment: shared-memory |
||
on which application may not use all the available cores, or the CPU has some | ||
reserved cores dedicated to communication tasks. In such configuration, the | ||
Review comment: In such configurations, ... |
||
latency of some non-blocking collective operations (e.g. ``MPI_Ireduce()``) | ||
can be improved thanks to arithmetic operations being performed in the | ||
background by the progress thread, instead of the main thread which would only | ||
perform computation lastly at the moment of ``MPI_Wait()``, as currently | ||
implemented in OpenMPI. | ||
Comment on lines +47 to +49:
...background by the progress thread, instead of deferring the computations to being executed by the main thread during |
||
|
||
On another hand, on systems where *inter-node communication* was already | ||
offloaded to an appropriate BTL backend and associated hardware devices, | ||
enabling the software-based progress thread could be harmful to performance, | ||
since the additional thread will take up more CPU time for mostly no utility. | ||
Comment on lines +51 to +54:
Alternatively, on systems where inter-node communications are already offloaded to dedicated hardware, enabling the software-based progress threads could degrade performance, since the additional thread will force progress up through the CPU and potentially away from more optimized hardware functionality. |
||
|
||
For these performance reasons, the progress thread is not activated (spawned) | ||
by default at runtime. It is upon developers to decide to switch on the | ||
progress thread, depending on their application and system setup. | ||
|
||
Limitations | ||
----------- | ||
|
||
#. The current implementation does not support (yet) binding the progress | ||
thread to a specific core (or set of cores). | ||
|
||
#. There are still some hard-coded constant parameters in the code that | ||
would require further tuning. | ||
|
||
#. It was observed that some multi-threading overhead may impact performance | ||
on small buffers. |
Review comment: Can we make this a proper MCA parameter rather than naked env variables? MCA parameters are The Open MPI Way -- they can be set in a variety of ways (including via env variables).