RFC: Provide equivalence of MPICH_ASYNC_PROGRESS #13088

Open: wants to merge 2 commits into base: main

12 changes: 12 additions & 0 deletions docs/installing-open-mpi/configure-cli-options/misc.rst
@@ -34,6 +34,18 @@ above categories that can be used with ``configure``:
.. danger:: The heterogeneous functionality is currently broken |mdash|
do not use.

* ``--enable-progress-threads``
* ``--disable-progress-threads``:
Enable or disable (default = ``enabled``) a software-based progress thread
in each MPI process that runs the internal communication progress engine.
When support is enabled at build time, the thread can be activated
(default = deactivated) at runtime by setting the environment variable
``OMPI_ASYNC_PROGRESS`` or ``OPAL_ASYNC_PROGRESS`` to a non-zero value.
Member

Can we make this a proper MCA parameter rather than naked env variables? MCA parameters are The Open MPI Way -- they can be set in a variety of ways (including via env variables).


.. warning:: Activating the progress thread can degrade performance. See
:ref:`this section <async-progress-thread-label>` for
more details.

.. _install-wrapper-flags-label:

* ``--with-wrapper-cflags=CFLAGS``
1 change: 1 addition & 0 deletions docs/launching-apps/index.rst
@@ -39,6 +39,7 @@ same command).
prerequisites
pmix-and-prrte
scheduling
progress_thread

localhost
ssh
70 changes: 70 additions & 0 deletions docs/launching-apps/progress_thread.rst
@@ -0,0 +1,70 @@
.. _async-progress-thread-label:

Asynchronous progress thread
============================

Open MPI provides experimental support for a software-based asynchronous
Member

"Open MPI" (with a space, without the quotes, obviously). Same correction is needed for the other places in this PR that the docs mention "OpenMPI".

Sorry, I know this is a nit, but this is the official Open MPI docs and we need to get the name right.

Author

No problem. I understand.

progress thread. This thread runs the internal progress engine in the
background to advance non-blocking communication, so that communication can
overlap with computation.

Enabling progress thread at configuration time
----------------------------------------------

The feature can be enabled or disabled at configure time by passing
``--enable-progress-threads`` or ``--disable-progress-threads`` to
``configure``. It is enabled by default.

Enabling progress thread at runtime
-----------------------------------

Even when Open MPI is built with ``--enable-progress-threads`` (the
default), the progress thread remains deactivated at runtime by default.

To activate it, set either of the following environment variables to a
non-zero value in the launch command:

.. code-block:: sh

shell$ OMPI_ASYNC_PROGRESS=1 mpirun ...
shell$ OPAL_ASYNC_PROGRESS=7 mpirun ...

If both variables are set, ``OMPI_ASYNC_PROGRESS`` takes precedence over
``OPAL_ASYNC_PROGRESS``.
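
The precedence rule can be expressed as a small shell function, which may be
handy in job scripts that need to know whether the thread was requested. This
is a sketch under stated assumptions, not Open MPI's actual parsing code:
``progress_thread_requested`` is a hypothetical helper, an empty value is
treated as unset, and non-integer values are treated as "off"; Open MPI's real
handling of such values may differ.

```sh
# Hypothetical helper (not part of Open MPI): prints "on" or "off" according
# to the documented rule for activating the runtime progress thread.
progress_thread_requested() {
    # OMPI_ASYNC_PROGRESS wins when both variables are set.
    # Assumption: an empty value is treated the same as an unset one.
    if [ -n "${OMPI_ASYNC_PROGRESS:-}" ]; then
        _ptr_value=$OMPI_ASYNC_PROGRESS
    elif [ -n "${OPAL_ASYNC_PROGRESS:-}" ]; then
        _ptr_value=$OPAL_ASYNC_PROGRESS
    else
        _ptr_value=0
    fi
    # Any non-zero integer requests the thread. A non-integer value makes the
    # numeric test fail, so it is reported as "off" (assumption).
    if [ "$_ptr_value" -ne 0 ] 2>/dev/null; then
        echo on
    else
        echo off
    fi
}
```

For example, with ``OMPI_ASYNC_PROGRESS=0`` and ``OPAL_ASYNC_PROGRESS=1``
both set, the function prints ``off``, because ``OMPI_ASYNC_PROGRESS`` takes
precedence.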

.. warning:: Activating the progress thread at runtime can degrade performance
rather than improve it. Users should evaluate it carefully on their own applications.
Comment on lines +35 to +36
Member

I'd soften this. Maybe:

.. warning:: Progress threads are a somewhat complicated issue.  Activating them at 
             run time may improve overlap of communication and computation in 
             your application (particularly those with non-blocking communication) 
             which will improve overall performance.  But there may be unintended
             consequences which may degrade overall application performance.
             Users are advised to experiment and see what works best for their 
             applications.


Rationale
---------

A potentially beneficial use case for the software progress thread is *intra-node
shared-memory non-blocking* communication on some high core-count CPUs,
Member

shared-memory

on which the application may not use all the available cores, or which reserve
some cores for communication tasks. In such configurations, the
Member

In such configurations, ...

latency of some non-blocking collective operations (e.g. ``MPI_Ireduce()``)
can be improved because their arithmetic operations are performed in the
background by the progress thread, instead of being deferred to the main
thread until ``MPI_Wait()`` is called, as in the current Open MPI
implementation.
Comment on lines +47 to +49
Member

...background by the progress thread, instead of deferring the computations to being executed by the main thread during MPI_Wait().


On the other hand, on systems where *inter-node communication* is already
offloaded to an appropriate BTL backend and the associated hardware devices,
enabling the software-based progress thread can hurt performance: the
additional thread consumes CPU time while providing little or no benefit.
Comment on lines +51 to +54
Member

Alternatively, on systems where inter-node communications are already offloaded to dedicated hardware, enabling the software-based progress threads could degrade performance, since the additional thread will force progress up through the CPU and potentially away from more optimized hardware functionality.


For these performance reasons, the progress thread is not activated (spawned)
at runtime by default. It is up to developers to decide whether to switch it
on, depending on their application and system setup.
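
Since the benefit depends on both the application and the system, one
practical way to decide is to time the same job with the thread explicitly off
and on. The following POSIX shell sketch does that; ``compare_async_progress``
is a hypothetical helper, not part of Open MPI. It measures wall-clock time
with ``date +%s``, so it has one-second resolution and is only meaningful for
runs lasting several seconds or more. Setting ``OMPI_ASYNC_PROGRESS=0``
explicitly, rather than leaving it unset, means an ``OPAL_ASYNC_PROGRESS``
value inherited from the environment cannot activate the thread in the
baseline run, given the precedence rule above.

```sh
# Hypothetical helper (not part of Open MPI): runs the same command once with
# the progress thread explicitly off (OMPI_ASYNC_PROGRESS=0) and once with it
# requested (OMPI_ASYNC_PROGRESS=1), then prints the wall-clock time of each.
# Resolution is one second because timing relies on `date +%s`.
compare_async_progress() {
    for _cap_setting in 0 1; do
        _cap_start=$(date +%s)
        # The variable is set only in the environment of this command; the
        # command's own output is passed through unchanged.
        OMPI_ASYNC_PROGRESS=$_cap_setting "$@" || return 1
        _cap_end=$(date +%s)
        printf 'OMPI_ASYNC_PROGRESS=%s: %s s\n' \
            "$_cap_setting" "$((_cap_end - _cap_start))"
    done
}
```

For example, ``compare_async_progress mpirun -n 4 ./my_app`` (with
``./my_app`` standing in for your own program) prints one timing line after
each run. A single measurement per setting is noisy; repeating each run
several times gives a more reliable comparison.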

Limitations
-----------

#. The current implementation does not yet support binding the progress
thread to a specific core (or set of cores).

#. The code still contains some hard-coded constant parameters that would
benefit from further tuning.

#. Multi-threading overhead has been observed to degrade performance with
small buffers.