.. _mpix_comm_agree:

MPIX_Comm_agree
===============
.. include_body

:ref:`MPIX_Comm_agree`, :ref:`MPIX_Comm_iagree` - Agree on a flag value
from all live processes and distribute the result back to all live
processes, even after process failures.

This is part of the User Level Fault Mitigation :ref:`ULFM extension <ulfm-label>`.

SYNTAX
------

C Syntax
^^^^^^^^

.. code-block:: c

   #include <mpi.h>
   #include <mpi-ext.h>

   int MPIX_Comm_agree(MPI_Comm comm, int *flag)

   int MPIX_Comm_iagree(MPI_Comm comm, int *flag, MPI_Request *request)

Fortran Syntax
^^^^^^^^^^^^^^

.. code-block:: fortran

   USE MPI
   USE MPI_EXT
   ! or the older form: INCLUDE 'mpif.h'

   MPIX_COMM_AGREE(COMM, FLAG, IERROR)
       INTEGER COMM, FLAG, IERROR

   MPIX_COMM_IAGREE(COMM, FLAG, REQUEST, IERROR)
       INTEGER COMM, FLAG, REQUEST, IERROR

Fortran 2008 Syntax
^^^^^^^^^^^^^^^^^^^

.. code-block:: fortran

   USE mpi_f08
   USE mpi_ext_f08

   MPIX_Comm_agree(comm, flag, ierror)
       TYPE(MPI_Comm), INTENT(IN) :: comm
       INTEGER, INTENT(INOUT) :: flag
       INTEGER, OPTIONAL, INTENT(OUT) :: ierror

   MPIX_Comm_iagree(comm, flag, request, ierror)
       TYPE(MPI_Comm), INTENT(IN) :: comm
       INTEGER, INTENT(INOUT), ASYNCHRONOUS :: flag
       TYPE(MPI_Request), INTENT(OUT) :: request
       INTEGER, OPTIONAL, INTENT(OUT) :: ierror

INPUT PARAMETERS
----------------
* ``comm``: Communicator (handle).
* ``flag``: Binary flags (integer).

OUTPUT PARAMETERS
-----------------
* ``flag``: Reduced binary flags (integer).
* ``request``: Request (handle, non-blocking only).
* ``ierror``: Fortran only: Error status (integer).

DESCRIPTION
-----------

This collective operation agrees on the integer value *flag* and
(implicitly) on the group of failed processes in *comm*.

On completion, all non-failed MPI processes have agreed to set the
output integer value of *flag* to the result of a *bitwise AND*
operation over the contributed input values of *flag*.

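Because the result is a bitwise AND, each bit of *flag* acts as an
independent unanimous vote: a bit is set in the output only if every
contributing process set it. The following plain-C sketch models only
that arithmetic and makes no MPI calls; the bit names ``VOTE_STEP_OK`` and
``VOTE_CKPT_OK``, the ``alive`` array, and the function ``agree_model``
are hypothetical.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical bit assignments: each bit is an independent yes/no vote. */
   enum {
       VOTE_STEP_OK = 1 << 0,  /* this process finished the step */
       VOTE_CKPT_OK = 1 << 1   /* this process wrote its checkpoint */
   };

   /* Model of the agreed value: bitwise AND over the flags of the
    * processes that are still alive (alive[i] != 0). Contributions of
    * failed processes are ignored. This models the result only, not the
    * fault-tolerant protocol that computes it. */
   static int agree_model(const int *flags, const int *alive, size_t n)
   {
       int result = ~0;  /* all bits set: identity element of bitwise AND */
       for (size_t i = 0; i < n; i++) {
           if (alive[i])
               result &= flags[i];
       }
       return result;
   }

In a real program, each process would set the bits it agrees with before
calling :ref:`MPIX_Comm_agree` and then test each bit of the returned *flag*.
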
:ref:`MPIX_Comm_iagree` is the non-blocking variant of :ref:`MPIX_Comm_agree`.

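A minimal sketch of the non-blocking variant, assuming the C bindings
listed in SYNTAX and an error handler such as ``MPI_ERRORS_RETURN`` on
*comm*, so that failures are reported as return codes. The helper name
``agree_overlapped`` and the ``do_local_work`` callback are hypothetical.
As with other non-blocking operations, *flag* must not be read or
modified until the request completes.

.. code-block:: c

   #include <stddef.h>
   #include <mpi.h>
   #include <mpi-ext.h>

   /* Hypothetical helper: start the agreement, overlap it with local
    * work, then wait for the agreed value. Returns the MPI error code. */
   static int agree_overlapped(MPI_Comm comm, int *flag,
                               void (*do_local_work)(void))
   {
       MPI_Request req;
       int rc = MPIX_Comm_iagree(comm, flag, &req);
       if (rc != MPI_SUCCESS)
           return rc;

       /* *flag must not be accessed while the request is pending. */
       if (do_local_work != NULL)
           do_local_work();

       /* On completion, *flag holds the bitwise AND of the contributions;
        * a process failure is reported through the return code of MPI_Wait. */
       return MPI_Wait(&req, MPI_STATUS_IGNORE);
   }
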
PROCESS FAILURES
----------------

When an MPI process fails before contributing to the agree operation,
the *flag* is computed ignoring its contribution, and the operation
raises an error of class MPIX_ERR_PROC_FAILED.

When an error of class MPIX_ERR_PROC_FAILED is raised, it is consistently
raised at all MPI processes in the group(s) of *comm*.

After :ref:`MPIX_Comm_agree` has raised an error of class MPIX_ERR_PROC_FAILED,
the group produced by a subsequent call to :ref:`MPIX_Comm_get_failed` on
*comm* contains every MPI process that did not contribute to the
computation of *flag*.

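The sketch below shows one way to react to this error: compare the error
class with MPIX_ERR_PROC_FAILED using ``MPI_Error_class``, then query the
failed processes with :ref:`MPIX_Comm_get_failed`. It assumes that an error
handler returning error codes (for example ``MPI_ERRORS_RETURN``) is set on
*comm*; the helper name ``agree_and_count_failed`` is hypothetical.

.. code-block:: c

   #include <mpi.h>
   #include <mpi-ext.h>

   /* Hypothetical helper: agree on *flag and report in *nfailed the size
    * of the failed group when the agreement raised MPIX_ERR_PROC_FAILED
    * (0 when it succeeded). Returns the MPI error code. */
   static int agree_and_count_failed(MPI_Comm comm, int *flag, int *nfailed)
   {
       int rc, eclass;
       MPI_Group failed;

       *nfailed = 0;
       rc = MPIX_Comm_agree(comm, flag);
       if (rc == MPI_SUCCESS)
           return rc;

       MPI_Error_class(rc, &eclass);
       if (eclass == MPIX_ERR_PROC_FAILED) {
           /* *flag is still the AND over the contributing processes; the
            * failed group includes every process that did not contribute. */
           MPIX_Comm_get_failed(comm, &failed);
           MPI_Group_size(failed, nfailed);
           MPI_Group_free(&failed);
       }
       return rc;
   }
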
WHEN THE COMMUNICATOR CONTAINS ACKNOWLEDGED FAILURES
----------------------------------------------------

If **all** MPI processes in the group of *comm* have acknowledged the failure
of an MPI process (using :ref:`MPIX_Comm_ack_failed`) prior to the call to
:ref:`MPIX_Comm_agree` (or :ref:`MPIX_Comm_iagree`), the MPIX_ERR_PROC_FAILED
error is not raised when the output value of *flag* ignores the
contribution of that failed process. Note that this is a uniform property:
if any live process in *comm* has not acknowledged the failure of a
non-contributing process, all processes raise an error of class
MPIX_ERR_PROC_FAILED.

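The rule above can be restated as a predicate. The plain-C sketch below is
a model for illustration only, not part of the API: the function
``model_raises_proc_failed`` and its ``live``, ``contributed`` and ``acked``
arrays are hypothetical, and ``acked`` stands for acknowledgments made with
:ref:`MPIX_Comm_ack_failed` before the call.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>

   /* Model of the rule above, for n processes:
    *   live[i]        : process i is alive and takes part in the call
    *   contributed[j] : the contribution of process j was used in flag
    *   acked[i*n + j] : process i acknowledged the failure of process j
    * Returns true when MPIX_ERR_PROC_FAILED is raised (at every live
    * process, since the outcome is uniform). */
   static bool model_raises_proc_failed(const bool *live,
                                        const bool *contributed,
                                        const bool *acked, size_t n)
   {
       for (size_t j = 0; j < n; j++) {
           if (contributed[j])
               continue;
           for (size_t i = 0; i < n; i++) {
               if (live[i] && !acked[i * n + j])
                   return true;  /* an unacknowledged non-contributor */
           }
       }
       return false;
   }
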
**Example 1:** Using a combination of :ref:`MPIX_Comm_ack_failed` and
:ref:`MPIX_Comm_agree`, users can propagate and synchronize the knowledge
of failures across all MPI processes in *comm*.

.. code-block:: c

   #include <mpi.h>
   #include <mpi-ext.h>

   /* The error handler of c must return error codes (for example
    * MPI_ERRORS_RETURN) so that the loop can observe the return code. */
   void Comm_get_failed_consistent(MPI_Comm c, MPI_Group *g) {
       int rc; int T = 1;
       int size; int num_acked;
       MPI_Group gf;
       int ranges[1][3] = {{0, 0, 1}};

       MPI_Comm_size(c, &size);

       do {
           /* this routine is not pure: calling MPIX_Comm_ack_failed
            * affects the state of the communicator c */
           MPIX_Comm_ack_failed(c, size, &num_acked);
           /* we simply ignore the T value in this example */
           rc = MPIX_Comm_agree(c, &T);
       } while( rc != MPI_SUCCESS );
       /* after this loop, MPIX_Comm_agree has returned MPI_SUCCESS at
        * all processes, so all processes have acknowledged the same set of
        * failures. Let's get that set of failures in the g group. */
       if( 0 == num_acked ) {
           *g = MPI_GROUP_EMPTY;
       }
       else {
           /* relies on the acknowledged processes being the first
            * num_acked members of the failed group */
           MPIX_Comm_get_failed(c, &gf);
           ranges[0][1] = num_acked - 1;
           MPI_Group_range_incl(gf, 1, ranges, g);
           MPI_Group_free(&gf);
       }
   }

WHEN THE COMMUNICATOR IS REVOKED
--------------------------------

This function never raises an error of class MPIX_ERR_REVOKED.
The defined semantics of :ref:`MPIX_Comm_agree` are maintained when *comm*
is revoked, or when the group of *comm* contains failed MPI processes.
In particular, :ref:`MPIX_Comm_agree` is a collective operation, even
when *comm* is revoked.

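Because the agreement still completes on a revoked communicator, it can
serve as the consistent decision point of a recovery step. The sketch
below assumes the ULFM functions ``MPIX_Comm_revoke`` and
``MPIX_Comm_shrink`` from the same extension and an error handler that
returns error codes; the helper name ``recover_if_needed`` and its policy
(shrink whenever any process reports a problem) are hypothetical.

.. code-block:: c

   #include <mpi.h>
   #include <mpi-ext.h>

   /* Hypothetical recovery step. ok is this process's local verdict (1
    * when its last operation on *comm succeeded, 0 otherwise). *comm must
    * not be a predefined communicator, because it may be freed. */
   static int recover_if_needed(MPI_Comm *comm, int ok)
   {
       MPI_Comm shrunk;
       int rc;

       if (!ok)  /* make operations on *comm fail at all processes */
           MPIX_Comm_revoke(*comm);

       /* Works on a revoked communicator and never raises
        * MPIX_ERR_REVOKED. Both rc (MPIX_ERR_PROC_FAILED is raised
        * consistently) and the AND of the ok values are uniform. */
       rc = MPIX_Comm_agree(*comm, &ok);
       if (rc == MPI_SUCCESS && ok)
           return MPI_SUCCESS;  /* no problem anywhere: keep *comm */

       /* Replace *comm with a communicator excluding failed processes. */
       rc = MPIX_Comm_shrink(*comm, &shrunk);
       if (rc != MPI_SUCCESS)
           return rc;
       MPI_Comm_free(comm);
       *comm = shrunk;
       return MPI_SUCCESS;
   }
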
WHEN THE COMMUNICATOR IS AN INTER-COMMUNICATOR
----------------------------------------------

When *comm* is an inter-communicator, the output value of *flag* is the
result of a *bitwise AND* operation over the values contributed by the
remote group.

When an error of class MPIX_ERR_PROC_FAILED is raised, it is consistently
raised at all MPI processes in the group(s) of *comm*, that is, both
the local and remote groups of the inter-communicator.

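To make the asymmetry concrete, the plain-C sketch below models the value
obtained on each side of an inter-communicator; it makes no MPI calls, and
the function name ``intercomm_agree_model`` is hypothetical.

.. code-block:: c

   #include <stddef.h>

   /* Model of the inter-communicator result: a process in one group
    * obtains the bitwise AND of the flags contributed by the processes
    * of the other (remote) group; its own group's values are not used. */
   static int intercomm_agree_model(const int *remote_flags, size_t n_remote)
   {
       int result = ~0;  /* identity element of bitwise AND */
       for (size_t i = 0; i < n_remote; i++)
           result &= remote_flags[i];
       return result;
   }

All processes of the same group therefore see the same value, but the two
groups can see different values, since each is computed from the other
group's contributions.
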
ERRORS
------

.. include:: ./ERRORS.rst

.. seealso::
   * :ref:`MPIX_Comm_is_revoked`
   * :ref:`MPIX_Comm_ack_failed`