|
| 1 | +Using Memchecker |
| 2 | +================ |
| 3 | + |
| 4 | +The Memchecker functionality in Open MPI provides MPI semantic |
| 5 | +checking for your application (as well as internals of Open MPI), with |
| 6 | +the help of memory checking tools such as the ``memcheck`` component of |
| 7 | +`the Valgrind suite <https://www.valgrind.org/>`_. |
| 8 | + |
| 9 | +///////////////////////////////////////////////////////////////////////// |
| 10 | + |
| 11 | +Types of Errors Detected by Memchecker |
| 12 | +-------------------------------------- |
| 13 | + |
| 14 | +Open MPI's Memchecker is based on the ``memcheck`` tool included with |
| 15 | +Valgrind, so it inherits all of that tool's capabilities. First, it checks |
| 16 | +all reads and writes of memory, and intercepts calls to |
| 17 | +``malloc(3)``/``free(3)`` and C++'s ``new``/``delete`` operators. |
| 18 | +Most importantly, Memchecker is able to detect |
| 19 | +user buffer errors in both non-blocking and one-sided |
| 20 | +communications, e.g., reading or writing to buffers of active |
| 21 | +non-blocking receive operations and writing to buffers of active |
| 22 | +non-blocking send operations. |
| 23 | + |
| 24 | +Here are some example problems that Memchecker can detect: |
| 25 | + |
| 26 | +Accessing a buffer that is under the control of a non-blocking communication: |
| 27 | + |
| 28 | +.. code-block:: c |
| 29 | +
|
| 30 | + int buf; |
| 31 | + MPI_Irecv(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req); |
| 32 | + // The following line will produce a memchecker warning |
| 33 | + buf = 4711; |
| 34 | + MPI_Wait (&req, &status); |
| 35 | +
|
| 36 | +Wrong input parameters, e.g., wrong-sized send buffers: |
| 37 | + |
| 38 | +.. code-block:: c |
| 39 | +
|
| 40 | + char *send_buffer; |
| 41 | + send_buffer = malloc(5); |
| 42 | + memset(send_buffer, 0, 5); |
| 43 | + // The following line will produce a memchecker warning |
| 44 | + MPI_Send(send_buffer, 10, MPI_CHAR, 1, 0, MPI_COMM_WORLD); |
| 45 | +
|
| 46 | +Accessing a window in a one-sided communication: |
| 47 | + |
| 48 | +.. code-block:: c |
| 49 | +
|
| 50 | + MPI_Get(A, 10, MPI_INT, 1, 0, 1, MPI_INT, win); |
| 51 | + A[0] = 4711; // This line will produce a memchecker warning |
| 52 | + MPI_Win_fence(0, win); |
| 53 | +
|
| 54 | +Uninitialized input buffers: |
| 55 | + |
| 56 | +.. code-block:: c |
| 57 | +
|
| 58 | + char *buffer; |
| 59 | + buffer = malloc(10); |
| 60 | + // The following line will produce a memchecker warning |
| 61 | + MPI_Send(buffer, 10, MPI_CHAR, 1, 0, MPI_COMM_WORLD); |
| 62 | +
|
| 63 | +Usage of the uninitialized ``MPI_ERROR`` field of the ``MPI_Status`` |
| 64 | +structure (the MPI-1 standard defines the ``MPI_ERROR`` field to be |
| 65 | +undefined for single-completion calls such as :ref:`MPI_Wait(3) <mpi_wait>` or |
| 66 | +:ref:`MPI_Test(3) <mpi_test>`; see MPI-1 p. 22): |
| 67 | + |
| 68 | +.. code-block:: c |
| 69 | +
|
| 70 | + MPI_Wait(&request, &status); |
| 71 | + // The following line will produce a memchecker warning |
| 72 | + if (status.MPI_ERROR != MPI_SUCCESS) |
| 73 | + return ERROR; |
| 74 | +
|
| 75 | +///////////////////////////////////////////////////////////////////////// |
| 76 | + |
| 77 | +Building Open MPI with Memchecker Support |
| 78 | +----------------------------------------- |
| 79 | + |
| 80 | +To use Memchecker, you need Valgrind 3.2.0 or later and an Open |
| 81 | +MPI installation that was configured with the ``--enable-memchecker`` and |
| 82 | +``--enable-debug`` flags. |
| 83 | + |
| 84 | +.. note:: The Memchecker functionality is off by default, because it |
| 85 | + incurs a performance penalty. |
| 86 | + |
| 87 | +When ``--enable-memchecker`` is specified, ``configure`` will check |
| 88 | +for a recent-enough Valgrind distribution. If found, Open MPI will |
| 89 | +build Memchecker support. |
| 90 | + |
| 91 | +For example: |
| 92 | + |
| 93 | +.. code-block:: sh |
| 94 | +
|
| 95 | + shell$ ./configure --prefix=/path/to/openmpi --enable-debug \ |
| 96 | + --enable-memchecker --with-valgrind=/path/to/valgrind |
| 97 | +
|
| 98 | +You can check that Open MPI was built with Memchecker support by using |
| 99 | +the :ref:`ompi_info(1) <man1-ompi_info>` command. |
| 100 | + |
| 101 | +.. code-block:: sh |
| 102 | +
|
| 103 | + # The exact version numbers shown may be different for your Open |
| 104 | + # MPI installation |
| 105 | + shell$ ompi_info | grep memchecker |
| 106 | + MCA memchecker: valgrind (MCA v1.0, API v1.0, Component v1.3) |
| 107 | +
|
| 108 | +If you do not see the "MCA memchecker: valgrind" line, you probably |
| 109 | +did not configure and install Open MPI correctly. |
| 110 | + |
| 111 | +///////////////////////////////////////////////////////////////////////// |
| 112 | + |
| 113 | +Running an Open MPI Application with Memchecker |
| 114 | +----------------------------------------------- |
| 115 | + |
| 116 | +Once Open MPI has been built and installed with Memchecker support, |
| 117 | +simply run your application with Valgrind, e.g.: |
| 118 | + |
| 119 | +.. code-block:: sh |
| 120 | +
|
| 121 | + shell$ mpirun -n 2 valgrind ./my_app |
| 122 | +
|
| 123 | +If you enabled Memchecker, but you don't want to check the |
| 124 | +application at this time, then just run your application as |
| 125 | +usual. E.g.: |
| 126 | + |
| 127 | +.. code-block:: sh |
| 128 | +
|
| 129 | + shell$ mpirun -n 2 ./my_app |
| 130 | +
|
| 131 | +///////////////////////////////////////////////////////////////////////// |
| 132 | + |
| 133 | +Application Performance Impacts Using Memchecker |
| 134 | +------------------------------------------------ |
| 135 | + |
| 136 | +The configure option ``--enable-memchecker`` (together with |
| 137 | +``--enable-debug``) *does* cause performance degradation, even if not |
| 138 | +running under Valgrind. The following explains the mechanism and may |
| 139 | +help in making the decision whether to provide a cluster-wide |
| 140 | +installation with ``--enable-memchecker``. |
| 141 | + |
| 142 | +There are two cases: |
| 143 | + |
| 144 | +#. If run without Valgrind, the Valgrind ClientRequests (assembler |
| 145 | + instructions added to the normal execution path for checking) do |
| 146 | + not affect overall MPI performance. Valgrind ClientRequests are |
| 147 | + explained in detail `in Valgrind's documentation |
| 148 | + <https://valgrind.org/docs/manual/manual-core-adv.html>`_. |
| 149 | + In the case of x86-64, ClientRequests boil down to the following |
| 150 | + four rotate-left (ROL) and one exchange (XCHG) assembler instructions |
| 151 | + from ``valgrind.h``: |
| 152 | + |
| 153 | + .. code-block:: c |
| 154 | +
|
| 155 | + #define __SPECIAL_INSTRUCTION_PREAMBLE \ |
| 156 | + "rolq $3, %%rdi; rolq $13, %%rdi\n\t" \ |
| 157 | + "rolq $61, %%rdi; rolq $51, %%rdi\n\t" |
| 158 | +
|
| 159 | + and |
| 160 | + |
| 161 | + .. We do not make the code block below as "c" because the Sphinx C |
| 162 | + syntax highlighter fails to parse it as C and emits a warning. |
| 163 | + So we might as well just leave it as a plain verbatim block |
| 164 | + (i.e., not syntax highlighted). |
| 165 | +
|
| 166 | + .. code-block:: |
| 167 | +
|
| 168 | + __asm__ volatile(__SPECIAL_INSTRUCTION_PREAMBLE \ |
| 169 | + /* %RDX = client_request ( %RAX ) */ \ |
| 170 | + "xchgq %%rbx,%%rbx" \ |
| 171 | + : "=d" (_zzq_result) \ |
| 172 | + : "a" (&_zzq_args[0]), "0" (_zzq_default) \ |
| 173 | + : "cc", "memory" \ |
| 174 | + ); |
| 175 | +
|
| 176 | + for every single ClientRequest. When not running under |
| 177 | + Valgrind, these ClientRequest instructions do not change the |
| 178 | + arithmetic outcome (rotating a 64-bit register left by a total of 128 bits, |
| 179 | + exchanging a register with itself), except for the carry flag. |
| 180 | + |
| 181 | + The first request checks whether we're running under Valgrind. |
| 182 | + If we're not running under Valgrind, subsequent checks (a.k.a. |
| 183 | + ClientRequests) are not done. |
| 184 | + |
| 185 | +#. If the application is run under Valgrind, performance is naturally reduced due |
| 186 | + to the Valgrind JIT and the checking tool employed. |
| 187 | + For costs and overheads of Valgrind's Memcheck tool on the SPEC 2000 Benchmark, |
| 188 | + please see the excellent paper |
| 189 | + `Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation |
| 190 | + <https://valgrind.org/docs/valgrind2007.pdf>`_. |
| 191 | + For an evaluation of various internal implementation alternatives of Shadow Memory, please see |
| 192 | + `Building Workload Characterization Tools with Valgrind |
| 193 | + <https://valgrind.org/docs/iiswc2006.pdf>`_. |
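
The first case above states that the four rotations in the x86-64 preamble leave the register value unchanged when not running under Valgrind. The following is a minimal sketch that checks this arithmetic in portable C. It is not taken from ``valgrind.h``: the helper names ``rotl64``, ``preamble_rotations``, and ``rotations_cancel`` are hypothetical and chosen only for illustration, and the sketch models only the register value, not the carry flag.

```c
#include <stdint.h>

/* Hypothetical helper: rotate a 64-bit value left by n bits.
 * Only valid for 0 < n < 64, which covers every amount used below. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    return (x << n) | (x >> (64u - n));
}

/* Apply the same rotation amounts as the x86-64 preamble: 3, 13, 61, 51.
 * Their sum is 128, i.e. two full turns of a 64-bit register. */
static uint64_t preamble_rotations(uint64_t x)
{
    x = rotl64(x, 3);
    x = rotl64(x, 13);
    x = rotl64(x, 61);
    x = rotl64(x, 51);
    return x;
}

/* Returns 1 if the four rotations restore the original value, else 0. */
static int rotations_cancel(uint64_t x)
{
    return preamble_rotations(x) == x;
}
```

Calling ``rotations_cancel()`` with any 64-bit value should return 1, because 3 + 13 + 61 + 51 = 128 is a multiple of the register width.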