Commit 23a7676

docs: update "Getting help"
Revamp the "Getting help" docs page to include a bit more information and have more recent procedures for the Open MPI community.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
1 parent dd7cbdd commit 23a7676

1 file changed

+166
-143
lines changed

docs/getting-help.rst

Lines changed: 166 additions & 143 deletions
@@ -36,24 +36,33 @@ places. If you have:
3636
there.
3737

3838
.. note:: Because of spam, only subscribers to the mailing list are
39-
allowed to post to the mailing list. Specifically: you must
40-
subscribe to the mailing list before posting.
39+
allowed to post to the mailing list. Specifically: **you must
40+
subscribe to the mailing list before posting.**
4141

42-
* If you have a run-time question or problem, see the :ref:`For
43-
run-time problems <getting-help-run-time-label>` section below for
44-
the content of what to include in your email.
4542
* If you have a compile-time question or problem, see the :ref:`For
46-
compile-time problems <getting-help-compile-time-label>` section
43+
problems building or installing Open MPI
44+
<getting-help-compile-time-label>` section below for the content
45+
of what to include in your email.
46+
47+
* If you have problems launching your MPI or OpenSHMEM application
48+
successfully, see the :ref:`For problems launching MPI or
49+
OpenSHMEM applications <getting-help-launching-label>` section
50+
below for the content of what to include in your email.
51+
52+
* If you have other questions or problems about running your MPI or
53+
OpenSHMEM application, see the :ref:`For problems running MPI or
54+
OpenSHMEM applications <getting-help-running-label>` section
4755
below for the content of what to include in your email.
4856

49-
.. note:: The mailing lists have **a 150 KB size limit on
50-
messages** (this is a limitation of the mailing list web
51-
archives). If attaching your files results in an email larger
52-
than this, please try compressing it and/or posting it on the
53-
web somewhere for people to download. A `Github Gist
54-
<https://gist.github.com/>`_ or a `Pastebin
55-
<https://pastebin.com/>`_ might be a good choice for posting
56-
large text files.
57+
.. important:: The more information you include in your report, the
58+
better. E-mails/bug reports simply stating, "It doesn't work!"
59+
are not helpful; we need to know as much information about your
60+
environment as possible in order to provide meaningful
61+
assistance.
62+
63+
**The best way to get help** is to provide a "recipe" for
64+
reproducing the problem. This will allow the Open MPI developers
65+
to see the error for themselves, and therefore be able to fix it.
5766

5867
.. important:: Please **use a descriptive "subject" line in your
5968
email!** Some Open MPI question-answering people decide whether
@@ -75,82 +84,152 @@ places. If you have:
7584
there.
7685

7786
If you're unsure where to send your question, subscribe and send an
78-
email to the user's mailing list.
87+
email to the user's mailing list (i.e., option #1, above).
7988

80-
.. _getting-help-run-time-label:
89+
.. _getting-help-compile-time-label:
8190

82-
For run-time problems
83-
---------------------
91+
For problems building or installing Open MPI
92+
--------------------------------------------
8493

85-
Please provide *all* of the following information:
94+
If you cannot successfully configure, build, or install Open MPI,
95+
please provide *all* of the following information:
8696

87-
.. important:: The more information you include in your report, the
88-
better. E-mails/bug reports simply stating, "It doesn't work!"
89-
are not helpful; we need to know as much information about your
90-
environment as possible in order to provide meaningful assistance.
97+
#. The version of Open MPI that you're using.
9198

92-
**The best way to get help** is to provide a "recipe" for
93-
reproducing the problem. This will allow the Open MPI developers
94-
to see the error for themselves, and therefore be able to fix it.
99+
#. The stdout and stderr from running ``configure``.
95100

96-
#. The version of Open MPI that you're using.
101+
#. All ``config.log`` files from the Open MPI build tree.
102+
103+
#. Output from when you ran ``make V=1 all`` to build Open MPI.
104+
105+
#. Output from when you ran ``make install`` to install Open MPI.
106+
107+
The script below may be helpful to gather much of the above
108+
information (adjust as necessary for your specific environment):
109+
110+
.. code-block:: bash
111+
112+
#!/usr/bin/env bash
113+
114+
set -euxo pipefail
115+
116+
# Make a directory for the output files
117+
dir="`pwd`/ompi-output"
118+
mkdir $dir
119+
120+
# Fill in the options you want to pass to configure here
121+
options=""
122+
./configure $options 2>&1 | tee $dir/config.out
123+
tar -cf - `find . -name config.log` | tar -xf - -C $dir
124+
125+
# Build and install Open MPI
126+
make V=1 all 2>&1 | tee $dir/make.out
127+
make install 2>&1 | tee $dir/make-install.out
128+
129+
# Bundle up all of these files into a tarball
130+
filename="ompi-output.tar.bz2"
131+
tar -jcf $filename `basename $dir`
132+
echo "Tarball $filename created"
133+
134+
Then attach the resulting ``ompi-output.tar.bz2`` file to your report.
135+
136+
.. caution:: The mailing lists have **a 150 KB size limit on
137+
messages** (this is a limitation of the mailing list web archives).
138+
If attaching the tarball makes your message larger than 150 KB, you
139+
may need to post the tarball elsewhere and include a link to that
140+
tarball in your mail to the list.
97141

98-
#. The ``config.log`` file from the top-level Open MPI directory, if
99-
available (**compress or post to a Github gist or Pastebin**).
142+
.. _getting-help-launching-label:
143+
144+
For problems launching MPI or OpenSHMEM applications
145+
----------------------------------------------------
146+
147+
If you cannot successfully launch simple applications across multiple
148+
nodes (e.g., the non-MPI ``hostname`` command, or the MPI "hello world"
149+
or "ring" sample applications in the ``examples/`` directory), please
150+
provide *all* of the information from the :ref:`For problems building
151+
or installing Open MPI <getting-help-compile-time-label>` section, and
152+
*all* of the following additional information:
100153

101154
#. The output of the ``ompi_info --all`` command from the node where
102-
you're invoking ``mpirun``.
103-
104-
#. If you have questions or problems about process affinity /
105-
binding, send the output from running the ``lstopo -v``
106-
command from a recent version of `Hwloc
107-
<https://www.open-mpi.org/projects/hwloc/>`_. *The detailed
108-
text output is preferable to a graphical output.*
109-
110-
#. If running on more than one node |mdash| especially if you're
111-
having problems launching Open MPI processes |mdash| also include
112-
the output of the ``ompi_info --version`` command **from each node
113-
on which you're trying to run**.
114-
115-
#. If you are able to launch MPI processes, you can use
116-
``mpirun`` to gather this information. For example, if
117-
the file ``my_hostfile.txt`` contains the hostnames of the
118-
machines on which you are trying to run Open MPI
119-
processes::
120-
121-
shell$ mpirun --map-by node --hostfile my_hostfile.txt --output tag ompi_info --version
122-
123-
124-
#. If you cannot launch MPI processes, use some other mechanism
125-
|mdash| such as ``ssh`` |mdash| to gather this information. For
126-
example, if the file ``my_hostfile.txt`` contains the hostnames
127-
of the machines on which you are trying to run Open MPI
128-
processes:
129-
130-
.. code-block:: sh
131-
132-
# Bourne-style shell (e.g., bash, zsh, sh)
133-
shell$ for h in `cat my_hostfile.txt`
134-
> do
135-
> echo "=== Hostname: $h"
136-
> ssh $h ompi_info --version
137-
> done
138-
139-
.. code-block:: sh
140-
141-
# C-style shell (e.g., csh, tcsh)
142-
shell% foreach h (`cat my_hostfile.txt`)
143-
foreach? echo "=== Hostname: $h"
144-
foreach? ssh $h ompi_info --version
145-
foreach? end
146-
147-
#. A *detailed* description of what is failing. The more
148-
details that you provide, the better. E-mails saying "My
149-
application doesn't work!" will inevitably be answered with
150-
requests for more information about *exactly what doesn't
151-
work*; so please include as much information detailed in your
152-
initial e-mail as possible. We strongly recommend that you
153-
include the following information:
155+
you are invoking :ref:`mpirun(1) <man1-mpirun>`.
156+
157+
#. If you have questions or problems about process mapping or binding,
158+
send the output from running the ``lstopo -v`` and ``lstopo --of
159+
xml`` commands from a recent version of `Hwloc
160+
<https://www.open-mpi.org/projects/hwloc/>`_.
161+
162+
#. If running on more than one node, also include the output of the
163+
``ompi_info --version`` command **from each node on which you are
164+
trying to run**.
165+
166+
#. The output of running ``mpirun --map-by ppr:1:node --prtemca
167+
plm_base_verbose 100 --prtemca rmaps_base_verbose 100 --display
168+
alloc hostname``. Add in a ``--hostfile`` argument if needed for
169+
your environment.
170+
171+
The script below may be helpful to gather much of the above
172+
information (adjust as necessary for your specific environment).
173+
174+
.. note:: It is safe to run this script after running the script from
175+
the :ref:`building and installing
176+
<getting-help-compile-time-label>` section.
177+
178+
.. code-block:: bash
179+
180+
#!/usr/bin/env bash
181+
182+
set -euxo pipefail
183+
184+
# Make a directory for the output files
185+
dir="`pwd`/ompi-output"
186+
mkdir -p $dir
187+
188+
# Get installation and system information
189+
ompi_info --all 2>&1 | tee $dir/ompi-info-all.out
190+
lstopo -v | tee $dir/lstopo-v.txt
191+
lstopo --of xml | tee $dir/lstopo.xml
192+
193+
# Have a text file "my_hostfile.txt" containing the hostnames on
194+
# which you are trying to launch
195+
for host in `cat my_hostfile.txt`; do
196+
ssh $host ompi_info --version 2>&1 | tee $dir/ompi_info-version-$host.out
197+
ssh $host lstopo -v | tee $dir/lstopo-v-$host.txt
198+
ssh $host lstopo --of xml | tee $dir/lstopo-$host.xml
199+
done
200+
201+
# Have a my_hostfile.txt file if needed for your environment, or
202+
# remove the --hostfile argument altogether if not needed.
203+
set +e
204+
mpirun \
205+
--hostfile my_hostfile.txt \
206+
--map-by ppr:1:node \
207+
--prtemca plm_base_verbose 100 \
208+
--prtemca rmaps_base_verbose 100 \
209+
--display alloc \
210+
hostname 2>&1 | tee $dir/mpirun-hostname.out
211+
212+
# Bundle up all of these files into a tarball
213+
filename="ompi-output.tar.bz2"
214+
tar -jcf $filename `basename $dir`
215+
echo "Tarball $filename created"
216+
217+
.. _getting-help-running-label:
218+
219+
For problems running MPI or OpenSHMEM applications
220+
--------------------------------------------------
221+
222+
If you can successfully launch parallel MPI or OpenSHMEM applications,
223+
but the jobs fail during the run, please provide *all* of the
224+
information from the :ref:`For problems building or installing Open
225+
MPI <getting-help-compile-time-label>` section, *all* of the
226+
information from the :ref:`For problems launching MPI or OpenSHMEM
227+
applications <getting-help-launching-label>` section, and then *all*
228+
of the following additional information:
229+
230+
#. A *detailed* description of what is failing. *The more details
231+
that you provide, the better.* Please include at least the
232+
following information:
154233

155234
* The exact command used to run your application.
156235

@@ -164,77 +243,21 @@ Please provide *all* of the following information:
164243
any required support libraries, such as libraries required
165244
for high-speed networks such as InfiniBand).
166245

167-
#. Detailed information about your network:
246+
#. The source code of a short sample program (preferably in C or
247+
Fortran) that exhibits the problem.
248+
249+
#. If you are experiencing networking problems, include detailed
250+
information about your network.
168251

169252
.. error:: TODO Update link to IB FAQ entry.
170253

171254
#. For RoCE- or InfiniBand-based networks, include the information
172255
:ref:`in this FAQ entry <faq-ib-troubleshoot-label>`.
173256

174-
#. For Ethernet-based networks (including RoCE-based networks,
257+
#. For Ethernet-based networks (including RoCE-based networks),
175258
include the output of the ``ip addr`` command (or the legacy
176259
``ifconfig`` command) on all relevant nodes.
177260

178261
.. note:: Some Linux distributions do not put ``ip`` or
179262
``ifconfig`` in the default ``PATH`` of normal users.
180263
Try looking for it in ``/sbin`` or ``/usr/sbin``.
181-
182-
.. _getting-help-compile-time-label:
183-
184-
For compile problems
185-
--------------------
186-
187-
Please provide *all* of the following information:
188-
189-
.. important:: The more information you include in your report, the
190-
better. E-mails/bug reports simply stating, "It doesn't work!"
191-
are not helpful; we need to know as much information about your
192-
environment as possible in order to provide meaningful assistance.
193-
194-
**The best way to get help** is to provide a "recipe" for
195-
reproducing the problem. This will allow the Open MPI developers
196-
to see the error for themselves, and therefore be able to fix it.
197-
198-
#. The version of Open MPI that you're using.
199-
200-
#. All output (both compilation output and run time output, including
201-
all error messages).
202-
203-
#. Output from when you ran ``./configure`` to configure Open MPI
204-
(**compress or post to a GitHub gist or Pastebin!**).
205-
206-
#. The ``config.log`` file from the top-level Open MPI directory
207-
(**compress or post to a GitHub gist or Pastebin!**).
208-
209-
#. Output from when you ran ``make V=1`` to build Open MPI (**compress
210-
or post to a GitHub gist or Pastebin!**).
211-
212-
#. Output from when you ran ``make install`` to install Open MPI
213-
(**compress or post to a GitHub gist or Pastebin!**).
214-
215-
To capture the output of the configure and make steps, you can use the
216-
script command or the following technique to capture all the files in
217-
a unique directory, suitable for tarring and compressing into a single
218-
file:
219-
220-
.. code-block:: sh
221-
222-
# Bourne-style shell (e.g., bash, zsh, sh)
223-
shell$ mkdir $HOME/ompi-output
224-
shell$ ./configure {options} 2>&1 | tee $HOME/ompi-output/config.out
225-
shell$ make all 2>&1 | tee $HOME/ompi-output/make.out
226-
shell$ make install 2>&1 | tee $HOME/ompi-output/make-install.out
227-
shell$ cd $HOME
228-
shell$ tar jcvf ompi-output.tar.bz2 ompi-output
229-
230-
.. code-block:: sh
231-
232-
# C-style shell (e.g., csh, tcsh)
233-
shell% mkdir $HOME/ompi-output
234-
shell% ./configure {options} |& tee $HOME/ompi-output/config.out
235-
shell% make all |& tee $HOME/ompi-output/make.out
236-
shell% make install |& tee $HOME/ompi-output/make-install.out
237-
shell% cd $HOME
238-
shell% tar jcvf ompi-output.tar.bz2 ompi-output
239-
240-
Then attach the resulting ``ompi-output.tar.bz2`` file to your report.
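
The caution in the diff above notes a 150 KB size limit on mailing list messages. As a rough sketch (not part of the commit or of Open MPI), the check below reports whether a tarball is under that limit before you attach it. The function name ``fits_list_limit`` is hypothetical, and the script assumes 1 KB = 1024 bytes. Attachments typically grow by about a third when encoded for email, so a file close to the limit may still be rejected.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Open MPI: report whether a file fits
# under the 150 KB mailing list message limit.

limit_bytes=$((150 * 1024))   # assumption: 1 KB = 1024 bytes

fits_list_limit() {
    local file="$1"
    local size
    # wc -c prints the number of bytes read from the file
    size=$(wc -c < "$file") || return 2
    size=$((size))            # normalize any padding in the wc output
    if [ "$size" -le "$limit_bytes" ]; then
        echo "$file: $size bytes, fits under the limit"
        return 0
    fi
    echo "$file: $size bytes, too large; post it elsewhere and link to it"
    return 1
}

# Check the tarball produced by the scripts in the diff, if it exists
if [ -f ompi-output.tar.bz2 ]; then
    fits_list_limit ompi-output.tar.bz2 || true
fi
```

If the function reports that the file is too large, the caution's advice applies: post the tarball elsewhere and include a link to it in your mail to the list.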
