
ParallelPage


Protocol parallelization (for Xmipp version 2.3 or newer; this page is NOT backwards compatible)

Several protocols may be run in parallel on a wide range of multi-processor environments. In theory, only the number of parallel jobs and a list with the CPUs is needed to execute the programs, but unfortunately the way this information is required by different clusters varies dramatically. Therefore, two steps have to be performed:

Fill in the fields related to parallelization in the protocol GUI

Figure 1: Parallelization issues: MPI-related questions (mpi.jpg)
One has to fill these fields as follows:
  • Set Number of threads to 1 unless you know what a thread is and which programs use them. Note that threaded programs may be run in parallel on a shared-memory multi-core machine without using distributed-memory parallelization.
  • Set Distributed-memory parallelization (MPI)? to Yes (otherwise what follows will be ignored!)
  • Set Number of MPI processes to the number of CPUs you want to use divided by the Number of threads (for example, 16 CPUs with 2 threads each gives 8 MPI processes)
  • Set System Flavour depending on your queueing system and MPI implementation. The following values are available:
    • SLURM-MPICH: SLURM queue with MPICH implementation
    • TORQUE-OPENMPI: Torque (PBS) queue with openMPI implementation
    • SGE-OPENMPI: Sun Grid Engine with openMPI implementation
    • PBS: Basic PBS queue
    • XMIPP_MACHINEFILE: the environment variable $XMIPP_MACHINEFILE points to a machinefile (see the sketch after this list)
    • HOME_MACHINEFILE: the machinefile is called $HOME/machinefile.dat
    • Leave it blank: run locally (for most personal computers)
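
For the two machinefile flavours, the machinefile is just a plain list of host names. A minimal sketch, assuming hypothetical node names node01/node02 and the $HOME/machinefile.dat location mentioned above:

# Hypothetical machinefile: one host name per line; repeating a host allows
# several MPI processes to run on it.
cat > $HOME/machinefile.dat << 'EOF'
node01
node01
node02
node02
EOF

# For the XMIPP_MACHINEFILE flavour, point the environment variable at it:
export XMIPP_MACHINEFILE=$HOME/machinefile.dat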

If you are in doubt about the System Flavour, ask the person who installed Xmipp on your cluster, or read the source code of launch_job.py, which is the module that launches parallel jobs in the protocols.

Submit your protocol to your queuing system

Apart from filling in the protocol GUI fields, if your cluster uses a queueing system it may be necessary to write a dedicated job submission script (we call this script qsub.py and usually place it somewhere in the user's $PATH). When submitting the job by pressing the Save & Execute button in the protocol GUI, answer Yes to the question whether you want to use a job queueing system, and use this qsub.py command in the pop-up window (see Figure 2).

If you don't know how to write such a script, ask the person who installed Xmipp for you, or have a look at the examples below (e.g. Crunchy, MareNostrum) and at the sketch that follows.
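
As a rough illustration of what such a wrapper boils down to on a Torque-style cluster, here is a minimal bash sketch (the real qsub.py scripts linked below are Python and have their own defaults; the resource values and the protocol script name are placeholders):

#!/bin/bash
# Hypothetical qsub.py-style wrapper: build a submission script around the
# protocol script passed as the first argument and hand it to qsub.
SCRIPT="$1"   # e.g. protocol_projmatch.py
qsub << EOF
#PBS -N xmipp_$(basename "$SCRIPT" .py)
#PBS -l nodes=2:ppn=8
#PBS -l walltime=72:00:00
cd \$PBS_O_WORKDIR
python $SCRIPT
EOF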

Figure 2: Parallel job submission using a queueing system (submit.jpg)

Examples for our clusters

Unix system without queue system

The following three options are available:

  • XMIPP_MACHINEFILE: the environment variable $XMIPP_MACHINEFILE points to a machinefile
  • HOME_MACHINEFILE: the machinefile is called $HOME/machines.dat
  • nothing: all MPI jobs run on localhost

Unix system with simple queue system

Jumilla

  • Define XMIPP_MACHINEFILE as ~biologia/machines.dat (export XMIPP_MACHINEFILE=~biologia/machines.dat)
  • Set system flavour to XMIPP_MACHINEFILE
  • In the queue submit pop-up window, type: bsub -q 1week_parallel (the complete sequence is sketched below)
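
Put together, a Jumilla session would look roughly like this (the machinefile path and queue name are the ones quoted above; bsub suggests an LSF-style queue):

# Point Xmipp at the shared machinefile (XMIPP_MACHINEFILE flavour)
export XMIPP_MACHINEFILE=~biologia/machines.dat

# Command to type in the queue submit pop-up window of the protocol GUI
bsub -q 1week_parallel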

Supercomputers

Mainframes are very picky about how jobs are submitted. On the more popular ones we have installed a launching script called qsub.py. This script differs from machine to machine but presents the same syntax to the user.

Crunchy

Crunchy is our new (IBM) cluster. It has 28 nodes, each with 8 cores and at least 2 GB per core (some nodes have 4 GB/core). Xmipp runs using openMPI, and there is a Torque/Moab resource management system.

  • Set system flavour to TORQUE-OPENMPI
  • Place the following QsubPyCrunchy in the user's $PATH and execute the job as shown in Figure 2 (a couple of useful status checks are sketched below).
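
Before submitting, a couple of standard Torque commands can be used to check the state of the cluster (generic Torque tools, not part of the QsubPyCrunchy wrapper):

# How many nodes are currently free
pbsnodes -a | grep -c 'state = free'

# Your own queued and running jobs
qstat -u $USER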

If you want to modify the default values, see more details at RunningXmippOnCrunchy.

MareNostrum

  • Set system flavour to SLURM-MPICH
  • Use this QsubPyBsc

Vermeer

  • The graphical interface is not supported on this computer, therefore you must edit the python scripts manually and execute them from the command line. For example, to execute the Projection Matching protocol you must edit the protocol_projmatch.py file.
  • Set system flavour to PBS
  • As the GUI is not installed yet, write your own PBS script. For an example see PbsScript and the sketch below. If you need more memory you may reserve a whole node but use only one of its two available CPUs; see Example2PBS.
  • Submit this script from the command line, using: qsub example.pbs
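
A minimal sketch of such an example.pbs, assuming two 2-CPU nodes and a 48-hour wall time (these values and the protocol name are placeholders; see the linked PbsScript and Example2PBS for real files):

#!/bin/bash
# Hypothetical example.pbs: reserve two nodes with 2 CPUs each and run the
# pre-edited protocol script, which then launches its own MPI jobs (PBS flavour).
#PBS -N projmatch
#PBS -l nodes=2:ppn=2
#PBS -l walltime=48:00:00
cd $PBS_O_WORKDIR
python protocol_projmatch.py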

More expert tips are available at TipsVermeer.

trueno.csic.es

  • Set system flavour to PBS.
  • The graphical interface is not supported on this computer, therefore you must edit the python scripts manually and execute them from the command line. For example, to execute the Projection Matching protocol you must edit the protocol_projmatch.py file.
  • Available queues (submission examples below):
    • exe-x86_64: 4 CPUs per node and 16 GB memory. Example PBS file: PbsTrueno for x86_64
    • exe-ia64: 20? CPUs in 1 node and 64 GB memory. Example PBS file: PbsTruenoIa64 for ia64

NOTE: x86_64 and ia64 use different binaries.
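
To target a specific trueno queue, pass the queue name to qsub (standard PBS usage; the submission script is assumed to be analogous to the Vermeer sketch above):

# Submit to the x86_64 queue
qsub -q exe-x86_64 example.pbs

# Submit to the Itanium queue (remember that the binaries differ)
qsub -q exe-ia64 example.pbs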

See PbsTipsMoreInfo for more information.

Finis Terrae

Finis Terrae has 16 CPUs per node and 142 nodes, with a maximum memory per node of up to 112 GB. See a description here.

If you want to modify the default parameters, check PBsTipsFinisterrae.

Finisterrae only accepts connections from the academic network and not from home. To overcome this problem you may tunnel your ssh connections:


# Open a local tunnel: local port 2123 forwards to ft.cesga.es:22 through the
# academic-network machine jumilla.cnb.csic.es
ssh -N -f -L 2123:ft.cesga.es:22 USERNAMEJUMILLA@jumilla.cnb.csic.es

# Copy files back from Finisterrae through the tunnel
scp -P 2123 USERNAME_FINISTERRAE@localhost:/home/csic/eda/msp/Adeno/*py .
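
An interactive login through the same tunnel would look like this (usage sketch; note that ssh takes the port with a lowercase -p, while scp uses -P):

ssh -p 2123 USERNAME_FINISTERRAE@localhost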


A couple of links related to Finisterrae

BlueGene in Martinsried (genius1)

See RunningXmippOnBlueGeneMartinsried

Computer comparison

We have made a limited comparison of the speed of some of our clusters. A summary of the results is available at ComputerComparison.

USER's COMMENTS

If you want to define an environment variable with your machinefile for MPI, use:
export OMPI_MCA_orte_default_hostfile=/home/roberto/machinefile.dat
Main.RobertoMarabini 2012-10-09 - 15:41

--Main.RobertoMarabini - 08 Oct 2007
