
EgeeMpiJobs

Adrian Quintana edited this page Dec 11, 2017 · 1 revision

MPI Jobs Creator

The current state of MPI support on EGEE is not very good. Even so, we can send jobs to specific sites that work relatively well, at least within the biomed virtual organization.

You can find more information in the EGEE MPI wiki on the Grid Ireland website:

http://www.grid.ie/mpi/wiki/FrontPage

Or on the related mailing list:

project-eu-egee-tcg-mpi@cern.ch

We have developed a script, written in Python, that makes it easier to create the .jdl files and the shell scripts needed to run jobs that take advantage of the MPI interface.

From my point of view, developers and users are both making some effort so that support for this technology matures within the EGEE infrastructure.

Invoking the command

The script is called mpi_jobs_creator and is installed on our machine villon (villon.cnb.uam.es). Before invoking it, it is advisable to add its directory to the PATH in our .bashrc file and reload that file:

export PATH=/opt/xmipp:$PATH
source .bashrc

Once this is done, we can invoke the script directly. Run without arguments, it reports the required parameters and prints its usage:

mpi_jobs_creator
error: -virtual_organization|-vo parameter is required
error: -executable|-exe parameter is required (absolut path of your MPI program)
error: -number_CPU|-nCPU parameter is required

[-help|-h]                    help
-virtual_organization|-vo     virtual organization
-executable|-exe              executable
[-arguments|-args]            arguments
-number_CPU|-nCPU             number CPU
[-input_data|-id]             input data
[-publish|-p]                 data catalog publication
[-output_data|-od]            output data
[-result_rule|-rl]
[-computing_element|-ce]      computing element
[-storage_element|-se]        storage element
[-catalog_in_path| -caip]     catalog input path
[-catalog_out_path| -caop]    catalog output path
[-root_name|-o]               root name
TODO:
[-launch|-l]                  launch job generated
retrive output data -> catalog used
[-retrive|-r]                 retrieve output data

Basic use case (cpi)

Generating the files with mpi_jobs_creator

This script was written around a simple example: a program called cpi that computes approximations of the number pi.

We are going to create the necessary pair of files by running the mpi_jobs_creator script.

First I will run the following command, and afterwards I will explain each of the parameters used.

mpi_jobs_creator -vo biomed -nCPU 8 -exe /home/user/cpi -od test.tgz -rl "test*"
success: the files mpi_job.jdl and mpi_job.sh have been created

If everything has gone well, mpi_jobs_creator will have generated the files mpi_job.jdl and mpi_job.sh.

With the -vo parameter we indicate the virtual organization we belong to, so that data about its resources can be obtained later.
(eg:) -vo biomed
With the -nCPU parameter we indicate the number of CPUs we want to request.
(eg:) -nCPU 8
With the -exe parameter we indicate the absolute path of our MPI executable file.
(eg:) -exe /home/user/cpi
With the -od parameter we indicate the name of the output data file, compressed in tgz format, that will contain the result files of the execution.
(eg:) -od test.tgz
With the -rl parameter we indicate a rule used by the generated shell script: every file whose name matches the rule is added to the compressed tgz output file.
(eg:) -rl "test*"
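
To make the relation between -od and -rl concrete, here is a minimal Python sketch of the packing step. The generated mpi_job.sh is not reproduced on this page, so this is only an assumption about equivalent behaviour, not the script's actual code; the function name pack_results and the file names in the demo are hypothetical.

```python
import glob
import os
import tarfile
import tempfile


def pack_results(rule, archive_name, directory="."):
    """Add every file in `directory` whose name matches `rule` to a .tgz archive.

    Hypothetical helper: it mimics what -rl and -od are described as doing.
    """
    archive_path = os.path.join(directory, archive_name)
    # Expand the shell-style rule (for example "test*") inside `directory`.
    matches = sorted(glob.glob(os.path.join(directory, rule)))
    # The archive name may itself match the rule (test.tgz matches "test*"),
    # so make sure an archive left over from a previous run is not packed.
    matches = [m for m in matches
               if os.path.abspath(m) != os.path.abspath(archive_path)]
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in matches:
            # Store only the file name, not the directory prefix.
            archive.add(path, arcname=os.path.basename(path))
    return [os.path.basename(m) for m in matches]


if __name__ == "__main__":
    # Demo in a throwaway directory with made-up result files.
    with tempfile.TemporaryDirectory() as workdir:
        for name in ("test_a.out", "test_b.out", "other.log"):
            with open(os.path.join(workdir, name), "w") as handle:
                handle.write("dummy result\n")
        print(pack_results("test*", "test.tgz", workdir))
```

Note that with the example values above the archive name test.tgz also matches the rule "test*", which is why the sketch filters the archive out of the match list.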

Some advice for a successful execution

Before running the MPI job generated with mpi_jobs_creator, it is good practice to check which sites are most suitable for it. The following command returns that list.
(eg:) edg-job-list-match mpi_job.jdl
Some of the sites returned are not suited to running our MPI jobs, so previous experience with some of them is often valuable.
With the following command we obtain more information about the sites of our virtual organization. The information returned includes, among other things, the number of free CPUs in the second column.
(eg:) lcg-infosites --vo biomed ce
valor del bdii: lcg-bdii.cern.ch:2170
#CPU    Free    Total Jobs      Running Waiting ComputingElement
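
As a rough illustration of how this listing can be filtered, the Python sketch below keeps only the computing elements with enough free CPUs. It assumes each data row holds five integer columns followed by the computing element identifier, in the order given by the header above. The rows in SAMPLE are invented for the demo (ce01.example.org and so on are not real sites), and the function names are hypothetical.

```python
# Made-up sample in the layout shown above; the hosts are placeholders.
SAMPLE = """\
valor del bdii: lcg-bdii.cern.ch:2170
#CPU    Free    Total Jobs      Running Waiting ComputingElement
  64      12        40             38       2    ce01.example.org:2119/jobmanager-lcgpbs-biomed
  16       0        25             16       9    ce02.example.org:2119/jobmanager-lcgpbs-biomed
 128      90        38             38       0    ce03.example.org:2119/jobmanager-lcgpbs-biomed
"""


def parse_infosites(text):
    """Return one dict per computing element row found in `text`."""
    sites = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly five integers and then the CE identifier.
        # The bdii line and the header do not fit that shape and are skipped.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[:5]):
            continue
        cpus, free, total_jobs, running, waiting = (int(f) for f in fields[:5])
        sites.append({
            "cpus": cpus,
            "free": free,
            "total_jobs": total_jobs,
            "running": running,
            "waiting": waiting,
            "ce": fields[5],
        })
    return sites


def sites_with_free_cpus(text, wanted):
    """Computing elements with at least `wanted` free CPUs, most free first."""
    candidates = [s for s in parse_infosites(text) if s["free"] >= wanted]
    return sorted(candidates, key=lambda s: s["free"], reverse=True)


if __name__ == "__main__":
    for site in sites_with_free_cpus(SAMPLE, 8):
        print(site["free"], site["ce"])
```

Free CPUs alone do not tell us whether a site runs MPI jobs correctly, so the result is best intersected with the list returned by edg-job-list-match.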

Combining both results, we can check the state of any of the sites returned by edg-job-list-match using the grep command. See the following example.

lcg-infosites --vo biomed ce | grep site.chosen.com:2119/jobmanager-lcgpbs-biomed
Now we can send the job to the chosen site by running the following command.
edg-job-submit -r site.chosen.com:2119/jobmanager-lcgpbs-biomed mpi_job.jdl
We can check the status of our job with the following command. Its input is the job id returned by the previous command.
edg-job-status id_of_our_job
When our job has finished, we can retrieve the results with the following command.
edg-job-get-output --dir . id_of_our_job
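
The submit, status and get-output steps above can be chained from Python. The sketch below only builds the same command lines shown on this page and offers a thin wrapper to run them. The helper names are hypothetical, and find_job_id assumes that the job id printed by edg-job-submit is the first https:// URL in its output, which should be checked against the real output before relying on it.

```python
import re
import subprocess


def submit_command(jdl_file, site):
    """Command line that submits `jdl_file` to the chosen site."""
    return ["edg-job-submit", "-r", site, jdl_file]


def status_command(job_id):
    """Command line that queries the status of a submitted job."""
    return ["edg-job-status", job_id]


def get_output_command(job_id, directory="."):
    """Command line that fetches the job output into `directory`."""
    return ["edg-job-get-output", "--dir", directory, job_id]


def find_job_id(submit_output):
    """Assumption: the job id is the first https:// URL in the submit output."""
    match = re.search(r"https://\S+", submit_output)
    return match.group(0) if match else None


def run(command):
    """Run one of the commands above and return its standard output."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    site = "site.chosen.com:2119/jobmanager-lcgpbs-biomed"
    # Only print the command lines, so the demo works without the grid tools.
    print(" ".join(submit_command("mpi_job.jdl", site)))
    print(" ".join(status_command("id_of_our_job")))
    print(" ".join(get_output_command("id_of_our_job")))
```

On a machine with the grid client tools installed, run(submit_command(...)) followed by find_job_id on its output would give the id to pass to the other two commands.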
In my case, after inspecting the output files, I obtained something like this.
Modified mpirun: Executing command: /home/bio058/gram_scratch_OzQpkeIMDL/.mpi/https_3a_2f_2fbioinfo02.pcm.uam.es_3a9000_2fK-7Q0pFzumJWYDtYl_5flyig/cpi
Process 0 of 1 on grid46.lal.in2p3.fr
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 10.004210
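
If many jobs are submitted, it can help to extract the figures from this output automatically. The following Python sketch parses the cpi lines shown above. The function name parse_cpi_output is hypothetical, and the patterns are written only against the output reproduced on this page, so other cpi builds may print something slightly different.

```python
import re

# Output lines copied from the run shown above.
SAMPLE = """\
Process 0 of 1 on grid46.lal.in2p3.fr
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 10.004210
"""


def parse_cpi_output(text):
    """Extract the process list, pi estimate, error and wall clock time."""
    # One (rank, size, host) tuple per "Process N of M on host" line.
    processes = [(int(rank), int(size), host)
                 for rank, size, host
                 in re.findall(r"Process (\d+) of (\d+) on (\S+)", text)]
    pi_match = re.search(
        r"pi is approximately ([0-9.]+), Error is ([0-9.]+)", text)
    time_match = re.search(r"wall clock time = ([0-9.]+)", text)
    return {
        "processes": processes,
        "pi": float(pi_match.group(1)) if pi_match else None,
        "error": float(pi_match.group(2)) if pi_match else None,
        "wall_time": float(time_match.group(1)) if time_match else None,
    }


if __name__ == "__main__":
    result = parse_cpi_output(SAMPLE)
    print(result["processes"])
    print(result["pi"], result["error"], result["wall_time"])
```

The processes entry records how many MPI processes reported in, which makes it easy to compare against the number of CPUs requested with -nCPU.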
Of course, you can use your own tricks and expertise to find the best ways to run your applications.

-- Main.GermanCarrera - 11 Apr 2007
