Building Your Own Cluster and Configuring It
This guide will show you how to set up three nodes (e.g. virtual machines, for testing purposes) running Ubuntu 18.04 and configure them with Open MPI, OpenPBS, and an installation of Fiji with Parallel Macro and OpenMPI Ops support.
To mimic the HPC installation, we will need to:
- Install Virtual Machines (VMs) and configure their network
- Set up shared storage
- Set up remote access
- Set up Open MPI with OpenPBS support
- Set up OpenPBS
- Install Fiji with Parallel Macro Plugins
ℹ️ We also provide some additional useful information to help you reduce the time needed to set up the virtual cluster.
- Install Ubuntu 18.04. We will use `fiji` as the default user.
- Update the system
  ```
  sudo apt-get update
  sudo apt-get upgrade
  ```
- Configure a local network for all three nodes
  - For the network we will use the private range `192.168.1.xxx/24`, where the first node will have the hostname fiji-vm1 and IP address `192.168.1.1`, fiji-vm2 will have `192.168.1.2`, etc.
  - If you are using VirtualBox, leave the first network adapter in NAT mode to keep access to the Internet, and add a second (Internal Network) interface for the private network shared among the virtual hosts.
To provide correct node-name mapping, add the following lines to `/etc/hosts`:

```
192.168.1.1 fiji-vm1
192.168.1.2 fiji-vm2
192.168.1.3 fiji-vm3
```
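If you prefer not to type the entries by hand, they can be generated with a short loop. This is only a convenience sketch (not part of the original steps), assuming the 192.168.1.x addressing used above:

```shell
# Print the three /etc/hosts entries for the fiji-vm nodes.
# Apply on a node with: <this loop> | sudo tee -a /etc/hosts
for i in 1 2 3; do
  echo "192.168.1.$i fiji-vm$i"
done
```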
For sharing the application and data files, we will use NFS (Network File System).

- Install the NFS server on the first node fiji-vm1
  ```
  sudo apt install nfs-server
  ```
- Create a shared directory with `sudo mkdir /mnt/shared` and set the proper owner with `sudo chown fiji:fiji /mnt/shared`.
- We can also create a shared working directory there with `mkdir /mnt/shared/work`.
- In `/etc/exports`, add the following line, giving the second and third nodes access to this shared directory:
  ```
  /mnt/shared fiji-vm2(rw,sync) fiji-vm3(rw,sync)
  ```
- Install the NFS client on the remaining nodes fiji-vm2 and fiji-vm3
  ```
  sudo apt install nfs-common
  ```
- Create an automatic mount for the shared directory on fiji-vm2 and fiji-vm3. In `/etc/fstab`, add the following line:
  ```
  fiji-vm1:/mnt/shared /mnt/shared nfs defaults
  ```
- After the restart, you should see the shared `/mnt/shared` mounted on all nodes.
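The `/etc/fstab` edit can also be scripted so that running it twice does not duplicate the line. The sketch below is an assumption, demonstrated on a sample file under `/tmp` rather than the real `/etc/fstab`:

```shell
# Append the NFS mount line only if it is not already present (idempotent).
# /tmp/fstab.sample stands in for the real /etc/fstab in this demo.
fstab=/tmp/fstab.sample
touch "$fstab"
line='fiji-vm1:/mnt/shared /mnt/shared nfs defaults'
grep -qxF "$line" "$fstab" || echo "$line" >> "$fstab"
grep -qxF "$line" "$fstab" || echo "$line" >> "$fstab"  # second run changes nothing
wc -l < "$fstab"
```

On the real nodes, run the guarded append once against `/etc/fstab` with sudo.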
To access the nodes remotely, install the SSH server and distribute the fiji-vm1 key to fiji-vm2 and fiji-vm3.
- Install the SSH server on all nodes
  ```
  sudo apt install openssh-server
  ```
- On fiji-vm1, generate the keys using `ssh-keygen`
- Append the content of `/home/fiji/.ssh/id_rsa.pub` on fiji-vm1 to `/home/fiji/.ssh/authorized_keys` on fiji-vm2 and fiji-vm3
- To do this, copy fiji-vm1's public key (only) to the shared directory:
  ```
  cp /home/fiji/.ssh/id_rsa.pub /mnt/shared/fiji-vm1.pub
  ```
- On the rest of the nodes, append the contents of fiji-vm1's public key to `authorized_keys` with the command:
  ```
  cat /mnt/shared/fiji-vm1.pub >> /home/fiji/.ssh/authorized_keys
  ```

Now you should be able to SSH from fiji-vm1 to the other nodes without being asked for a password (try `ssh fiji-vm2` on fiji-vm1).
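The key-append step above can be made idempotent and permission-safe; SSH refuses keys in `authorized_keys` files with lax permissions. The following is a hedged sketch demonstrated under `/tmp` with a placeholder key; on the real nodes use `/home/fiji/.ssh` and `/mnt/shared/fiji-vm1.pub`:

```shell
# Append fiji-vm1's public key only if it is not already there,
# and keep the permissions SSH requires (700 on .ssh, 600 on authorized_keys).
home=/tmp/ssh-demo
mkdir -p "$home/.ssh" && chmod 700 "$home/.ssh"
echo 'ssh-rsa AAAAB3... fiji@fiji-vm1' > "$home/fiji-vm1.pub"   # placeholder key
touch "$home/.ssh/authorized_keys" && chmod 600 "$home/.ssh/authorized_keys"
grep -qxF "$(cat "$home/fiji-vm1.pub")" "$home/.ssh/authorized_keys" || \
  cat "$home/fiji-vm1.pub" >> "$home/.ssh/authorized_keys"
```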
To fully mimic a standard HPC installation, we need to install OpenPBS, a job scheduling and workload management system for high-performance computing (HPC) environments.
- Install the dependencies of OpenPBS by entering the following commands:
  ```
  sudo apt install gcc make libtool libhwloc-dev libx11-dev \
       libxt-dev libedit-dev libical-dev ncurses-dev perl \
       postgresql-server-dev-all postgresql-contrib python3-dev tcl-dev tclsh tk-dev swig \
       libexpat-dev libssl-dev libxext-dev libxft-dev autoconf \
       automake g++
  sudo apt install expat libedit2 postgresql python3 postgresql-contrib sendmail-bin \
       sudo tcl tk libical3 postgresql-server-dev-all
  ```
- Download OpenPBS from https://www.openpbs.org/Download.aspx#download
- Unpack OpenPBS on all nodes:
  ```
  unzip openpbs_20.0.1.ubuntu_1804.zip
  cd openpbs_20.0.1.ubuntu_1804
  ```
- On all the compute nodes (fiji-vm2 and fiji-vm3, not fiji-vm1):
  - `sudo apt --fix-broken install ./openpbs-execution_20.0.1-1_amd64.deb` (install the execution package with all its dependencies)
  - Run `sudo nano /etc/pbs.conf` and change `PBS_SERVER=CHANGE_THIS_TO_PBS_SERVER_HOSTNAME` to `PBS_SERVER=fiji-vm1`
  - Run `sudo nano /var/spool/pbs/mom_priv/config` and change `$clienthost CHANGE_THIS_TO_PBS_SERVER_HOSTNAME` to `$clienthost fiji-vm1`
- Only on the server node (fiji-vm1):
  - `sudo apt --fix-broken install ./openpbs-server_20.0.1-1_amd64.deb` (install the server package with all its dependencies)
  - `sudo apt --fix-broken install ./openpbs-devel_20.0.1-1_amd64.deb` (necessary to make Open MPI work with OpenPBS when we manually build Open MPI in the next section)
- We need to configure OpenPBS only on the first node:
  - `sudo su` (the following commands must be executed as the superuser)
  - `. /etc/profile.d/pbs.sh` (sets the environment variables)
  - `qmgr -c "set server acl_roots+=fiji@*"` and `qmgr -c "set server operators+=fiji@*"` (enable the fiji user to submit and administer jobs)
  - `qmgr -c "create node fiji-vm1"` (add the first node)
  - `qmgr -c "create node fiji-vm2"` (add the second node)
  - `qmgr -c "create node fiji-vm3"` (add the third node)
  - `qmgr -c "create queue qexp queue_type=e,started=t,enabled=t"` (create a default queue named qexp)
  - `qmgr -c "set server default_queue=qexp"` (set the default queue)
  - `qmgr -c "set server flatuid=true"` (set the permission policy)
  - `qmgr -c "s s job_history_enable=1"` (configure OpenPBS to maintain job history; `s s` abbreviates `set server`)
  - `exit` (leave the root interactive shell)
- Enable and start the OpenPBS services
  - Open the file `/etc/pbs.conf` using the command `sudo nano /etc/pbs.conf`
  - Make sure that all four services listed below are enabled (set to 1):
    ```
    PBS_START_SERVER=1
    PBS_START_SCHED=1
    PBS_START_COMM=1
    PBS_START_MOM=1
    ```
  - Restart the services with `sudo /etc/init.d/pbs restart`
  - Check that the services started successfully using `sudo /etc/init.d/pbs status`. If everything is correct, the output should look like this example (the PIDs may differ):
    ```
    pbs_server is pid 1949
    pbs_mom is pid 1546
    pbs_sched is pid 1589
    pbs_comm is 1482
    ```
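As a quick sanity check, you can count the enabled `PBS_START_*` flags instead of eyeballing the file. This is only a sketch of ours (not an official OpenPBS check), shown against a sample file; point the `grep` at `/etc/pbs.conf` on a real node, where the count should be 4:

```shell
# Write a sample pbs.conf, then count lines enabling a PBS service.
cat > /tmp/pbs.conf.sample <<'EOF'
PBS_SERVER=fiji-vm1
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=1
EOF
grep -c '^PBS_START_[A-Z]*=1' /tmp/pbs.conf.sample   # prints 4
```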
At this point you should have configured the first node, which can be checked with `pbsnodes -a`:

```
fiji@fiji-vm1:/etc/openmpi$ pbsnodes -a
fiji-vm1
     Mom = fiji-vm1
     Port = 15002
     pbs_version = 20.0.1
     ntype = PBS
     state = free
     pcpus = 1
     resources_available.arch = linux
     resources_available.host = fiji-vm1
     resources_available.mem = 2035476kb
     resources_available.ncpus = 1
     resources_available.vnode = fiji-vm1
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.hbmem = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 0
     resources_assigned.vmem = 0kb
     resv_enable = True
     sharing = default_shared
     last_state_change_time = Mon Aug 9 12:44:08 2021
     last_used_time = Mon Aug 9 12:44:08 2021
```
You can submit a test job with `echo "sleep 60" | qsub` and check its state with `qstat`:

```
fiji@fiji-vm1:/etc/openmpi$ echo "sleep 60" | qsub
4017.fiji-vm1
fiji@fiji-vm1:/etc/openmpi$ qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
4017.fiji-vm1     STDIN            fiji                     0 Q qexp
```
Open the file `/etc/environment` with `sudo nano /etc/environment` and append `:/opt/pbs/bin/` at the end of the PATH string. This will make it possible for HPC Workflow Manager to invoke the OpenPBS commands over SSH.
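The PATH edit can also be done non-interactively with `sed`. The sketch below is an assumption of ours, demonstrated on a sample copy; on the real system run the `sed` against `/etc/environment` with sudo, and only once:

```shell
# Append :/opt/pbs/bin/ inside the quoted PATH value of an environment file.
env_file=/tmp/environment.sample
echo 'PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"' > "$env_file"
sed -i 's|^PATH="\(.*\)"$|PATH="\1:/opt/pbs/bin/"|' "$env_file"
cat "$env_file"
```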
In order to use Open MPI, we need to compile it from sources, as the Open MPI in the Ubuntu packages is not compiled with OpenPBS support.
- Install the prerequisites (C and Java compilers)
  ```
  sudo apt install build-essential
  sudo apt install openjdk-8-jdk openjdk-8-jre
  sudo apt install libssl-dev libz-dev
  ```
  We need Java 8 rather than the default Java 11 because Fiji uses an older Java version; `libssl-dev` and `libz-dev` are dependencies for a correct build with PBS support.
- Download Open MPI from www.open-mpi.org and extract it
  ```
  wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.1.tar.gz
  tar xvfz openmpi-4.1.1.tar.gz
  ```
- Configure Open MPI to install to the shared directory and enable OpenPBS and Java bindings
  ```
  cd openmpi-4.1.1
  ./configure --prefix=/mnt/shared/openmpi/ --with-tm=/opt/pbs
  ```
- Compile and install the binaries and register the libraries
  ```
  sudo make install
  sudo ldconfig
  ```
- On the first node fiji-vm1, add all nodes to the default Open MPI hostfile `/etc/openmpi/openmpi-default-hostfile`:
  ```
  fiji-vm1 slots=1
  fiji-vm2 slots=1
  fiji-vm3 slots=1
  ```
If everything is set correctly, you should be able to use Open MPI and run jobs. To test it, run `mpirun -np 3 -hostfile /etc/openmpi/openmpi-default-hostfile hostname` from the first node fiji-vm1. The parameter -np defines how many processes you want to run, -hostfile defines the list of worker nodes, and hostname is a Unix utility that prints the hostname of the node on which it is executed. You should get the following result. If you do not want to run jobs on the first node, remove the line `fiji-vm1 slots=1` from the hostfile.
```
fiji@fiji-vm1:~$ mpirun -np 3 -hostfile /etc/openmpi/openmpi-default-hostfile hostname
fiji-vm3
fiji-vm2
fiji-vm1
```
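If you later change which nodes should run jobs, the hostfile can be regenerated from a node list rather than edited by hand. This is only an optional helper sketch of ours; drop fiji-vm1 from the list if the first node should not run jobs:

```shell
# Print hostfile content, one slot per node.
# Apply with: <this loop> | sudo tee /etc/openmpi/openmpi-default-hostfile
for n in fiji-vm1 fiji-vm2 fiji-vm3; do
  echo "$n slots=1"
done
```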
We need the Open MPI environment variables set on all compute nodes whenever Open MPI is used. To do this, we will install Environment Modules and create an Open MPI module that sets the correct paths.
- Install the Ubuntu Environment Modules package
  ```
  sudo apt-get install -y environment-modules
  ```
- Create a directory for the Open MPI module
  ```
  sudo mkdir /usr/share/modules/modulefiles/openmpi
  ```
- Create the file that describes the Open MPI module
  ```
  sudo nano /usr/share/modules/modulefiles/openmpi/4.1.1-GCC-7.5.0-3
  ```
- Enter the following in the module file (named `4.1.1-GCC-7.5.0-3` after the versions of Open MPI and GCC used to compile it):
  ```
  #%Module 1.0
  #
  #  Open MPI module for use with 'environment-modules' package:
  #
  conflict                mpi
  prepend-path            PATH            /mnt/shared/openmpi/bin
  prepend-path            LD_LIBRARY_PATH /mnt/shared/openmpi/lib
  ```
Both Parallel Macro and OpenMPI Ops use Java Native Access (JNA) to call Open MPI. To make use of it, you need to download and install two packages (on all nodes):
- Download the first package
  ```
  wget http://archive.ubuntu.com/ubuntu/pool/main/libf/libffi/libffi8ubuntu1_3.4~20200819gead65ca871-0ubuntu5_amd64.deb
  ```
- Install the first package
  ```
  sudo dpkg -i libffi8ubuntu1_3.4~20200819gead65ca871-0ubuntu5_amd64.deb
  ```
- Download the second package
  ```
  wget https://answers.launchpad.net/ubuntu/+source/libjna-java/5.5.0-1.1/+build/19871301/+files/libjna-jni_5.5.0-1.1_amd64.deb
  ```
- Install the second package
  ```
  sudo dpkg -i libjna-jni_5.5.0-1.1_amd64.deb
  ```
- Install Maven and Git in order to build Parallel Macro and OpenMPI Ops in the following steps (only on the login node fiji-vm1)
  ```
  sudo apt install maven git
  ```
In order to have exactly the same installation of Fiji on all nodes, we will install it to the shared directory.
- Download Fiji from https://imagej.net/software/fiji/downloads and extract it to the shared directory
  ```
  cd /mnt/shared
  wget https://downloads.imagej.net/fiji/latest/fiji-linux64.zip
  unzip fiji-linux64.zip
  ```
- Run and update Fiji
- Install the plugins from the HPC-ParallelTools update site
- Build and install Parallel Macro and OpenMPI Ops into the shared Fiji installation
  ```
  git clone https://github.com/fiji-hpc/parallel-macro.git
  cd parallel-macro/
  bash build.sh /mnt/shared/Fiji.app/
  cd ..
  git clone https://github.com/fiji-hpc/scijava-parallel-mpi.git
  cd scijava-parallel-mpi/
  bash build.sh /mnt/shared/Fiji.app/
  ```
From the command line of fiji-vm1 we can create a job that runs Fiji on two nodes and executes a simple parallel macro that just prints the node IDs:

```
echo /usr/local/bin/mpirun -np 2 -hostfile /etc/openmpi/openmpi-default-hostfile /mnt/shared/Fiji.app/ImageJ-linux64 -Djava.library.path=/usr/local/lib -- --headless --console -macro /mnt/shared/work/2/parallelMacroWrappedScript.ijm | qsub
```

The results will be written to text files `.o<TID>` and `.e<TID>` in the current working directory:
```
fiji@fiji-vm1:~$ echo /usr/local/bin/mpirun -np 2 -hostfile /etc/openmpi/openmpi-default-hostfile /mnt/shared/Fiji.app/ImageJ-linux64 -Djava.library.path=/usr/local/lib -- --headless --console -macro /mnt/shared/work/2/parallelMacroWrappedScript.ijm | qsub
4018.fiji-vm1
fiji@fiji-vm1:~$ cat STDIN.o4018
Hello from node 0
Hello from node 1
fiji@fiji-vm1:~$ cat STDIN.e4018
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
```
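As the session above shows, the output file names are derived from the job id that `qsub` prints. A small hedged illustration (our own, not a PBS utility) of how that derivation works:

```shell
# Given the job id returned by qsub, derive the stdout/stderr file names
# PBS writes in the submission directory (job name STDIN for piped jobs).
jobid='4018.fiji-vm1'       # as printed by qsub above
tid=${jobid%%.*}            # numeric part only -> 4018
echo "STDIN.o$tid STDIN.e$tid"
```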
Now you should be able to access fiji-vm1 from within the Fiji UI using SSH.
Short Guide Worksheets

- Manually install cluster-side tools
  - Note: The cluster-side tools are technically the Parallel Macro and OpenMPI Ops
- Download and use your own cluster
  - Note: A small homemade cluster for testing, or when you cannot access a big HPC
- Building from scratch your own cluster and configuring it
  - Note: You will learn and understand everything that's behind the scenes
- Additional Useful Information