
Building Your Own Cluster and Configuring It


This guide will show you how to set up three nodes (e.g. virtual machines for testing purposes) running Ubuntu 18.04 and configure them with Open MPI, OpenPBS, and an installation of Fiji with Parallel Macro and OpenMPI Ops support.

To mimic an HPC installation, we will need to complete the steps described in the sections below.

ℹ️ We also provide some additional useful information to help you reduce the time needed to set up the virtual cluster.

Basic Installation and Network Setup

  • Install Ubuntu 18.04. We will use fiji as the default user.
  • Update the system
    • sudo apt-get update
    • sudo apt-get upgrade
  • Configure a local network for all three nodes
    • For the network we will use the private range 192.168.1.0/24, where the first node will have the hostname fiji-vm1 and the IP address 192.168.1.1, fiji-vm2 will have 192.168.1.2, etc.
    • If you are using VirtualBox, leave the first network adapter in NAT mode to retain Internet access and attach a second (Internal Network) interface to the private network shared among the virtual hosts, as in the sketch below.
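
On Ubuntu 18.04 a static address for the private interface can be set with netplan. The following is a minimal sketch for fiji-vm1; the file name and the interface name enp0s8 are assumptions that depend on your VM setup (adjust the address for the other nodes):

# /etc/netplan/60-cluster.yaml (hypothetical file name)
network:
  version: 2
  ethernets:
    enp0s8:                       # assumed name of the second (Internal Network) adapter
      addresses: [192.168.1.1/24]

Apply the configuration with sudo netplan apply.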

To provide correct node name mapping, add the following lines to /etc/hosts on all nodes

192.168.1.1   fiji-vm1
192.168.1.2   fiji-vm2
192.168.1.3   fiji-vm3
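
Once the interfaces are configured, verify connectivity and name resolution between the nodes; for example, from fiji-vm1:

ping -c 1 fiji-vm2
ping -c 1 fiji-vm3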

Setting Up NFS Shared Storage

For sharing the application and data files, we will use NFS (Network File System).

  • Install the NFS server on the first node fiji-vm1

    • sudo apt install nfs-server

    • Create a shared directory with sudo mkdir /mnt/shared and set the proper owner with sudo chown fiji:fiji /mnt/shared.

    • We can also create a shared working directory there with mkdir /mnt/shared/work

    • In /etc/exports add the following line, granting the second and third nodes access to this shared directory

      /mnt/shared  fiji-vm2(rw,sync)  fiji-vm3(rw,sync)
      
  • Install the NFS client on remaining nodes fiji-vm2 and fiji-vm3

    • sudo apt install nfs-common

    • Create an automatic mount for the shared directory on fiji-vm2 and fiji-vm3. In /etc/fstab add the following line

      fiji-vm1:/mnt/shared	/mnt/shared	nfs	defaults	0	0
      

After a restart, you should see the shared directory /mnt/shared mounted on all nodes.
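
To verify the export without rebooting, re-read /etc/exports on fiji-vm1 and mount manually on the clients; this assumes only the configuration above:

# on fiji-vm1: re-export the shares and list the active exports
sudo exportfs -ra
sudo exportfs -v

# on fiji-vm2 and fiji-vm3: mount via the fstab entry and check
sudo mount /mnt/shared
df -h /mnt/shared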

Setting Up Remote Access via SSH

To access the nodes remotely, install the SSH server and distribute fiji-vm1's public key to fiji-vm2 and fiji-vm3.

  • Install the SSH server on all nodes
    • sudo apt install openssh-server
    • On fiji-vm1 generate the keys using ssh-keygen
    • Append the content of /home/fiji/.ssh/id_rsa.pub on fiji-vm1 to /home/fiji/.ssh/authorized_keys on fiji-vm2 and fiji-vm3
    • To do this, copy fiji-vm1's public key (only) to the shared directory: cp /home/fiji/.ssh/id_rsa.pub /mnt/shared/fiji-vm1.pub
    • On the rest of the nodes append the contents of fiji-vm1's public key to authorized_keys with the command: cat /mnt/shared/fiji-vm1.pub >> /home/fiji/.ssh/authorized_keys

Now you should be able to SSH from fiji-vm1 to the other nodes without being prompted for a password (try ssh fiji-vm2 from fiji-vm1).
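
A quick loop run from fiji-vm1 confirms this; each node should print its hostname without a password prompt:

for host in fiji-vm2 fiji-vm3; do ssh "$host" hostname; done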

Installation and Configuration of OpenPBS

To fully mimic a standard HPC installation, we need to install OpenPBS, a job scheduling and workload management system for high-performance computing (HPC) environments.

  • Install the dependencies of OpenPBS. Specifically, enter the following lines:
    sudo apt install gcc make libtool libhwloc-dev libx11-dev \
        libxt-dev libedit-dev libical-dev ncurses-dev perl \
        postgresql-server-dev-all postgresql-contrib python3-dev tcl-dev tclsh tk-dev swig \
        libexpat-dev libssl-dev libxext-dev libxft-dev autoconf \
        automake g++
    
    and then enter the following lines:
    sudo apt install expat libedit2 postgresql python3 postgresql-contrib sendmail-bin \
      sudo tcl tk libical3 postgresql-server-dev-all
    
  • Download OpenPBS from https://www.openpbs.org/Download.aspx#download
  • We need to install OpenPBS. On all nodes, do the following:
    • unzip openpbs_20.0.1.ubuntu_1804.zip
    • cd openpbs_20.0.1.ubuntu_1804
  • On the compute nodes only (fiji-vm2 and fiji-vm3, not fiji-vm1):
    • sudo apt --fix-broken install ./openpbs-execution_20.0.1-1_amd64.deb (Install the execution package with all its dependencies)
    • sudo nano /etc/pbs.conf and change PBS_SERVER=CHANGE_THIS_TO_PBS_SERVER_HOSTNAME to PBS_SERVER=fiji-vm1
    • sudo nano /var/spool/pbs/mom_priv/config and change $clienthost CHANGE_THIS_TO_PBS_SERVER_HOSTNAME to $clienthost fiji-vm1
  • Only on the server node (fiji-vm1):
    • sudo apt --fix-broken install ./openpbs-server_20.0.1-1_amd64.deb (Install the server package with all its dependencies)
    • sudo apt --fix-broken install ./openpbs-devel_20.0.1-1_amd64.deb (This is necessary to make Open MPI work with OpenPBS when we build Open MPI manually, as you will see in the next section)
  • We need to configure OpenPBS only on the first node:
    • sudo su (The following commands must be executed as the superuser)
    • . /etc/profile.d/pbs.sh (Sets the environment variables)
    • qmgr -c "set server acl_roots+=fiji@*" (Enable the fiji user to submit and administer jobs)
    • qmgr -c "set server operators+=fiji@*"
    • qmgr -c "create node fiji-vm1" (Add the first node)
    • qmgr -c "create node fiji-vm2" (Add the second node)
    • qmgr -c "create node fiji-vm3" (Add the third node)
    • qmgr -c "create queue qexp queue_type=e,started=t,enabled=t" (Create a default queue named qexp)
    • qmgr -c "set server default_queue=qexp" (Set the default queue)
    • qmgr -c "set server flatuid=true" (Set the permission policy)
    • qmgr -c "s s job_history_enable=1" (Configure OpenPBS to maintain job history; s s is short for set server)
    • exit (Leave the root shell)
  • Enable and start the OpenPBS services
    • Open the file pbs.conf using the command sudo nano /etc/pbs.conf
    • Make sure that all four services listed below are enabled (set to 1).
      PBS_START_SERVER=1
      PBS_START_SCHED=1
      PBS_START_COMM=1
      PBS_START_MOM=1
      
    • Restart the services with sudo /etc/init.d/pbs restart
    • Check that the services started successfully using the command sudo /etc/init.d/pbs status. If everything is correct, the output should look like the following example (the PIDs may differ):
      pbs_server is pid 1949
      pbs_mom is pid 1546
      pbs_sched is pid 1589
      pbs_comm is 1482
      

At this point you should have the first node configured, which you can check with pbsnodes -a

fiji@fiji-vm1:/etc/openmpi$ pbsnodes -a
fiji-vm1
   Mom = fiji-vm1
   Port = 15002
   pbs_version = 20.0.1
   ntype = PBS
   state = free
   pcpus = 1
   resources_available.arch = linux
   resources_available.host = fiji-vm1
   resources_available.mem = 2035476kb
   resources_available.ncpus = 1
   resources_available.vnode = fiji-vm1
   resources_assigned.accelerator_memory = 0kb
   resources_assigned.hbmem = 0kb
   resources_assigned.mem = 0kb
   resources_assigned.naccelerators = 0
   resources_assigned.ncpus = 0
   resources_assigned.vmem = 0kb
   resv_enable = True
   sharing = default_shared
   last_state_change_time = Mon Aug  9 12:44:08 2021
   last_used_time = Mon Aug  9 12:44:08 2021
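
You can also print back the server and queue configuration entered earlier with qmgr (run as root, or after sourcing /etc/profile.d/pbs.sh):

qmgr -c "print server"   # dumps the server and queue settings as qmgr commands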

You can submit a test job with echo "sleep 60" | qsub and check its state with qstat

fiji@fiji-vm1:/etc/openmpi$ echo "sleep 60" | qsub
4017.fiji-vm1
fiji@fiji-vm1:/etc/openmpi$ qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
4017.fiji-vm1     STDIN            fiji                     0 Q qexp 
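
Instead of piping a command into qsub, you can keep a job script with #PBS directives. A minimal sketch follows; the file name hello.sh and the resource request are our own choices:

#!/bin/bash
#PBS -N hello             # job name
#PBS -q qexp              # the queue created above
#PBS -l select=1:ncpus=1  # request one CPU on one node
echo "Hello from $(hostname)"

Submit it with qsub hello.sh; the output then appears in hello.o<job ID> rather than STDIN.o<job ID>.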

Open the file /etc/environment (sudo nano /etc/environment) and append :/opt/pbs/bin/ to the end of the PATH string. This makes it possible for HPC Workflow Manager to invoke the OpenPBS commands over SSH.

Compiling Open MPI with OpenPBS Support

In order to use Open MPI we need to compile it from source, as the Open MPI in the Ubuntu packages is not compiled with OpenPBS support.

  • Install the prerequisites (C and Java compiler)
    • sudo apt install build-essential
    • sudo apt install openjdk-8-jdk openjdk-8-jre (we need Java 8 rather than the default Java 11 because Fiji still uses the older Java version)
    • sudo apt install libssl-dev libz-dev (dependencies for correct build with PBS support)
  • Download Open MPI from www.open-mpi.org and extract it
    • wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.1.tar.gz
    • tar xvfz openmpi-4.1.1.tar.gz
  • Configure Open MPI to install into the shared directory and to enable OpenPBS support and the Java bindings.
    • cd openmpi-4.1.1
    • ./configure --prefix=/mnt/shared/openmpi/ --with-tm=/opt/pbs --enable-mpi-java
  • Compile and install the binaries and register the libraries
    • sudo make install
    • sudo ldconfig
  • On the first node fiji-vm1 add all the nodes to the default Open MPI hostfile /etc/openmpi/openmpi-default-hostfile:
    fiji-vm1 slots=1
    fiji-vm2 slots=1
    fiji-vm3 slots=1
    

If everything is set correctly, you should be able to use Open MPI and run jobs. To test it, run mpirun -np 3 -hostfile /etc/openmpi/openmpi-default-hostfile hostname from the first node fiji-vm1. The parameter -np defines how many processes to run, -hostfile defines the list of worker nodes, and hostname is a Unix utility that prints the hostname of the node on which it is executed. You should get the following result. If you do not want to run jobs on the first node, remove the line fiji-vm1 slots=1 from the hostfile.

fiji@fiji-vm1:~$ mpirun -np 3 -hostfile /etc/openmpi/openmpi-default-hostfile hostname
fiji-vm3
fiji-vm2
fiji-vm1
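
You can also confirm that this build actually includes OpenPBS (TM) support by listing the compiled components; if no tm entries show up, re-check the --with-tm path (the exact output varies by version):

/mnt/shared/openmpi/bin/ompi_info | grep " tm "
# expect lines such as:  MCA ras: tm (...)  and  MCA plm: tm (...)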

Installation of Environment Modules

Whenever Open MPI is used, the correct environment variables must be set on all compute nodes. To do this we will install Environment Modules and create an Open MPI module that sets the correct paths.

  • Install the Ubuntu Environment Modules package: sudo apt-get install -y environment-modules
  • Create a directory for the Open MPI module: sudo mkdir /usr/share/modules/modulefiles/openmpi
  • Create the file that describes the Open MPI module: sudo nano /usr/share/modules/modulefiles/openmpi/4.1.1-GCC-7.5.0-3
  • Enter the following in the module file (named 4.1.1-GCC-7.5.0-3 after the versions of Open MPI and GCC used to compile it):
    #%Module 1.0
    #
    #  Open MPI module for use with 'environment-modules' package:
    #
    conflict                mpi
    prepend-path            PATH            /mnt/shared/openmpi/bin
    prepend-path            LD_LIBRARY_PATH /mnt/shared/openmpi/lib
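
Once the module file is in place, you can load it in a new shell and verify that the shared build is picked up:

module avail                            # openmpi/4.1.1-GCC-7.5.0-3 should be listed
module load openmpi/4.1.1-GCC-7.5.0-3
which mpirun                            # should print /mnt/shared/openmpi/bin/mpirun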

Fiji Installation with Parallel Macro and OpenMPI Ops Support

JNA Support

Both Parallel Macro and OpenMPI Ops rely on Java Native Access (JNA) to call Open MPI. To make use of it, you need to download and install two packages (on all nodes):

  • Download the first package
    • wget http://archive.ubuntu.com/ubuntu/pool/main/libf/libffi/libffi8ubuntu1_3.4~20200819gead65ca871-0ubuntu5_amd64.deb
  • Install the first package
    • sudo dpkg -i libffi8ubuntu1_3.4~20200819gead65ca871-0ubuntu5_amd64.deb
  • Download the second package
    • wget https://answers.launchpad.net/ubuntu/+source/libjna-java/5.5.0-1.1/+build/19871301/+files/libjna-jni_5.5.0-1.1_amd64.deb
  • Install the second package
    • sudo dpkg -i libjna-jni_5.5.0-1.1_amd64.deb
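
To confirm that the JNA native library landed where Fiji can find it, you can list the package contents (the exact path is distribution-dependent):

dpkg -L libjna-jni | grep jnidispatch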

Install Maven

  • Install Maven and Git in order to be able to build Parallel Macro and OpenMPI Ops in the following steps (only on the login node fiji-vm1):
    • sudo apt install maven git

Shared Fiji Installation

In order to have exactly the same installation of Fiji on all nodes, we will install it to the shared directory.

  • Download Fiji from https://imagej.net/software/fiji/downloads and extract it to the shared directory
    • cd /mnt/shared
    • wget https://downloads.imagej.net/fiji/latest/fiji-linux64.zip
    • unzip fiji-linux64.zip
  • Run Fiji and update it
  • Install the plugins from the HPC-ParallelTools update site

Download and Build Parallel Macro

  • git clone https://github.com/fiji-hpc/parallel-macro.git
  • cd parallel-macro/
  • bash build.sh /mnt/shared/Fiji.app/

Download and Build OpenMPI Ops

  • git clone https://github.com/fiji-hpc/scijava-parallel-mpi.git
  • cd scijava-parallel-mpi/
  • bash build.sh /mnt/shared/Fiji.app/

Test Fiji with Parallel Macro in CLI

From the command line of fiji-vm1 we can create a job that runs Fiji on two nodes and executes a simple parallel macro that just prints the node IDs: echo /usr/local/bin/mpirun -np 2 -hostfile /etc/openmpi/openmpi-default-hostfile /mnt/shared/Fiji.app/ImageJ-linux64 -Djava.library.path=/usr/local/lib -- --headless --console -macro /mnt/shared/work/2/parallelMacroWrappedScript.ijm | qsub. The results are written to the text files STDIN.o<job ID> and STDIN.e<job ID> in the current working directory

 fiji@fiji-vm1:~$ echo /usr/local/bin/mpirun -np 2 -hostfile /etc/openmpi/openmpi-default-hostfile /mnt/shared/Fiji.app/ImageJ-linux64 -Djava.library.path=/usr/local/lib -- --headless --console -macro /mnt/shared/work/2/parallelMacroWrappedScript.ijm | qsub
 4018.fiji-vm1
 
 fiji@fiji-vm1:~$ cat STDIN.o4018
 Hello from node 0
 Hello from node 1

 fiji@fiji-vm1:~$ cat STDIN.e4018
 Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
 Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
 Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
 Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release

Now you should be able to access fiji-vm1 from within the Fiji UI using SSH.
