How To Use Compute Canada Clusters
Simply put, a Computer Cluster is an agglomeration of computing resources that are easily accessible and shared fairly among its users.
A cluster is composed of 4 main parts:
- The Login Nodes allow you to connect to the cluster, set up the software you'll need and launch experiments (jobs) on the Compute Nodes.
- The Compute Nodes do the heavy lifting, as they contain the actual computing resources. They replicate your environment from the login node and run the jobs.
- The Scheduler is an intermediary between you and the Compute Nodes and is in charge of enforcing the fair sharing of resources. It does so by first determining which compute nodes have the resources you requested, then computing your priority from various factors, and finally putting your job at the right spot in the waiting queue.
- The Distributed File System is accessible from all the nodes.
Compute Canada is a government-funded organisation that builds, manages and maintains clusters for all Canadian scientists in academia. It is composed of 4 child organisations (regional partners) that manage the clusters locally in each region.
- ACENET: New Brunswick, Nova Scotia, PEI, Newfoundland
- Calcul Québec: Québec
- Compute Ontario (previously SHARCNET, HPCVL & SciNet): Ontario
- WestGrid: British Columbia, Alberta, Saskatchewan, Manitoba
To access the clusters you first have to create an account at https://ccdb.computecanada.ca. Use a password with at least 8 characters, mixed-case letters, digits and special characters. Later you will be asked to create another password following the same rules, and it is convenient to use the same password for both.
After creating your account, you have to apply for a “role” at https://ccdb.computecanada.ca/me/add_role, which means telling Compute Canada which professor or supervisor (called a sponsor here) you are affiliated with. This allows them to know which clusters you can access and to track your usage.
You will need to wait for your sponsor to accept your request before going to the next step.
After receiving confirmation that your sponsor accepted your request, you’ll need to apply for a consortium account at https://ccdb.computecanada.ca/me/facilities. This means creating a second account with the regional partner (e.g. Calcul Québec) that manages the cluster you want access to.
The GPU clusters for Calcul Québec are Guillimin, Hadès and Helios. If you need CPU computing power, I recommend applying for Mammouth. Ask your sponsor if they have a special allocation on any of those. You can always edit those choices later. For more details on the hardware each cluster offers, see the GPUs and CPUs pages.
Note: The username you are given and the password you choose here are the ones you will use to log in to those clusters.
To log into a cluster, you simply ssh to its entry point. Those entry points are called interactive or login nodes; they are meant for setting up the dependencies you need and launching experiments on the compute nodes.
Do not use the interactive nodes to run full experiments, as this will get you banned. Quick tests are fine.
You will usually get the URL for a given cluster through an email from the regional partner, but here are some of them:
guillimin.hpc.mcgill.ca
helios.calculquebec.ca
hades.calculquebec.ca
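Typing the full host name each time gets tedious. As a minimal sketch, the helper below prints an entry for ~/.ssh/config that gives a cluster a short alias. The function name ssh_config_entry and the user name myuser are placeholders made up for illustration; use the username you were given instead.

```shell
# Hypothetical helper: print an ~/.ssh/config entry for one cluster.
# Arguments: alias, host name, user name.
ssh_config_entry() {
  printf 'Host %s\n' "$1"
  printf '    HostName %s\n' "$2"
  printf '    User %s\n' "$3"
}

# "myuser" is a placeholder, not a real account.
ssh_config_entry helios helios.calculquebec.ca myuser
```

If you append the output to ~/.ssh/config, `ssh helios` should then behave like `ssh myuser@helios.calculquebec.ca`.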
Once you are logged in, it's time to set up your environment. The first thing to do is to ask your supervisor or the person in charge of your group whether there is a prepared software stack you can load. This will save you the trouble of installing everything yourself and save a lot of space for the group.
All clusters give you access to a library of software that you can load on demand. The standard way to access that software is with module. Be careful if you have a group stack: what you load here might conflict with it.
$ module list
Gives you the list of what is loaded for your user (i.e. the software you can use at the moment).
$ module avail
Gives a full list of what is available.
$ module avail <module_name>
Lists all modules containing the name.
$ module spider <module_name>
Detailed search that supports regex.
$ module load <module_name>
Loads a given module for the current session. It won’t be reloaded at next login.
To load this module at login time, add that command to your ~/.bashrc.
$ module unload <module_name>
Unloads a given module for the current session.
Finally, always read module --help before using it, as each cluster has its own small differences.
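If you put module load lines in your ~/.bashrc, a small guard keeps login working in shells where the module command is not defined. The sketch below is one possible way to do that; load_module_if_possible is a made-up helper name, and <module_name> stands for whatever module your group actually uses.

```shell
# Hypothetical helper for ~/.bashrc: load a module only when the
# `module` command is defined in the current shell.
load_module_if_possible() {
  if command -v module >/dev/null 2>&1; then
    module load "$1"
  else
    echo "module command not found; skipping $1" >&2
    return 1
  fi
}

# Example call (commented out; replace the placeholder with a real module name):
# load_module_if_possible <module_name>
```

Remember that anything loaded this way can still conflict with a group stack, so check with the person managing your group first.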
If you need extra software that is not available through module, the standard way to install it locally on Linux is in ~/.local.
You will probably need to add the following exports to your ~/.bashrc.
export PATH=~/.local/bin/:$PATH
export CPATH=~/.local/include/:$CPATH
export C_INCLUDE_PATH=~/.local/include/:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=~/.local/include/:$CPLUS_INCLUDE_PATH
export LD_LIBRARY_PATH=~/.local/lib64/:~/.local/lib/:$LD_LIBRARY_PATH
export LIBRARY_PATH=~/.local/lib64/:~/.local/lib/:$LIBRARY_PATH
export PYTHONPATH=~/.local/lib/:~/.local/lib/python2.7/site-packages/:$PYTHONPATH
If you have a group stack, this might already be there. Always talk to the person managing your group before installing software.
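One caveat with exports of this form: when a variable such as CPATH is empty, `~/.local/include/:$CPATH` ends with a bare colon, and GCC and the dynamic loader treat an empty path element as the current directory. A minimal sketch of a common workaround uses the standard `${VAR:+...}` expansion, so the colon is only added when the variable already has a value:

```shell
# Append ":$VAR" only if VAR is already set and non-empty,
# so no empty element (trailing colon) ends up in the path.
export CPATH="$HOME/.local/include${CPATH:+:$CPATH}"
export LD_LIBRARY_PATH="$HOME/.local/lib64:$HOME/.local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Show the result.
echo "CPATH=$CPATH"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

The same pattern can be applied to the other variables in the list if you want to avoid empty elements there too.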
If you are using Python as I do, pip is very useful for installing packages locally.
$ pip list
Lists all installed packages.
$ pip search <package_name>
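Lists all available packages in the PyPI repository containing the name you specified. Note that PyPI has since disabled the search endpoint this command relies on, so it may fail against the default index.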
$ pip install --user <package_name>
Installs the Python package locally in ~/.local.
$ pip install --user --upgrade <package_name>
Upgrades the Python package locally in ~/.local, even if it was pre-installed in the system.
$ pip install --user git+<git_repo_url>
Installs the development version of that package locally in ~/.local. This works only if the repo is a proper Python package with a setup.py.
$ git clone <git_repo_url>
$ pip install --user -e <cloned_folder>
This installs your cloned repo locally in editable mode, which makes your development version available in your system without having to modify PYTHONPATH. This might not work with old versions of Canopy.
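To check where pip install --user actually puts packages for your interpreter, Python can report its per-user directories. The snippet below is a small sketch that assumes a python3 executable is on your PATH; the PYTHONPATH export on this page refers to python2.7, so adjust the interpreter name to whatever your cluster provides.

```shell
# Assumption: the interpreter is called python3; change it if needed.
if command -v python3 >/dev/null 2>&1; then
  # Base directory for --user installs (usually ~/.local on Linux).
  # The command can return a non-zero status when the user site is
  # disabled (e.g. inside a virtualenv), so the status is ignored here.
  user_base="$(python3 -m site --user-base || true)"
  # site-packages directory that --user installs are written to.
  user_site="$(python3 -m site --user-site || true)"
  echo "user base: $user_base"
  echo "user site: $user_site"
else
  echo "python3 not found" >&2
fi
```

If the reported site-packages directory differs from the one listed in your PYTHONPATH export, the export probably targets a different Python version.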