This machine learning workshop is an introduction to the Parallel Works ACTIVATE user experience. ACTIVATE is a single control plane for cloud and on-premise high-performance computing resources.
Please find the link to the Day 1 overview presentation here.
Please find the link to the Day 2 walkthrough presentation here.
The main activities of this workshop are to:
- Start a personal cloud cluster
- Start a JupyterLab notebook session on the cluster
- Download a notebook from a public repository to the cluster
- Run the notebook on the cluster
- Copy files to different storage locations (bucket, workspace)
- Track cost in near real time
- Launch an MPI job via `script_submitter` (optional)
- Stop the JupyterLab session and the cluster
- Log into the platform by going to hpcmp-cloud.parallel.works.
- Change your password immediately after this session. Initial login can be complicated by:
  - delayed/filtered password reset messages, and
  - the inability to use a PED for MFA in certain locations.
- On the `Home` page, go to the `Compute` tile and click on the `On` button for your default cluster.
- Cloud cluster startup takes ~2-5 minutes.
- Please explore - but do not change - the configuration with the `i` button.
- You may experiment with other cluster configurations after the workshop.
- In particular, note that the cluster has the following parts: a head node (where your JupyterLab session will run) and worker nodes managed by the SLURM scheduler.
- On the ACTIVATE Home page, click on the `JupyterLab` workflow tile.
- There is no need to change the fields on the workflow launch page - your one running cluster autopopulates.
- Default settings include using cached software on `/pw/apps` to minimize JupyterLab startup time and bypass the need to install TensorFlow.
- You are welcome to explore the options after the workshop.
- Click on the `Execute` button.
- You can stay on the workflow launch status page, but it's more interesting to:
  - go to your Home page,
  - notice your session is starting up (`Sessions` tile)...
  - ...and the workflow is running (`Workflow Runs` tile).
- A workflow is just a series of automated steps.
- An interactive session is a special type of workflow whose steps include the setup for sending graphics from the cloud cluster to your ACTIVATE workspace.
- Workflows can also be purely computational (e.g. running a simulation) or even a mix of non-graphical and graphical applications.
- Workflows are defined in an easy-to-use `.yaml` format; this is beyond the scope of the workshop, but the PW documentation has more information.
- Click on the run number of the JupyterLab workflow (e.g. `00001`) to view the workflow progress and logs.
- JupyterLab is ready when `Create session` has a green checkmark in the workflow viewer or there is a green light for its entry in the `Sessions` tile on the ACTIVATE `Home` page.
- Access your JupyterLab session on the head node of the cluster by clicking on its session in the `Sessions` tile on the ACTIVATE `Home` page.
- Often, it's convenient to use the `Open in new tab` button to place the session in its own browser tab.
- Use the JupyterLab launcher tab to start a terminal (you may need to scroll down).
- In the terminal in your JupyterLab session, please run `git clone https://github.com/parallelworks/ml-workshop` to place a copy of this repository on your cluster.
- You should see `ml-workshop` in the file browser portion (left sidebar) of JupyterLab.
- You are also welcome to run simple Linux terminal commands like `hostname`, `whoami`, `sinfo`, and `squeue` to verify that you are on the head node of a SLURM cluster; these commands are collected in the sketch below.
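For reference, this step's terminal commands are gathered here; the comments describing each command's purpose are ours, not from the workshop repository:

```bash
# Clone the workshop repository onto the cluster's head node
git clone https://github.com/parallelworks/ml-workshop

# Optional sanity checks that you are on the head node of a SLURM cluster
hostname   # name of the node you are logged into
whoami     # your username on the cluster
sinfo      # SLURM partitions and node states
squeue     # queued/running SLURM jobs (likely empty at this point)
```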
- Start the notebook by clicking on `ml-workshop` in the JupyterLab file browser and then `cvae_example.ipynb`.
- The notebook stores code, output, and error messages all in the same file.
- The error messages here are totally normal.
- Notebook cells can be run individually by selecting them and clicking on the `Play` icon (right-pointing arrowhead).
- Or, you can go to the top menu and select `Kernel > Restart Kernel and Run All` to engage all the cells.
- While running the steps of the notebook:
  - This small example of generative AI trains a neural network to recognize and generate handwritten digits (0-9).
  - A citation, a summary of the job, and an example of extending this approach to a bigger science application are presented at the top of the notebook.
  - The training and visualization steps will each take a few minutes.
  - While they run, if you have opened the JupyterLab session in its own tab, you can go back to the ACTIVATE `Home` page on your original browser tab to verify your session/workflow is still running.
  - If you haven't opened the JupyterLab session in its own tab, then, depending on your browser settings, your notebook can be interrupted. If this happens, just open the notebook again and rerun it.
- You can monitor CPU/RAM/disk usage in near real time by selecting the `i` button on the line of your cluster in the `Compute` tile.
- Or, you can monitor resource usage in the terminal with `htop`, etc.
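A few standard Linux commands for terminal-side monitoring; `htop` is mentioned above, and the others are common utilities we assume are present on the cluster image:

```bash
htop       # interactive, per-process view of CPU and RAM usage (press q to quit)
free -h    # memory usage in human-readable units
df -h ~    # disk usage of the filesystem holding your home directory
```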
- There are several persistent storage options integrated with your ephemeral cloud cluster.
- `/pw/bbb` is a shared cloud bucket mounted to the cluster.
  - For simplicity, this bucket is shared among all workshop participants; you can overwrite each other's files here!
  - The example snippet below will write your username to a file with the same name in the bucket. It should be safe from overwrites since all the usernames are different.
```bash
# Create a file named after your username in the shared bucket
echo $USER > /pw/bbb/${USER}
# Check that the file was created
ls /pw/bbb/
```
- You can get short-term credentials to the bucket, and examples for use with standard CSP CLI tools, by clicking on the `Buckets` tab on the left sidebar of your ACTIVATE `Home`, selecting the `bbb` bucket, and then clicking on the `Credentials` button in the upper right corner.
- The home directory of your cluster is also mounted into your persistent ACTIVATE workspace. You can view the files by clicking on the `Editor` tab on the left sidebar. The `Editor` tab also opens an integrated development environment (IDE) associated with your private workspace on ACTIVATE.
- You can upload/download files between your ACTIVATE workspace IDE and your local computer, as well as drag and drop files in the file browser between clusters (i.e. each cloud or on-premise cluster connected to your ACTIVATE account can mount to your IDE). This functionality supports files up to 8GB in size. For larger files, CSP CLI tools (e.g. `aws s3 ...`, `gcloud storage ...`, or `az storage ...`) or other data tools are recommended.
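As one example of the CSP CLI route, the sketch below assumes an AWS-backed bucket; the credential values and bucket path are placeholders, so substitute the actual values shown on the `Credentials` pane:

```bash
# Placeholder credentials; copy the real ones from the bucket's Credentials pane
export AWS_ACCESS_KEY_ID="ASIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_SESSION_TOKEN="..."

# Copy a file larger than the 8GB IDE limit directly to the bucket
# (the bucket name here is hypothetical)
aws s3 cp big_results.tar.gz s3://example-bbb-bucket/big_results.tar.gz
```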
- Go back to the ACTIVATE Home page.
- Click on the `$ Cost` menu item on the left sidebar.
- You may need to set the group to `ml-workshop` in the ribbon/filter bar across the top of the cost dashboard.
- To see the spend associated with your account, click on `Filter Options`, select `User`, and select your username from the list.
- With many users from the same group on the `Cost` page at the same time, it may be necessary to refresh the page when adjusting the filters (circle arrow button on the browser).
- OpenMPI is already installed at `/pw/apps/ompi`.
- If you run `run_mpitest.sh` in this repository (a sketch of its steps appears after this list), it will:
  - set up the system paths to access OpenMPI,
  - compile the hello world MPI source code provided here, and
  - run the code over 4 CPUs distributed over two worker nodes.
- You can check the status of this multiple-node job with `sinfo` and `squeue` in another terminal.
- You can also copy and paste the contents of `run_mpitest.sh` into the `script_submitter` workflow's launch page to run the script on the cluster as if it were a formal workflow.
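For orientation, here is a minimal sketch of what a script like `run_mpitest.sh` might contain. The source file name and exact SLURM flags are assumptions; treat the script in the repository as authoritative:

```bash
#!/bin/bash
# Sketch only; see run_mpitest.sh in this repository for the real script.

# 1. Set up the system paths to access the preinstalled OpenMPI
export PATH=/pw/apps/ompi/bin:$PATH
export LD_LIBRARY_PATH=/pw/apps/ompi/lib:$LD_LIBRARY_PATH

# 2. Compile the hello world MPI source code (file name is hypothetical)
mpicc mpi_hello.c -o mpi_hello

# 3. Run 4 MPI ranks distributed over two worker nodes via SLURM
srun --nodes=2 --ntasks=4 ./mpi_hello
```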
- When you are finished with the workshop, please shut down your ephemeral cloud cluster to avoid unnecessary costs.
- On your ACTIVATE `Home`, click on the "do not enter" symbol next to the `JupyterLab` entry in the `Workflow Runs` tile to stop the workflow. This action will also automatically cancel the associated `JupyterLab` session in the `Sessions` tile.
- If you have started other workflows and you don't need to continue them, please cancel them in the same way as the `JupyterLab` workflow.
- Technically, you don't need to cancel workflows and sessions before turning off a cluster, but it's generally best practice because some things get cleaned up in the background.
- On your ACTIVATE `Home`, click the On/Off button next to your cluster so that the button goes from green to gray.