Generating MFC Images and Testing Them on OSPool #935
Pull Request Overview
This PR replaces the existing CI workflows with a single pipeline that builds, deploys, and tests four Singularity images (CPU, CPU_Benchmark, GPU, GPU_Benchmark) on OSPool.
- Removes legacy workflows for testing, linting, coverage, formatting, docs, benches, and cleanliness checks.
- Introduces .github/workflows/container-image.yml to build images via Apptainer, SCP them to the remote host, and then execute tests inside each image.
- Adds four Singularity definition files under .github/workflows/images/ to specify the CPU/GPU and benchmarking images.
Reviewed Changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
.github/workflows/container-image.yml | New workflow to build, store, and test Singularity images |
.github/workflows/images/Singularity.cpu | Definition for the CPU image |
.github/workflows/images/Singularity.cpu_bench | Definition for the CPU benchmark image |
.github/workflows/images/Singularity.gpu | Definition for the GPU image |
.github/workflows/images/Singularity.gpu_bench | Definition for the GPU benchmark image |
.github/workflows/test.yml | Removed legacy test suite workflow |
.github/workflows/spelling.yml | Removed spelling check workflow |
.github/workflows/lint-toolchain.yml | Removed toolchain lint workflow |
.github/workflows/lint-source.yml | Removed source lint workflow |
.github/workflows/formatting.yml | Removed code formatting workflow |
.github/workflows/docs.yml | Removed documentation build & publish workflow |
.github/workflows/coverage.yml | Removed coverage check workflow |
.github/workflows/cleanliness.yml | Removed cleanliness check workflow |
.github/workflows/line-count.yml | Removed lines-of-code diff workflow |
.github/workflows/bench.yml | Removed benchmark comparison workflow |
Comments suppressed due to low confidence (4)
.github/workflows/container-image.yml:56
- There’s an extraneous double-quote on this line, which will cause a YAML syntax error. Please remove it to ensure the run block is parsed correctly.
"
.github/workflows/container-image.yml:51
- This ssh ... "" command is a no-op and is likely unintended. Either remove it or replace it with the actual remote command you want to run.
ssh ${{secrets.SSH_USER}} ""
.github/workflows/container-image.yml:28
- You remove each .sif immediately after SCP in the Build step, so the Test Images step won’t find any local .sif files. Consider delaying the cleanup or reordering the steps so tests still run on the generated images.
(cd pr/.github/workflows/images && sudo apptainer build mfc_cpu.sif Singularity.cpu)
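One way to address the ordering concern in the last comment is to keep the cleanup as the final action for each image. The following is only a sketch, not the workflow's actual steps: the `run` and `process_image` helpers, the `REMOTE` placeholder, and the `DRY_RUN` switch are assumptions introduced for illustration. By default it prints the commands instead of executing them, so the order can be inspected without Apptainer or SSH access.

```shell
#!/usr/bin/env bash
# Sketch only: build, copy, test, then clean up, in that order.
# REMOTE, DRY_RUN, run and process_image are illustrative names,
# not taken from container-image.yml.
set -euo pipefail

REMOTE="${REMOTE:-user@host}"   # placeholder for the SSH target kept in secrets
DRY_RUN="${DRY_RUN:-1}"         # 1 = print commands instead of running them

# Run a command, or just print it when DRY_RUN=1.
# The printed form drops shell quoting; it is meant for reading, not re-executing.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Handle one image end to end; the .sif is removed only after the test ran.
process_image() {
  local name="$1" def="$2"
  run apptainer build "${name}.sif" "$def"
  run scp "${name}.sif" "${REMOTE}:"
  run apptainer exec --fakeroot --writable-tmpfs "${name}.sif" \
    /bin/bash -c 'cd /opt/MFC && ./mfc.sh test -a'
  run rm -f "${name}.sif"
}

process_image mfc_cpu Singularity.cpu
```

Because `set -e` is active, a failing build, copy, or test stops the script before the `rm`, which leaves the image on disk for debugging. Whether that is desirable on a CI runner is a judgment call.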
I removed all workflow files and will restore them once done, as they have nothing to do with #654 so far.
wonderful thank you
As of right now, I relied solely on […]. Requested allocated resources are quite excessive for now and will be optimized later so that jobs do not get stuck in the queue forever.
Grab the new workflow files from master and you can start doing CI again. You may need to merge in any changes you made. lmk if you have questions.
Status Update: I faced a hurdle with ssh connectivity whether using SSH Keys (public/private) or Credentials (
Edit: I am going to ask how to ensure that each job instance runs on a distinct cluster, i.e., 5-10 instances of a single job would run on 5-10 unique clusters, increasing the chance of exposing failures.
User description
Description
Concerning (#654),
This PR generates four images: CPU, CPU_Benchmark, GPU, and GPU_Benchmark. All MFC builds occur on a GitHub runner, while testing and storage of the latest images take place on OSPool. The images are retrievable on the CI itself; each one contains a pre-built MFC with pre-installed packages that can be accessed with simple commands.
Debugging info,
To locally generate images,
apptainer build mfc_cpu.sif Singularity.cpu
To start shell instance,
apptainer shell --fakeroot --writable-tmpfs mfc_cpu.sif
To execute directly specific commands,
apptainer exec --fakeroot --writable-tmpfs mfc_cpu.sif /bin/bash -c 'cd /opt/MFC && ./mfc.sh test -a'
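When running these commands by hand, a small wrapper can make failures easier to read. The sketch below is an assumption for illustration and not part of the PR: `run_in_image` is a hypothetical helper that checks that the image file exists and that `apptainer` is on the PATH before calling `apptainer exec` with the same flags as above.

```shell
#!/usr/bin/env bash
# Sketch only: run_in_image is a hypothetical helper, not part of the PR.
# It fails early with a readable message instead of a raw apptainer error.
run_in_image() {
  local image="$1"
  shift
  if [ ! -f "$image" ]; then
    echo "image not found: $image" >&2
    return 1
  fi
  if ! command -v apptainer >/dev/null 2>&1; then
    echo "apptainer is not installed" >&2
    return 1
  fi
  # Same flags as the commands above; remaining arguments form the command string.
  apptainer exec --fakeroot --writable-tmpfs "$image" /bin/bash -c "$*"
}
```

For example, `run_in_image mfc_cpu.sif 'cd /opt/MFC && ./mfc.sh test -a'` would run the full test suite inside the CPU image, assuming `mfc_cpu.sif` is in the current directory.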
To-dos,
Note to Self: current secrets are hosted in the fork, and prior to merge new dedicated ones should be added to the base repo. To do so, request access point under "GATech_Bryngelson" project, then upload public SSH key to https://registry.cilogon.org/. Later on, update secrets which include private SSH key and user@host.
Ref's
NVIDIA Container
PR Type
Other
Description
Remove existing CI workflows and testing infrastructure
Add Singularity container image building workflow
Create four container definitions for CPU/GPU variants
Implement automated image building and testing on OSPool
Changes walkthrough 📝
17 files
Remove Frontier build script
Remove Frontier job submission script
Remove Frontier test script
Remove Phoenix benchmark script
Remove Phoenix benchmark submission script
Remove Phoenix job submission script
Remove Phoenix test script
Remove benchmark workflow
Remove code cleanliness workflow
Remove coverage check workflow
Remove documentation workflow
Remove formatting check workflow
Remove line count workflow
Remove source linting workflow
Remove toolchain linting workflow
Remove spell check workflow
Remove main test suite workflow
5 files
Add Singularity image building workflow
Add CPU container definition
Add CPU benchmark container definition
Add GPU container definition
Add GPU benchmark container definition