|
1 | 1 | # Verify Intel® Gaudi® AI Accelerator Provisioning
|
2 | 2 |
|
| 3 | +## hl-smi |
| 4 | +The System Management Interface tool (`hl-smi`) reports information about the Intel Gaudi AI accelerators and monitors their data. |
| 5 | +The `hl-smi` tool is packaged with the Gaudi base image. Run the following command to deploy and execute it: |
| 6 | +``` |
| 7 | +$ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/tests/gaudi/l2/hl-smi_job.yaml |
| 8 | +``` |
| 9 | + |
| 10 | +Verify the output: |
| 11 | +``` |
| 12 | +$ oc get pods |
| 13 | +NAME READY STATUS RESTARTS AGE |
| 14 | +hl-smi-workload-2-f5qgs 0/1 Completed 0 27m |
| 15 | +``` |
| 16 | +``` |
| 17 | +$ oc logs hl-smi-workload-2-f5qgs |
| 18 | ++-----------------------------------------------------------------------------+ |
| 19 | +| HL-SMI Version: hl-1.17.1-fw-51.5.0 | |
| 20 | +| Driver Version: 1.17.1-78932ae | |
| 21 | +|-------------------------------+----------------------+----------------------+ |
| 22 | +| AIP Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | |
| 23 | +| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | AIP-Util Compute M. | |
| 24 | +|===============================+======================+======================| |
| 25 | +| 0 HL-225 N/A | 0000:4d:00.0 N/A | 0 | |
| 26 | +| N/A 31C N/A 95W / 600W | 768MiB / 98304MiB | 0% N/A | |
| 27 | +|-------------------------------+----------------------+----------------------+ |
| 28 | +| 1 HL-225 N/A | 0000:b4:00.0 N/A | 0 | |
| 29 | +| N/A 28C N/A 85W / 600W | 768MiB / 98304MiB | 0% N/A | |
| 30 | +|-------------------------------+----------------------+----------------------+ |
| 31 | +| Compute Processes: AIP Memory | |
| 32 | +| AIP PID Type Process name Usage | |
| 33 | +|=============================================================================| |
| 34 | +| 0 N/A N/A N/A N/A | |
| 35 | +| 1 N/A N/A N/A N/A | |
| 36 | ++=============================================================================+ |
| 37 | +``` |
| 38 | + |
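To check the log without reading the table by eye, the device rows can be counted. The helper below is a minimal sketch, not part of the repository: the name `count_gaudi_devices` is hypothetical, and it assumes that each accelerator row in the `hl-smi` output contains exactly one PCI bus ID such as `0000:4d:00.0`, as in the sample log above.

```shell
# Hypothetical helper: count accelerator rows in hl-smi output read from stdin.
# Assumption: every device row carries one PCI bus ID (domain:bus:device.function).
count_gaudi_devices() {
  grep -Ec '[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]'
}

# Example usage against the pod from the sample output (the pod name will differ):
#   oc logs hl-smi-workload-2-f5qgs | count_gaudi_devices
```

For the sample log above, the count should match the two HL-225 devices listed. A count of zero suggests that the accelerators were not exposed to the pod.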
3 | 39 | ## HCCL
|
4 | 40 | The HCCL (Habana Collective Communication Library) demo is a program that demonstrates HCCL usage and supports communication through Gaudi-based scale-out or host NIC scale-out. Refer to [HCCL Demo](https://github.com/HabanaAI/hccl_demo/tree/main?tab=readme-ov-file#hccl-demo) for more details.
|
5 | 41 |
|
6 | 42 | Build the workload container image:
|
7 | 43 | ```
|
8 | 44 | $ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/tests/gaudi/l2/hccl_build.yaml
|
9 | 45 | ```
|
| 46 | +Create a service account with the required permissions: |
| 47 | +``` |
| 48 | +$ oc create sa hccl-demo-anyuid-sa -n hccl-demo |
| 49 | +$ oc adm policy add-scc-to-user anyuid -z hccl-demo-anyuid-sa -n hccl-demo |
| 50 | +``` |
10 | 51 | Deploy and execute the workload:
|
11 | 52 | ```
|
12 | 53 | $ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/tests/gaudi/l2/hccl_job.yaml
|
|