Commit 95ebfdc

Merge pull request #312 from chaitanya1731/gaudi_networking
tests: Added hl-smi Gaudi Test Case
2 parents 36223e4 + 11d02d8 commit 95ebfdc

2 files changed, 59 additions and 0 deletions
tests/gaudi/l2/README.md

Lines changed: 41 additions & 0 deletions
@@ -1,12 +1,53 @@
 # Verify Intel® Gaudi® AI Accelerator Provisioning
 
+## hl-smi
+System Management Interface Tool (hl-smi) utility tool obtains information and monitors data of the Intel Gaudi AI accelerators.
+`hl-smi` tool is packaged with the Gaudi base image. Run below command to deploy and execute the tool:
+```
+$ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/tests/gaudi/l2/hl-smi_job.yaml
+```
+
+Verify Output:
+```
+$ oc get pods
+NAME                      READY   STATUS      RESTARTS   AGE
+hl-smi-workload-2-f5qgs   0/1     Completed   0          27m
+```
+```
+$ oc logs hl-smi-workload-2-f5qgs
++-----------------------------------------------------------------------------+
+| HL-SMI Version: hl-1.17.1-fw-51.5.0                                         |
+| Driver Version: 1.17.1-78932ae                                              |
+|-------------------------------+----------------------+----------------------+
+| AIP  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | AIP-Util  Compute M. |
+|===============================+======================+======================|
+|   0  HL-225               N/A | 0000:4d:00.0     N/A |                    0 |
+| N/A   31C   N/A    95W / 600W |    768MiB / 98304MiB |       0%         N/A |
+|-------------------------------+----------------------+----------------------+
+|   1  HL-225               N/A | 0000:b4:00.0     N/A |                    0 |
+| N/A   28C   N/A    85W / 600W |    768MiB / 98304MiB |       0%         N/A |
+|-------------------------------+----------------------+----------------------+
+| Compute Processes:                                               AIP Memory |
+| AIP       PID   Type   Process name                              Usage      |
+|=============================================================================|
+|   0       N/A   N/A    N/A                                       N/A        |
+|   1       N/A   N/A    N/A                                       N/A        |
++=============================================================================+
+```
+
 ## HCCL
 HCCL (Habana Collective Communication Library) demo is a program that demonstrates HCCL usage and supports communication via Gaudi based scale-out or Host NIC scale-out. Refer [HCCL Demo](https://github.com/HabanaAI/hccl_demo/tree/main?tab=readme-ov-file#hccl-demo) for more details.
 
 Build the workload container image:
 ```
 $ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/tests/gaudi/l2/hccl_build.yaml
 ```
+Create service account with required permissions:
+```
+$ oc create sa hccl-demo-anyuid-sa -n hccl-demo
+$ oc adm policy add-scc-to-user anyuid -z hccl-demo-anyuid-sa -n hccl-demo
+```
 Deploy and execute the workload:
 ```
 $ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/tests/gaudi/l2/hccl_job.yaml
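
The "Verify Output" step in the README hunk above is a manual check. A minimal sketch of how it might be scripted follows. It assumes the Job name `hl-smi-workload` and the namespace `hl-smi-demo` from `hl-smi_job.yaml`, and it assumes the namespace already exists, since the manifest does not create it. The function names `count_gaudi_devices` and `check_hl_smi_job`, the 300-second timeout, and the row pattern (an index followed by an `HL-` model name, as in the sample log) are illustrative choices, not part of this commit.

```shell
# Count accelerator rows in hl-smi output read from stdin.
# Assumption: a device row starts with "|", an index, then a model name
# such as "HL-225", as in the sample log in the README.
count_gaudi_devices() {
  grep -cE '^\|[[:space:]]+[0-9]+[[:space:]]+HL-[0-9]+'
}

# Hypothetical wrapper: wait for the Job to complete, then count the
# devices reported in its log. Requires a logged-in `oc` session.
check_hl_smi_job() {
  oc wait --for=condition=complete job/hl-smi-workload \
    -n hl-smi-demo --timeout=300s &&
    oc logs job/hl-smi-workload -n hl-smi-demo | count_gaudi_devices
}
```

Calling `check_hl_smi_job` prints the number of devices found in the log. The rows in the "Compute Processes" section are not counted because they carry no model name.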

tests/gaudi/l2/hl-smi_job.yaml

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: hl-smi-workload
+  namespace: hl-smi-demo
+spec:
+  template:
+    metadata:
+    spec:
+      restartPolicy: Never
+      containers:
+        - name: hl-smi-workload
+          image: vault.habana.ai/gaudi-docker/1.17.1/rhel9.4/habanalabs/pytorch-installer-2.3.1:1.17.1-40
+          command: ["hl-smi"]
+          resources:
+            limits:
+              habana.ai/gaudi: 8
+          imagePullPolicy: IfNotPresent
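
The manifest requests `habana.ai/gaudi: 8`, while the sample log in the README lists two devices. Kubernetes schedules a pod only onto a node that can satisfy the whole request, so on a node that exposes fewer than eight devices this Job would presumably stay Pending. A small pre-flight check might look like the sketch below. The function names `fits_on_a_node` and `check_gaudi_capacity` are hypothetical, and the check compares against each node's allocatable count only, ignoring devices already in use by other pods.

```shell
# Succeeds if at least one allocatable count (arguments 2..N) covers the
# requested number of devices (argument 1).
fits_on_a_node() {
  requested=$1
  shift
  for available in "$@"; do
    if [ "$available" -ge "$requested" ]; then
      return 0
    fi
  done
  return 1
}

# Hypothetical wrapper: read each node's allocatable habana.ai/gaudi count
# and compare it with the 8 devices requested in hl-smi_job.yaml.
# Requires a logged-in `oc` session; the unquoted substitution is
# intentional so that each node's count becomes a separate argument.
check_gaudi_capacity() {
  fits_on_a_node 8 $(oc get nodes \
    -o jsonpath='{.items[*].status.allocatable.habana\.ai/gaudi}')
}
```

`check_gaudi_capacity && echo "request fits"` reports whether any single node could hold the request. If it fails, lowering the limit in the manifest to match the hardware is one option.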
