-[Intel Gaudi Base Operator](https://catalog.redhat.com/software/container-stacks/detail/6683b2cce45daa25e36bddcb) is used to provision Intel Gaudi Accelerator with OpenShift. The steps and yaml files mentioned in this document to provision the Gaudi accelerator are based on [Intel Gaudi Base Operator for OpenShift](https://docs.habana.ai/en/latest/Orchestration/Intel_Gaudi_Base_Operator/index.html).
+[Intel Gaudi AI Accelerator Operator](https://catalog.redhat.com/software/container-stacks/detail/6683b2cce45daa25e36bddcb) is used to provision Intel Gaudi Accelerator with OpenShift. The steps and yaml files mentioned in this document to provision the Gaudi accelerator are based on [Intel Gaudi AI Accelerator Operator for OpenShift](https://docs.habana.ai/en/latest/Orchestration/Intel_Gaudi_Base_Operator/index.html).

 If you are familiar with the steps here to manually provision the accelerator, the Red Hat certified Operator and Ansible based [One-Click](/one_click/README.md#reference-playbook-–-habana-gaudi-provisioning) solution can be used as a reference to provision the accelerator automatically.

 ## Prerequisites
 - To Provision RHOCP cluster, follow steps [here](/README.md#provisioning-rhocp-cluster).
-- To Install NFD Operator, follow steps [here](/nfd/README.md#install-nfd-operator).
-- To Install KMM Operator, follow steps [here](/kmmo/README.md#install-kmm-operator).

-## Update Kernel Firmware Search Path with MCO
-**Note:** This step will reboot the nodes, it is recommended to do this in the first step.
-
-The default kernel firmware search path `/lib/firmware` in RHCOS is not writable. Command below can be used to add path `/var/lib/firmware` into the firmware search path list.
-## Creating Intel Gaudi Base Operator DeviceConfig Instance
+## Creating Intel Gaudi AI Accelerator Operator ClusterPolicy Instance
 To create a Habana Gaudi device plugin CR, follow the steps below.

 ### Create CR via web console
 1. Go to **Operator** -> **Installed Operators**.
-2. Open **Intel Gaudi Base Operator**.
-3. Navigate to tab **Device Config**.
-4. Click **Create DeviceConfig** -> set correct parameters -> Click **Create**. To set correct parameters please refer to [Using RedHat OpenShift Console](https://docs.habana.ai/en/latest/Installation_Guide/Additional_Installation/Intel_Gaudi_Base_Operator/Deploying_Intel_Gaudi_Base_Operator.html?highlight=openshift#id2).
+2. Open **Intel Gaudi AI Accelerator Operator**.
+3. Navigate to tab **Cluster Policy**.
+4. Click **Create ClusterPolicy** -> set correct parameters -> Click **Create**. To set correct parameters please refer to [Using RedHat OpenShift Container Platform Console](https://docs.habana.ai/en/latest/Installation_Guide/Additional_Installation/Kubernetes_Installation/Kubernetes_Operator.html#id1).

 ### Verify via web console
-1. Verify CR by checking the status of **Workloads** -> **DaemonSet** -> **habana-ai-module-device-plugin-xxxxx**.
-2. Now `DeviceConfig` is created.
+1. Verify CR by checking the status of **Workloads** -> **DaemonSet** -> **habana-ai-device-plugin-ds**, **habana-ai-driver-rhel-9-4-xxxxx**, **habana-ai-feature-discovery-ds**, **habana-ai-metric-exporter-ds**, **habana-ai-runtime-ds**.
 Message: All resources have been successfully reconciled
 Reason: Reconciled
 Status: True
 ```
 ## Verify Gaudi Provisioning
-After the `DeviceConfig` instance CR is created, it will take some time for the operator to download the Gaudi OOT driver source code and build it on-premise with the help of the KMM operator. The OOT driver module binaries will be loaded into the RHCOS kernel on each node with Gaudi cards labelled by NFD. Then, the Gaudi device plugin can advertise the Gaudi resources listed in the table for the pods on OpenShift to use. Run the command below to check the availability of Gaudi resources:
+After the `ClusterPolicy` instance CR is created, it will take some time for the operator to download the Gaudi OOT driver source code and build it on-premise with the help of the KMM operator. The OOT driver module binaries will be loaded into the RHCOS kernel on each node with Gaudi cards labelled by feature discovery. Then, the Gaudi device plugin can advertise the Gaudi resources listed in the table for the pods on OpenShift to use. Run the command below to check the availability of Gaudi resources:
 ```
 oc describe node | grep habana.ai/gaudi
@@ -119,4 +98,4 @@ The resources provided are the user interface for customers to claim and consume
 | Habana Gaudi |`habana.ai/gaudi`| Number of Habana Gaudi Card resources ready to claim |

 ## Upgrade Intel Gaudi SPI Firmware
-Refer to [Upgrade Intel Gaudi SPI Firmware](/gaudi/Gaudi-SPI-Firmware-Upgrade.md) to upgrade the SPI Firmware on Intel Gaudi.
+Refer to [Upgrade Intel Gaudi SPI Firmware](/gaudi/Gaudi-SPI-Firmware-Upgrade.md) to upgrade the SPI Firmware on Intel Gaudi.
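
As a complement to the `oc describe node | grep habana.ai/gaudi` check in the Verify Gaudi Provisioning section, the sketch below lists the allocatable Gaudi count per node in a single table. It assumes `oc` is already logged in to the cluster. The helper name `gaudi_per_node` is hypothetical and is not part of the original document.

```shell
# Sketch: list the allocatable habana.ai/gaudi count for every node.
# Assumption: `oc` is on PATH and logged in; gaudi_per_node is a hypothetical helper name.
gaudi_per_node() {
  # The dot inside the resource name must be escaped in the custom-columns field path.
  oc get nodes \
    -o custom-columns='NODE:.metadata.name,GAUDI:.status.allocatable.habana\.ai/gaudi'
}
```

Calling `gaudi_per_node` prints one row per node. Nodes that do not advertise `habana.ai/gaudi` show `<none>` in the `GAUDI` column.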