Commit 55731e7

Merge pull request #83599 from xenolinux/hcp-virt-nvidia-gpus

OSDOCS#12121: HCP KubeVirt Nvidia GPU support

2 parents 445d7da + 07636ad
File tree: 3 files changed, +148 -0 lines changed
hosted_control_planes/hcp-manage/hcp-manage-virt.adoc

Lines changed: 4 additions & 0 deletions
@@ -50,3 +50,7 @@ include::modules/hcp-virt-image-caching.adoc[leveloffset=+2]
 * xref:../../virt/virtual_machines/creating_vms_custom/virt-creating-vms-by-cloning-pvcs.adoc#smart-cloning_virt-creating-vms-by-cloning-pvcs[Cloning a data volume using smart-cloning]

 include::modules/hcp-virt-etcd-storage.adoc[leveloffset=+2]
+
+include::modules/hcp-virt-attach-nvidia-gpus.adoc[leveloffset=+1]
+
+include::modules/hcp-virt-attach-nvidia-gpus-np-api.adoc[leveloffset=+1]
modules/hcp-virt-attach-nvidia-gpus-np-api.adoc

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
// Module included in the following assemblies:
//
// * hosted_control_planes/hcp-manage/hcp-manage-virt.adoc

:_mod-docs-content-type: PROCEDURE
[id="hcp-virt-attach-nvidia-gpus-np-api_{context}"]
= Attaching NVIDIA GPU devices by using the NodePool resource

You can attach one or more NVIDIA graphics processing unit (GPU) devices to node pools by configuring the `nodepool.spec.platform.kubevirt.hostDevices` field in the `NodePool` resource.

:FeatureName: Attaching NVIDIA GPU devices to node pools
include::snippets/technology-preview.adoc[]

.Procedure

* Attach one or more GPU devices to node pools:

** To attach a single GPU device, configure the `NodePool` resource by using the following example configuration:
+
[source,yaml]
----
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: <hosted_cluster_name> <1>
  namespace: <hosted_cluster_namespace> <2>
spec:
  arch: amd64
  clusterName: <hosted_cluster_name>
  management:
    autoRepair: false
    upgradeType: Replace
  nodeDrainTimeout: 0s
  nodeVolumeDetachTimeout: 0s
  platform:
    kubevirt:
      attachDefaultNetwork: true
      compute:
        cores: <cpu> <3>
        memory: <memory> <4>
      hostDevices: <5>
      - count: <count> <6>
        deviceName: <gpu_device_name> <7>
      networkInterfaceMultiqueue: Enable
      rootVolume:
        persistent:
          size: 32Gi
        type: Persistent
    type: KubeVirt
  replicas: <worker_node_count> <8>
----
<1> Specify the name of your hosted cluster, for instance, `example`.
<2> Specify the name of the hosted cluster namespace, for example, `clusters`.
<3> Specify a value for CPU, for example, `2`.
<4> Specify a value for memory, for example, `16Gi`.
<5> The `hostDevices` field defines a list of different types of GPU devices that you can attach to node pools.
<6> Specify the number of GPU devices that you want to attach to each virtual machine (VM) in the node pool. For example, if you attach 2 GPU devices to 3 node pool replicas, each of the 3 VMs in the node pool has 2 GPU devices attached. The default count is `1`.
<7> Specify the GPU device name, for example, `nvidia-a100`.
<8> Specify the worker count, for example, `3`.

** To attach multiple GPU devices, configure the `NodePool` resource by using the following example configuration:
+
[source,yaml]
----
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: <hosted_cluster_name>
  namespace: <hosted_cluster_namespace>
spec:
  arch: amd64
  clusterName: <hosted_cluster_name>
  management:
    autoRepair: false
    upgradeType: Replace
  nodeDrainTimeout: 0s
  nodeVolumeDetachTimeout: 0s
  platform:
    kubevirt:
      attachDefaultNetwork: true
      compute:
        cores: <cpu>
        memory: <memory>
      hostDevices:
      - count: <count>
        deviceName: <gpu_device_name>
      - count: <count>
        deviceName: <gpu_device_name>
      - count: <count>
        deviceName: <gpu_device_name>
      - count: <count>
        deviceName: <gpu_device_name>
      networkInterfaceMultiqueue: Enable
      rootVolume:
        persistent:
          size: 32Gi
        type: Persistent
    type: KubeVirt
  replicas: <worker_node_count>
----
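+
After you save either configuration, you might apply it and watch the node pool reconcile, as in the following sketch. The `nodepool-gpu.yaml` file name and the `clusters` namespace are assumptions, not part of the documented procedure:
+
[source,terminal]
----
# Apply the NodePool manifest; the file name is an assumed example.
$ oc apply -f nodepool-gpu.yaml

# Watch the node pool until the current node count reaches the desired
# replica count. The "clusters" namespace is an assumed example.
$ oc get nodepool <hosted_cluster_name> -n clusters -w
----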
modules/hcp-virt-attach-nvidia-gpus.adoc

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
// Module included in the following assemblies:
//
// * hosted_control_planes/hcp-manage/hcp-manage-virt.adoc

:_mod-docs-content-type: PROCEDURE
[id="hcp-virt-attach-nvidia-gpus_{context}"]
= Attaching NVIDIA GPU devices by using the hcp CLI

You can attach one or more NVIDIA graphics processing unit (GPU) devices to node pools by using the `hcp` command-line interface (CLI) in a hosted cluster on {VirtProductName}.

:FeatureName: Attaching NVIDIA GPU devices to node pools
include::snippets/technology-preview.adoc[]

.Prerequisites

* You have exposed the NVIDIA GPU device as a resource on the node where the GPU device resides. For more information, see link:https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/openshift-virtualization.html[NVIDIA GPU Operator with {VirtProductName}].

* You have exposed the NVIDIA GPU device as an link:https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#extended-resources[extended resource] on the node so that you can assign it to node pools.
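+
To verify this prerequisite, you might inspect the allocatable resources on the node, as in the following sketch. The `<node_name>` placeholder and the `nvidia.com/gpu` resource name are assumptions; the actual resource name depends on how the device is exposed:
+
[source,terminal]
----
# List the node's allocatable resources; an exposed GPU appears as an
# extended resource. The "nvidia.com/gpu" name is an assumption based
# on common NVIDIA GPU Operator defaults; your resource name can differ.
$ oc get node <node_name> -o jsonpath='{.status.allocatable}'
----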
.Procedure

* Attach the GPU device to node pools during cluster creation by running the following command:
+
[source,terminal]
----
$ hcp create cluster kubevirt \
  --name <hosted_cluster_name> \// <1>
  --node-pool-replicas <worker_node_count> \// <2>
  --pull-secret <path_to_pull_secret> \// <3>
  --memory <memory> \// <4>
  --cores <cpu> \// <5>
  --host-device-name="<gpu_device_name>,count:<value>" <6>
----
<1> Specify the name of your hosted cluster, for instance, `example`.
<2> Specify the worker count, for example, `3`.
<3> Specify the path to your pull secret, for example, `/user/name/pullsecret`.
<4> Specify a value for memory, for example, `16Gi`.
<5> Specify a value for CPU, for example, `2`.
<6> Specify the GPU device name and the count, for example, `--host-device-name="nvidia-a100,count:2"`. The `--host-device-name` argument takes the name of the GPU device from the infrastructure node and an optional count that represents the number of GPU devices that you want to attach to each virtual machine (VM) in the node pool. The default count is `1`. For example, if you attach 2 GPU devices to 3 node pool replicas, each of the 3 VMs in the node pool has 2 GPU devices attached.
+
[TIP]
====
You can use the `--host-device-name` argument multiple times to attach multiple devices of different types.
====
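+
For example, repeating the argument to attach two device types might look like the following sketch. The `nvidia-a100` and `nvidia-t4` device names are illustrative assumptions:
+
[source,terminal]
----
# Both device names below are assumed examples; substitute the device
# names that your infrastructure nodes expose.
$ hcp create cluster kubevirt \
  --name <hosted_cluster_name> \
  --node-pool-replicas <worker_node_count> \
  --pull-secret <path_to_pull_secret> \
  --memory <memory> \
  --cores <cpu> \
  --host-device-name="nvidia-a100,count:2" \
  --host-device-name="nvidia-t4,count:1"
----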
